2013-06-03 09:41:03

by Alexander Gordeev

Subject: [PATCH RFC -tip 0/6] perf: IRQ-bound performance events

This patchset is against perf/core branch.

While Linux is able to measure task-bound and CPU-bound performance
events, there are no convenient means to monitor the performance of
the execution context which probably requires control and tuning
the most: interrupt service routines.

This series is an attempt to introduce IRQ-bound performance
events - ones that only count in the context of a hardware interrupt
handler.

The implementation is pretty straightforward: an IRQ-bound event
is registered with the IRQ descriptor and gets enabled/disabled
using new PMU callbacks: pmu_enable_irq() and pmu_disable_irq().

The series has not been tested thoroughly and is a proof of concept
rather than a decent implementation: no group events can be
loaded, inappropriate (i.e. software) events are not rejected,
only Intel and AMD PMUs were tried for 'perf stat', and only the Intel
PMU works with precise events. The perf tool changes are just a hack.

Yet I would first like to make sure that the approach taken is
not flawed and that I did not miss anything vital, not to mention
whether the change is wanted at all.

Below is a sample session on a machine with x2apic in cluster mode.
The IRQ number is passed using a new argument, -I <irq> (please never mind
'...process id '8'...' in the output for now):

# cat /proc/interrupts | grep ' 8:'
8: 23 0 0 0 21 0 0 0 23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 27 0 0 0 23 0 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-IO-APIC-edge rtc0
# ./tools/perf/perf stat -a -e L1-dcache-load-misses:k sleep 1
Performance counter stats for 'sleep 1':

124,849 L1-dcache-load-misses

1.001359403 seconds time elapsed

# ./tools/perf/perf stat -I 8 -a -e L1-dcache-load-misses:k sleep 1

Performance counter stats for process id '8':

0 L1-dcache-load-misses

1.001235781 seconds time elapsed

# ./tools/perf/perf stat -I 8 -a -e L1-dcache-load-misses:k hwclock --test
Mon 03 Jun 2013 04:42:59 AM EDT -0.891274 seconds

Performance counter stats for process id '8':


Alexander Gordeev (6):
perf/core: IRQ-bound performance events
perf/x86: IRQ-bound performance events
perf/x86/AMD PMU: IRQ-bound performance events
perf/x86/Core PMU: IRQ-bound performance events
perf/x86/Intel PMU: IRQ-bound performance events
perf/tool: Hack 'pid' as 'irq' for sys_perf_event_open()

arch/x86/kernel/cpu/perf_event.c | 71 ++++++++++++++++++---
arch/x86/kernel/cpu/perf_event.h | 19 ++++++
arch/x86/kernel/cpu/perf_event_amd.c | 2 +
arch/x86/kernel/cpu/perf_event_intel.c | 93 +++++++++++++++++++++++++--
arch/x86/kernel/cpu/perf_event_intel_ds.c | 5 +-
arch/x86/kernel/cpu/perf_event_knc.c | 2 +
arch/x86/kernel/cpu/perf_event_p4.c | 2 +
arch/x86/kernel/cpu/perf_event_p6.c | 2 +
include/linux/irq.h | 8 ++
include/linux/irqdesc.h | 3 +
include/linux/perf_event.h | 16 +++++
include/uapi/linux/perf_event.h | 1 +
kernel/events/core.c | 69 +++++++++++++++----
kernel/irq/Makefile | 1 +
kernel/irq/handle.c | 4 +
kernel/irq/irqdesc.c | 14 ++++
kernel/irq/perf_event.c | 100 +++++++++++++++++++++++++++++
tools/perf/builtin-record.c | 8 ++
tools/perf/builtin-stat.c | 8 ++
tools/perf/util/evlist.c | 4 +-
tools/perf/util/evsel.c | 3 +
tools/perf/util/evsel.h | 1 +
tools/perf/util/target.c | 4 +
tools/perf/util/thread_map.c | 16 +++++
24 files changed, 422 insertions(+), 34 deletions(-)
create mode 100644 kernel/irq/perf_event.c

--
1.7.7.6


--
Regards,
Alexander Gordeev
[email protected]


2013-06-03 12:22:30

by Ingo Molnar

Subject: Re: [PATCH RFC -tip 0/6] perf: IRQ-bound performance events


* Alexander Gordeev <[email protected]> wrote:

> This patchset is against perf/core branch.
>
> While Linux is able to measure task-bound and CPU-bound performance
> events, there are no convenient means to monitor the performance of
> the execution context which probably requires control and tuning
> the most: interrupt service routines.
>
> This series is an attempt to introduce IRQ-bound performance
> events - ones that only count in the context of a hardware interrupt
> handler.
>
> The implementation is pretty straightforward: an IRQ-bound event
> is registered with the IRQ descriptor and gets enabled/disabled
> using new PMU callbacks: pmu_enable_irq() and pmu_disable_irq().
>
> The series has not been tested thoroughly and is a proof of concept
> rather than a decent implementation: no group events can be
> loaded, inappropriate (i.e. software) events are not rejected,
> only Intel and AMD PMUs were tried for 'perf stat', and only the Intel
> PMU works with precise events. The perf tool changes are just a hack.
>
> Yet I would first like to make sure that the approach taken is
> not flawed and that I did not miss anything vital, not to mention
> whether the change is wanted at all.
>
> Below is a sample session on a machine with x2apic in cluster mode.
> The IRQ number is passed using a new argument, -I <irq> (please never mind
> '...process id '8'...' in the output for now):

Looks useful.

I think the main challenges are:

- Creating a proper ABI for all this:

- IRQ numbers alone are probably not specific enough: we'd also want to
be more specific to match on handler names - or handler numbers if
the handler name is not unique.

- another useful variant would be where IRQ numbers are too specific:
something like 'perf top irq' would be a natural thing to do, to see
only overhead in hardirq execution - without limiting it to a
specific handler. An 'all irq contexts' wildcard concept?

- Covering softirqs as well. If we handle both hardirqs and softirqs,
then we are pretty much feature complete: all major context types that
the Linux kernel cares about are covered in instrumentation. For things
like networking the softirq overhead is obviously very important, and
for example on routers it will do most of the execution.

- Covering threaded IRQs as well, in a similar model. So if someone types
'perf top irq', and some IRQ handlers are running threaded, those
should probably be included as well.

- Making the tooling friendlier: 'perf top irq' would be useful, and
accepting handler names would be useful as well.

The runtime overhead of your patches seems to be pretty low: when no IRQ
contexts are instrumented then it's a single 'is the list empty' check at
context scheduling time. That looks acceptable.

Regarding the ABI and IRQ/softirq context enumeration you are breaking
lots of new ground here, because unlike tasks, cgroups and CPUs the IRQ
execution contexts do not have a good programmatically accessible
namespace (yet). So it has to be thought out pretty well I think, but once
we have it, it will be a lovely feature IMO.

Thanks,

Ingo

2013-06-03 13:35:28

by Alexander Gordeev

Subject: Re: [PATCH RFC -tip 0/6] perf: IRQ-bound performance events

On Mon, Jun 03, 2013 at 11:41:32AM +0200, Alexander Gordeev wrote:
> # ./tools/perf/perf stat -I 8 -a -e L1-dcache-load-misses:k sleep 1
>
> Performance counter stats for process id '8':
>
> 0 L1-dcache-load-misses
>
> 1.001235781 seconds time elapsed
>
> # ./tools/perf/perf stat -I 8 -a -e L1-dcache-load-misses:k hwclock --test
> Mon 03 Jun 2013 04:42:59 AM EDT -0.891274 seconds
>
> Performance counter stats for process id '8':

Oops, the most interesting part did not make it in. Very sorry :) Here:

# ./tools/perf/perf stat -I 8 -a -e L1-dcache-load-misses:k hwclock --test
Mon 03 Jun 2013 09:32:49 AM EDT -0.719514 seconds

Performance counter stats for process id '8':

447 L1-dcache-load-misses

0.720874208 seconds time elapsed


--
Regards,
Alexander Gordeev
[email protected]

2013-06-03 19:41:28

by Frederic Weisbecker

Subject: Re: [PATCH RFC -tip 0/6] perf: IRQ-bound performance events

On Mon, Jun 03, 2013 at 02:22:23PM +0200, Ingo Molnar wrote:
>
> * Alexander Gordeev <[email protected]> wrote:
>
> > This patchset is against perf/core branch.
> >
> > While Linux is able to measure task-bound and CPU-bound performance
> > events, there are no convenient means to monitor the performance of
> > the execution context which probably requires control and tuning
> > the most: interrupt service routines.
> >
> > This series is an attempt to introduce IRQ-bound performance
> > events - ones that only count in the context of a hardware interrupt
> > handler.
> >
> > The implementation is pretty straightforward: an IRQ-bound event
> > is registered with the IRQ descriptor and gets enabled/disabled
> > using new PMU callbacks: pmu_enable_irq() and pmu_disable_irq().
> >
> > The series has not been tested thoroughly and is a proof of concept
> > rather than a decent implementation: no group events can be
> > loaded, inappropriate (i.e. software) events are not rejected,
> > only Intel and AMD PMUs were tried for 'perf stat', and only the Intel
> > PMU works with precise events. The perf tool changes are just a hack.
> >
> > Yet I would first like to make sure that the approach taken is
> > not flawed and that I did not miss anything vital, not to mention
> > whether the change is wanted at all.
> >
> > Below is a sample session on a machine with x2apic in cluster mode.
> > The IRQ number is passed using a new argument, -I <irq> (please never mind
> > '...process id '8'...' in the output for now):
>
> Looks useful.
>
> I think the main challenges are:
>
> - Creating a proper ABI for all this:
>
> - IRQ numbers alone are probably not specific enough: we'd also want to
> be more specific to match on handler names - or handler numbers if
> the handler name is not unique.
>
> - another useful variant would be where IRQ numbers are too specific:
> something like 'perf top irq' would be a natural thing to do, to see
> only overhead in hardirq execution - without limiting it to a
> specific handler. An 'all irq contexts' wildcard concept?
>
> - Covering softirqs as well. If we handle both hardirqs and softirqs,
> then we are pretty much feature complete: all major context types that
> the Linux kernel cares about are covered in instrumentation. For things
> like networking the softirq overhead is obviously very important, and
> for example on routers it will do most of the execution.
>
> - Covering threaded IRQs as well, in a similar model. So if someone types
> 'perf top irq', and some IRQ handlers are running threaded, those
> should probably be included as well.
>
> - Making the tooling friendlier: 'perf top irq' would be useful, and
> accepting handler names would be useful as well.

How about we define fine-grained context on top of perf events themselves?
For example, we could tell perf to start counting a task's instructions only
after tracepoint:irq_entry is hit and stop counting when tracepoint:irq_exit
is hit.

This way we can define any kind of fine-grained context, not just irqs. We
are not short on tracepoints, software events, breakpoints, kprobes, uprobes
to play Legos there.

I had a branch with a working draft of that:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
perf/custom-ctx-v2-pre

Frederic Weisbecker (5):
perf: Starter and stopper events
perf: New enable_on_starter attribute
perf: Support for starter and stopper in tools
perf: New --enable-on-starter option
perf: Add TODOs for event defined context

It needs quite a few improvements (some are listed in the TODO in the last commit),
especially in both the kernel and user interfaces.

Jiri had some nice ideas about it.

Also, Peter was unhappy about how starter/stopper events were inherited by child
events because it complicated the inheritance code, which I totally agree with, although
I couldn't think of a better way at the time. Then I got distracted by other things, so
this was the last iteration.

But it can be an interesting starting point. I'm convinced this can be a great feature!
Think about all the user contexts we can define with uprobes for example.

Thanks.

2013-06-04 08:52:38

by Jiri Olsa

Subject: Re: [PATCH RFC -tip 0/6] perf: IRQ-bound performance events

On Mon, Jun 03, 2013 at 09:41:21PM +0200, Frederic Weisbecker wrote:
> On Mon, Jun 03, 2013 at 02:22:23PM +0200, Ingo Molnar wrote:
> >
> > * Alexander Gordeev <[email protected]> wrote:

SNIP

> How about we define fine-grained context on top of perf events themselves?
> For example, we could tell perf to start counting a task's instructions only
> after tracepoint:irq_entry is hit and stop counting when tracepoint:irq_exit
> is hit.
>
> This way we can define any kind of fine-grained context, not just irqs. We
> are not short on tracepoints, software events, breakpoints, kprobes, uprobes
> to play Legos there.

agreed, we could do the same as Alex did, plus we'd have
a generic interface to measure any place

>
> I had a branch with a working draft of that:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> perf/custom-ctx-v2-pre
>
> Frederic Weisbecker (5):
> perf: Starter and stopper events
> perf: New enable_on_starter attribute
> perf: Support for starter and stopper in tools
> perf: New --enable-on-starter option
> perf: Add TODOs for event defined context
>
> It needs quite a few improvements (some are listed in the TODO in the last commit),
> especially in both the kernel and user interfaces.
>
> Jiri had some nice ideas about it.

yep, one of them is to get back to this soon ;-)

jirka

2013-06-04 09:38:31

by Peter Zijlstra

Subject: Re: [PATCH RFC -tip 0/6] perf: IRQ-bound performance events

On Mon, Jun 03, 2013 at 03:36:19PM +0200, Alexander Gordeev wrote:
> On Mon, Jun 03, 2013 at 11:41:32AM +0200, Alexander Gordeev wrote:
> > # ./tools/perf/perf stat -I 8 -a -e L1-dcache-load-misses:k sleep 1
> >
> > Performance counter stats for process id '8':
> >
> > 0 L1-dcache-load-misses
> >
> > 1.001235781 seconds time elapsed
> >
> > # ./tools/perf/perf stat -I 8 -a -e L1-dcache-load-misses:k hwclock --test
> > Mon 03 Jun 2013 04:42:59 AM EDT -0.891274 seconds
> >
> > Performance counter stats for process id '8':
>
> Oops, the most interesting part did not make it in. Very sorry :) Here:
>
> # ./tools/perf/perf stat -I 8 -a -e L1-dcache-load-misses:k hwclock --test
> Mon 03 Jun 2013 09:32:49 AM EDT -0.719514 seconds
>
> Performance counter stats for process id '8':
>
> 447 L1-dcache-load-misses

I think that is very much expected; except in the case where you spend
_all_ your time in IRQ handlers, they'll pretty much always miss the L1
cache.

2013-06-04 10:13:44

by Alexander Gordeev

Subject: Re: [PATCH RFC -tip 0/6] perf: IRQ-bound performance events

On Tue, Jun 04, 2013 at 11:38:02AM +0200, Peter Zijlstra wrote:
> > Oops, the most interesting part did not make it in. Very sorry :) Here:
> >
> > # ./tools/perf/perf stat -I 8 -a -e L1-dcache-load-misses:k hwclock --test
> > Mon 03 Jun 2013 09:32:49 AM EDT -0.719514 seconds
> >
> > Performance counter stats for process id '8':
> >
> > 447 L1-dcache-load-misses
>
> I think that is very much expected; except in the case where you spend
> _all_ your time in IRQ handlers, they'll pretty much always miss the L1
> cache.

The emphasis was on the fact that it could indeed be measured for a particular ISR ;)

--
Regards,
Alexander Gordeev
[email protected]