2009-11-20 21:31:38

by Peter Zijlstra

Subject: [PATCH 03/15] perf: optimize perf_swevent_ctx_event()

Remove a rcu_read_{,un}lock() pair and a few conditionals.

We can remove the rcu_read_lock() by increasing the scope of one in
the calling function.

We can do away with the system_state check if the machine still boots
after this patch (seems to be the case).

We can do away with the list_empty() check because the bare
list_for_each_entry_rcu() reduces to that now that we've removed
everything else.

Signed-off-by: Peter Zijlstra <[email protected]>
---
kernel/perf_event.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)

Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -3886,15 +3886,10 @@ static void perf_swevent_ctx_event(struc
{
struct perf_event *event;

- if (system_state != SYSTEM_RUNNING || list_empty(&ctx->event_list))
- return;
-
- rcu_read_lock();
list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
if (perf_swevent_match(event, type, event_id, data, regs))
perf_swevent_add(event, nr, nmi, data, regs);
}
- rcu_read_unlock();
}

static int *perf_swevent_recursion_context(struct perf_cpu_context *cpuctx)
@@ -3926,9 +3921,9 @@ static void do_perf_sw_event(enum perf_t
(*recursion)++;
barrier();

+ rcu_read_lock();
perf_swevent_ctx_event(&cpuctx->ctx, type, event_id,
nr, nmi, data, regs);
- rcu_read_lock();
/*
* doesn't really matter which of the child contexts the
* events ends up in.

--


2009-11-21 13:41:59

by Peter Zijlstra

Subject: [tip:perf/core] perf: Optimize perf_swevent_ctx_event()

Commit-ID: 81520183878a8813c71c9372de28bb70913ba549
Gitweb: http://git.kernel.org/tip/81520183878a8813c71c9372de28bb70913ba549
Author: Peter Zijlstra <[email protected]>
AuthorDate: Fri, 20 Nov 2009 22:19:45 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Sat, 21 Nov 2009 14:11:35 +0100

perf: Optimize perf_swevent_ctx_event()

Remove a rcu_read_{,un}lock() pair and a few conditionals.

We can remove the rcu_read_lock() by increasing the scope of one
in the calling function.

We can do away with the system_state check if the machine still
boots after this patch (seems to be the case).

We can do away with the list_empty() check because the bare
list_for_each_entry_rcu() reduces to that now that we've removed
everything else.

Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/perf_event.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 8e55b44..cda17ac 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -3886,15 +3886,10 @@ static void perf_swevent_ctx_event(struct perf_event_context *ctx,
{
struct perf_event *event;

- if (system_state != SYSTEM_RUNNING || list_empty(&ctx->event_list))
- return;
-
- rcu_read_lock();
list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
if (perf_swevent_match(event, type, event_id, data, regs))
perf_swevent_add(event, nr, nmi, data, regs);
}
- rcu_read_unlock();
}

static int *perf_swevent_recursion_context(struct perf_cpu_context *cpuctx)
@@ -3926,9 +3921,9 @@ static void do_perf_sw_event(enum perf_type_id type, u32 event_id,
(*recursion)++;
barrier();

+ rcu_read_lock();
perf_swevent_ctx_event(&cpuctx->ctx, type, event_id,
nr, nmi, data, regs);
- rcu_read_lock();
/*
* doesn't really matter which of the child contexts the
* events ends up in.

2009-11-23 05:50:20

by Paul Mackerras

Subject: Re: [PATCH 03/15] perf: optimize perf_swevent_ctx_event()

Peter Zijlstra writes:

> We can do away with the system_state check if the machine still boots
> after this patch (seems to be the case).

I have a recollection (possibly faulty) that the problem we can get
into if we don't have this check is that if we take a bad page fault
in the kernel (e.g. NULL dereference) early in boot before the perf
cpu context has been initialized, we then get another NULL dereference
because the pointers in ctx->event_list are NULL, and recurse to
death.

So that check was possibly more about debugging than correctness.
Possibly also the x86 do_page_fault() is different enough from the
powerpc one that the problem can't occur on x86.

Paul.

2009-11-23 07:31:39

by Peter Zijlstra

Subject: Re: [PATCH 03/15] perf: optimize perf_swevent_ctx_event()

On Mon, 2009-11-23 at 16:50 +1100, Paul Mackerras wrote:
> Peter Zijlstra writes:
>
> > We can do away with the system_state check if the machine still boots
> > after this patch (seems to be the case).
>
> I have a recollection (possibly faulty) that the problem we can get
> into if we don't have this check is that if we take a bad page fault
> in the kernel (e.g. NULL dereference) early in boot before the perf
> cpu context has been initialized, we then get another NULL dereference
> because the pointers in ctx->event_list are NULL, and recurse to
> death.
>
> So that check was possibly more about debugging than correctness.
> Possibly also the x86 do_page_fault() is different enough from the
> powerpc one that the problem can't occur on x86.

Right, I remembered there was _something_ we added them for, but
couldn't for the life of me remember what.

Hmm, maybe we can initialize all the recursion variables to 1; that
should keep us from ever entering the swcounter code until we reset
them.

2009-11-23 08:38:31

by Peter Zijlstra

Subject: Re: [PATCH 03/15] perf: optimize perf_swevent_ctx_event()

On Mon, 2009-11-23 at 08:31 +0100, Peter Zijlstra wrote:
> On Mon, 2009-11-23 at 16:50 +1100, Paul Mackerras wrote:
> > Peter Zijlstra writes:
> >
> > > We can do away with the system_state check if the machine still boots
> > > after this patch (seems to be the case).
> >
> > I have a recollection (possibly faulty) that the problem we can get
> > into if we don't have this check is that if we take a bad page fault
> > in the kernel (e.g. NULL dereference) early in boot before the perf
> > cpu context has been initialized, we then get another NULL dereference
> > because the pointers in ctx->event_list are NULL, and recurse to
> > death.
> >
> > So that check was possibly more about debugging than correctness.
> > Possibly also the x86 do_page_fault() is different enough from the
> > powerpc one that the problem can't occur on x86.
>
> Right, I remembered there was _something_ we added them for, but
> couldn't for the life of me remember what.
>
> Hmm, maybe we can initialize all the recursion variables to 1; that
> should keep us from ever entering the swcounter code until we reset
> them.

I think the patch below fixed that.

---

commit f29ac756a40d0f1bb07d682ea521e7b666ff06d5
Author: Peter Zijlstra <[email protected]>
Date: Fri Jun 19 18:27:26 2009 +0200

perf_counter: Optimize perf_swcounter_event()

Similar to tracepoints, use an enable variable to reduce
overhead when unused.

Only look for a counter of a particular event type when we know
there is at least one in the system.

Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <new-submission>
Cc: Mike Galbraith <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
index 89698d8..e7213e4 100644
--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -669,7 +669,16 @@ static inline int is_software_counter(struct perf_counter *counter)
(counter->attr.type != PERF_TYPE_HW_CACHE);
}

-extern void perf_swcounter_event(u32, u64, int, struct pt_regs *, u64);
+extern atomic_t perf_swcounter_enabled[PERF_COUNT_SW_MAX];
+
+extern void __perf_swcounter_event(u32, u64, int, struct pt_regs *, u64);
+
+static inline void
+perf_swcounter_event(u32 event, u64 nr, int nmi, struct pt_regs *regs, u64 addr)
+{
+ if (atomic_read(&perf_swcounter_enabled[event]))
+ __perf_swcounter_event(event, nr, nmi, regs, addr);
+}

extern void __perf_counter_mmap(struct vm_area_struct *vma);

diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index 1a933a2..7515c76 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -3317,8 +3317,8 @@ out:
put_cpu_var(perf_cpu_context);
}

-void
-perf_swcounter_event(u32 event, u64 nr, int nmi, struct pt_regs *regs, u64 addr)
+void __perf_swcounter_event(u32 event, u64 nr, int nmi,
+ struct pt_regs *regs, u64 addr)
{
struct perf_sample_data data = {
.regs = regs,
@@ -3509,9 +3509,19 @@ static const struct pmu *tp_perf_counter_init(struct perf_counter *counter)
}
#endif

+atomic_t perf_swcounter_enabled[PERF_COUNT_SW_MAX];
+
+static void sw_perf_counter_destroy(struct perf_counter *counter)
+{
+ u64 event = counter->attr.config;
+
+ atomic_dec(&perf_swcounter_enabled[event]);
+}
+
static const struct pmu *sw_perf_counter_init(struct perf_counter *counter)
{
const struct pmu *pmu = NULL;
+ u64 event = counter->attr.config;

/*
* Software counters (currently) can't in general distinguish
@@ -3520,7 +3530,7 @@ static const struct pmu *sw_perf_counter_init(struct perf_counter *counter)
* to be kernel events, and page faults are never hypervisor
* events.
*/
- switch (counter->attr.config) {
+ switch (event) {
case PERF_COUNT_SW_CPU_CLOCK:
pmu = &perf_ops_cpu_clock;

@@ -3541,6 +3551,8 @@ static const struct pmu *sw_perf_counter_init(struct perf_counter *counter)
case PERF_COUNT_SW_PAGE_FAULTS_MAJ:
case PERF_COUNT_SW_CONTEXT_SWITCHES:
case PERF_COUNT_SW_CPU_MIGRATIONS:
+ atomic_inc(&perf_swcounter_enabled[event]);
+ counter->destroy = sw_perf_counter_destroy;
pmu = &perf_ops_generic;
break;
}