2013-08-02 16:30:05

by Frederic Weisbecker

Subject: [PATCH 0/3] perf: Fixes on event accounting

Peter, Jiri,

Here is a proposed set of fixes after the discussion we had about the
last changes in the event accounting code.

Thanks,
Frederic
---

Frederic Weisbecker (3):
perf: Rollback callchain buffer refcount under the callchain mutex
perf: Account freq events globally
nohz: Include local CPU in full dynticks global kick


kernel/events/callchain.c | 3 ++-
kernel/events/core.c | 19 ++++++++-----------
kernel/time/tick-sched.c | 1 +
3 files changed, 11 insertions(+), 12 deletions(-)


2013-08-02 16:30:09

by Frederic Weisbecker

Subject: [PATCH 2/3] perf: Account freq events globally

Freq events may not always be affine to a particular CPU. As such,
account_event_cpu() may crash if we do per-CPU accounting of a freq
event that has event->cpu == -1.

To solve this, let's account freq events globally. In practice this
doesn't change the picture much because perf tools create per-task
perf events with one event per CPU by default. Profiling a single CPU
is usually a corner case, so there is not much point in optimizing
for it.

Reported-by: Jiri Olsa <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Stephane Eranian <[email protected]>
---
kernel/events/core.c | 19 ++++++++-----------
1 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 916cf1f..617f980 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -141,11 +141,11 @@ enum event_type_t {
struct static_key_deferred perf_sched_events __read_mostly;
static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
static DEFINE_PER_CPU(atomic_t, perf_branch_stack_events);
-static DEFINE_PER_CPU(atomic_t, perf_freq_events);

static atomic_t nr_mmap_events __read_mostly;
static atomic_t nr_comm_events __read_mostly;
static atomic_t nr_task_events __read_mostly;
+static atomic_t nr_freq_events __read_mostly;

static LIST_HEAD(pmus);
static DEFINE_MUTEX(pmus_lock);
@@ -1871,9 +1871,6 @@ static int __perf_install_in_context(void *info)
perf_pmu_enable(cpuctx->ctx.pmu);
perf_ctx_unlock(cpuctx, task_ctx);

- if (atomic_read(&__get_cpu_var(perf_freq_events)))
- tick_nohz_full_kick();
-
return 0;
}

@@ -2811,7 +2808,7 @@ done:
#ifdef CONFIG_NO_HZ_FULL
bool perf_event_can_stop_tick(void)
{
- if (atomic_read(&__get_cpu_var(perf_freq_events)) ||
+ if (atomic_read(&nr_freq_events) ||
__this_cpu_read(perf_throttled_count))
return false;
else
@@ -3140,9 +3137,6 @@ static void unaccount_event_cpu(struct perf_event *event, int cpu)
}
if (is_cgroup_event(event))
atomic_dec(&per_cpu(perf_cgroup_events, cpu));
-
- if (event->attr.freq)
- atomic_dec(&per_cpu(perf_freq_events, cpu));
}

static void unaccount_event(struct perf_event *event)
@@ -3158,6 +3152,8 @@ static void unaccount_event(struct perf_event *event)
atomic_dec(&nr_comm_events);
if (event->attr.task)
atomic_dec(&nr_task_events);
+ if (event->attr.freq)
+ atomic_dec(&nr_freq_events);
if (is_cgroup_event(event))
static_key_slow_dec_deferred(&perf_sched_events);
if (has_branch_stack(event))
@@ -6479,9 +6475,6 @@ static void account_event_cpu(struct perf_event *event, int cpu)
}
if (is_cgroup_event(event))
atomic_inc(&per_cpu(perf_cgroup_events, cpu));
-
- if (event->attr.freq)
- atomic_inc(&per_cpu(perf_freq_events, cpu));
}

static void account_event(struct perf_event *event)
@@ -6497,6 +6490,10 @@ static void account_event(struct perf_event *event)
atomic_inc(&nr_comm_events);
if (event->attr.task)
atomic_inc(&nr_task_events);
+ if (event->attr.freq) {
+ if (atomic_inc_return(&nr_freq_events) == 1)
+ tick_nohz_full_kick_all();
+ }
if (has_branch_stack(event))
static_key_slow_inc(&perf_sched_events.key);
if (is_cgroup_event(event))
--
1.7.5.4
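
For illustration, here is a minimal userspace sketch of the problem and
of the accounting scheme the patch switches to. The array, counter and
function names below are made up for the example and are not the
kernel's symbols; the printf calls stand in for the nohz kick.

#include <stdatomic.h>
#include <stdio.h>

#define NR_CPUS 4

/* Old scheme: one counter per CPU -- cannot represent cpu == -1. */
static atomic_int percpu_freq_events[NR_CPUS];

/* New scheme: one global counter, no per-CPU indexing needed. */
static atomic_int nr_freq_events;

static void account_freq_event_percpu(int cpu)
{
	/* With cpu == -1 this would index before the array: undefined behaviour. */
	atomic_fetch_add(&percpu_freq_events[cpu], 1);
}

static void account_freq_event_global(void)
{
	/* First freq event in the system: poke every CPU to re-check its tick. */
	if (atomic_fetch_add(&nr_freq_events, 1) == 0)
		printf("kick all CPUs to re-evaluate the tick\n");
}

int main(void)
{
	int cpu = -1;			/* a per-task freq event has no CPU affinity */

	account_freq_event_global();	/* safe regardless of event->cpu */

	if (cpu >= 0)
		account_freq_event_percpu(cpu);
	else
		printf("per-CPU accounting would index percpu_freq_events[-1]\n");

	return 0;
}

The global counter trades a little precision (every CPU keeps its tick
while any freq event exists anywhere) for correctness on non-affine
events, which matches the reasoning in the changelog above.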

2013-08-02 16:30:20

by Frederic Weisbecker

Subject: [PATCH 3/3] nohz: Include local CPU in full dynticks global kick

tick_nohz_full_kick_all() is useful to notify all full dynticks
CPUs that there is a system state change to check out before
re-evaluating the need for the tick.

Unfortunately this is implemented using smp_call_function_many(),
which ignores the local CPU. That CPU also needs to re-evaluate
its tick.

on_each_cpu_mask() is not useful either because we don't want to
re-evaluate the tick state in place but asynchronously, from an IPI,
to avoid messing with any random locking scenario.

So let's call tick_nohz_full_kick() from tick_nohz_full_kick_all()
so that the usual irq work takes care of it.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Kevin Hilman <[email protected]>
---
kernel/time/tick-sched.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index e80183f..30849d4 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -244,6 +244,7 @@ void tick_nohz_full_kick_all(void)
preempt_disable();
smp_call_function_many(nohz_full_mask,
nohz_full_kick_ipi, NULL, false);
+ tick_nohz_full_kick();
preempt_enable();
}

--
1.7.5.4
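
The behavioural difference can be shown with a toy userspace model.
Nothing here is kernel API; the names and the fixed mask are purely
illustrative. The broadcast helper deliberately skips the calling CPU,
so the caller has to kick itself afterwards, which is exactly the line
the patch adds.

#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4

static int this_cpu = 2;	/* pretend the caller runs on CPU 2 */

static void kick_cpu(int cpu)
{
	printf("CPU %d: re-evaluate the tick\n", cpu);
}

/* Stand-in for smp_call_function_many(): it only reaches remote CPUs. */
static void kick_remote_cpus(const bool *mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask[cpu] && cpu != this_cpu)
			kick_cpu(cpu);
}

static void kick_all_cpus(void)
{
	bool full_dynticks_mask[NR_CPUS] = { true, true, true, true };

	kick_remote_cpus(full_dynticks_mask);
	kick_cpu(this_cpu);	/* the added local kick; without it CPU 2 is never notified */
}

int main(void)
{
	kick_all_cpus();
	return 0;
}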

2013-08-02 16:30:07

by Frederic Weisbecker

Subject: [PATCH 1/3] perf: Rollback callchain buffer refcount under the callchain mutex

When we fail to allocate the callchain buffers, we roll back the
refcount we took and return from get_callchain_buffers().

However, we take the refcount and allocate under the callchain lock,
but the rollback is done outside the lock.

As a result, while we roll back, a concurrent callchain user may call
get_callchain_buffers(), see the non-zero refcount while the buffers
are still NULL, and give up without retrying the allocation itself.

The consequences aren't that bad, but the behaviour looks weird enough
that it's better to give subsequent callchain users a chance to retry
where we failed.

Reported-by: Jiri Olsa <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Stephane Eranian <[email protected]>
---
kernel/events/callchain.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index 76a8bc5..97b67df 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -116,10 +116,11 @@ int get_callchain_buffers(void)

err = alloc_callchain_buffers();
exit:
- mutex_unlock(&callchain_mutex);
if (err)
atomic_dec(&nr_callchain_events);

+ mutex_unlock(&callchain_mutex);
+
return err;
}

--
1.7.5.4
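
A userspace re-sketch of the fixed flow, with a pthread mutex and C11
atomics standing in for the kernel primitives (the structure mirrors
get_callchain_buffers() but this is illustrative code, not the real
implementation):

#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

static pthread_mutex_t callchain_mutex = PTHREAD_MUTEX_INITIALIZER;
static atomic_int nr_callchain_events;
static void *callchain_buffers;

static int alloc_callchain_buffers(void)
{
	callchain_buffers = malloc(4096);
	return callchain_buffers ? 0 : -ENOMEM;
}

static int get_callchain_buffers(void)
{
	int err = 0;
	int count;

	pthread_mutex_lock(&callchain_mutex);

	count = atomic_fetch_add(&nr_callchain_events, 1) + 1;
	if (count > 1) {
		/* Someone else already allocated; fail if they failed. */
		if (!callchain_buffers)
			err = -ENOMEM;
		goto exit;
	}

	err = alloc_callchain_buffers();
exit:
	if (err)
		atomic_fetch_sub(&nr_callchain_events, 1);

	/* Rolling back before the unlock means the next caller serialized
	 * on the mutex sees either a populated buffer or a zero refcount,
	 * never the failed in-between state, so it retries the allocation. */
	pthread_mutex_unlock(&callchain_mutex);

	return err;
}

int main(void)
{
	return get_callchain_buffers() ? 1 : 0;
}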

2013-08-09 10:28:33

by Jiri Olsa

Subject: Re: [PATCH 1/3] perf: Rollback callchain buffer refcount under the callchain mutex

On Fri, Aug 02, 2013 at 06:29:54PM +0200, Frederic Weisbecker wrote:
> When we fail to allocate the callchain buffers, we roll back the
> refcount we took and return from get_callchain_buffers().
>
> However, we take the refcount and allocate under the callchain lock,
> but the rollback is done outside the lock.
>
> As a result, while we roll back, a concurrent callchain user may call
> get_callchain_buffers(), see the non-zero refcount while the buffers
> are still NULL, and give up without retrying the allocation itself.
>
> The consequences aren't that bad, but the behaviour looks weird enough
> that it's better to give subsequent callchain users a chance to retry
> where we failed.
>
> Reported-by: Jiri Olsa <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Jiri Olsa <[email protected]>

Acked-by: Jiri Olsa <[email protected]>

2013-08-09 10:33:51

by Jiri Olsa

Subject: Re: [PATCH 2/3] perf: Account freq events globally

On Fri, Aug 02, 2013 at 06:29:55PM +0200, Frederic Weisbecker wrote:
> Freq events may not always be affine to a particular CPU. As such,
> account_event_cpu() may crash if we do per-CPU accounting of a freq
> event that has event->cpu == -1.
>
> To solve this, let's account freq events globally. In practice this
> doesn't change the picture much because perf tools create per-task
> perf events with one event per CPU by default. Profiling a single CPU
> is usually a corner case, so there is not much point in optimizing
> for it.
>
> Reported-by: Jiri Olsa <[email protected]>
> Suggested-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Stephane Eranian <[email protected]>

no more OOPSes ;-)

Tested-by: Jiri Olsa <[email protected]>

Subject: [tip:perf/core] perf: Roll back callchain buffer refcount under the callchain mutex

Commit-ID: fc3b86d673e41ac66b4ba5b75a90c2fcafb90089
Gitweb: http://git.kernel.org/tip/fc3b86d673e41ac66b4ba5b75a90c2fcafb90089
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Fri, 2 Aug 2013 18:29:54 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 16 Aug 2013 17:55:50 +0200

perf: Roll back callchain buffer refcount under the callchain mutex

When we fail to allocate the callchain buffers, we roll back the
refcount we took and return from get_callchain_buffers().

However, we take the refcount and allocate under the callchain lock,
but the rollback is done outside the lock.

As a result, while we roll back, a concurrent callchain user may call
get_callchain_buffers(), see the non-zero refcount while the buffers
are still NULL, and give up without retrying the allocation itself.

The consequences aren't that bad, but the behaviour looks weird enough
that it's better to give subsequent callchain users a chance to retry
where we failed.

Reported-by: Jiri Olsa <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/events/callchain.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index 76a8bc5..97b67df 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -116,10 +116,11 @@ int get_callchain_buffers(void)

err = alloc_callchain_buffers();
exit:
- mutex_unlock(&callchain_mutex);
if (err)
atomic_dec(&nr_callchain_events);

+ mutex_unlock(&callchain_mutex);
+
return err;
}

Subject: [tip:perf/core] perf: Account freq events globally

Commit-ID: 948b26b6ddd08a57cb95ebb0dc96fde2edd5c383
Gitweb: http://git.kernel.org/tip/948b26b6ddd08a57cb95ebb0dc96fde2edd5c383
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Fri, 2 Aug 2013 18:29:55 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 16 Aug 2013 17:55:51 +0200

perf: Account freq events globally

Freq events may not always be affine to a particular CPU. As such,
account_event_cpu() may crash if we do per-CPU accounting of a freq
event that has event->cpu == -1.

To solve this, let's account freq events globally. In practice this
doesn't change the picture much because perf tools create per-task
perf events with one event per CPU by default. Profiling a single CPU
is usually a corner case, so there is not much point in optimizing
for it.

Reported-by: Jiri Olsa <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Tested-by: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/events/core.c | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index e82e700..2e675e8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -141,11 +141,11 @@ enum event_type_t {
struct static_key_deferred perf_sched_events __read_mostly;
static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
static DEFINE_PER_CPU(atomic_t, perf_branch_stack_events);
-static DEFINE_PER_CPU(atomic_t, perf_freq_events);

static atomic_t nr_mmap_events __read_mostly;
static atomic_t nr_comm_events __read_mostly;
static atomic_t nr_task_events __read_mostly;
+static atomic_t nr_freq_events __read_mostly;

static LIST_HEAD(pmus);
static DEFINE_MUTEX(pmus_lock);
@@ -1871,9 +1871,6 @@ static int __perf_install_in_context(void *info)
perf_pmu_enable(cpuctx->ctx.pmu);
perf_ctx_unlock(cpuctx, task_ctx);

- if (atomic_read(&__get_cpu_var(perf_freq_events)))
- tick_nohz_full_kick();
-
return 0;
}

@@ -2811,7 +2808,7 @@ done:
#ifdef CONFIG_NO_HZ_FULL
bool perf_event_can_stop_tick(void)
{
- if (atomic_read(&__get_cpu_var(perf_freq_events)) ||
+ if (atomic_read(&nr_freq_events) ||
__this_cpu_read(perf_throttled_count))
return false;
else
@@ -3140,9 +3137,6 @@ static void unaccount_event_cpu(struct perf_event *event, int cpu)
}
if (is_cgroup_event(event))
atomic_dec(&per_cpu(perf_cgroup_events, cpu));
-
- if (event->attr.freq)
- atomic_dec(&per_cpu(perf_freq_events, cpu));
}

static void unaccount_event(struct perf_event *event)
@@ -3158,6 +3152,8 @@ static void unaccount_event(struct perf_event *event)
atomic_dec(&nr_comm_events);
if (event->attr.task)
atomic_dec(&nr_task_events);
+ if (event->attr.freq)
+ atomic_dec(&nr_freq_events);
if (is_cgroup_event(event))
static_key_slow_dec_deferred(&perf_sched_events);
if (has_branch_stack(event))
@@ -6489,9 +6485,6 @@ static void account_event_cpu(struct perf_event *event, int cpu)
}
if (is_cgroup_event(event))
atomic_inc(&per_cpu(perf_cgroup_events, cpu));
-
- if (event->attr.freq)
- atomic_inc(&per_cpu(perf_freq_events, cpu));
}

static void account_event(struct perf_event *event)
@@ -6507,6 +6500,10 @@ static void account_event(struct perf_event *event)
atomic_inc(&nr_comm_events);
if (event->attr.task)
atomic_inc(&nr_task_events);
+ if (event->attr.freq) {
+ if (atomic_inc_return(&nr_freq_events) == 1)
+ tick_nohz_full_kick_all();
+ }
if (has_branch_stack(event))
static_key_slow_inc(&perf_sched_events.key);
if (is_cgroup_event(event))

Subject: [tip:timers/nohz] nohz: Include local CPU in full dynticks global kick

Commit-ID: c2e7fcf53c3cb02b4ada1c66a9bc8a4d97d58aba
Gitweb: http://git.kernel.org/tip/c2e7fcf53c3cb02b4ada1c66a9bc8a4d97d58aba
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Fri, 2 Aug 2013 18:29:56 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 16 Aug 2013 17:55:33 +0200

nohz: Include local CPU in full dynticks global kick

tick_nohz_full_kick_all() is useful to notify all full dynticks
CPUs that there is a system state change to check out before
re-evaluating the need for the tick.

Unfortunately this is implemented using smp_call_function_many(),
which ignores the local CPU. That CPU also needs to re-evaluate
its tick.

on_each_cpu_mask() is not useful either because we don't want to
re-evaluate the tick state in place but asynchronously, from an IPI,
to avoid messing with any random locking scenario.

So let's call tick_nohz_full_kick() from tick_nohz_full_kick_all()
so that the usual irq work takes care of it.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Kevin Hilman <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/time/tick-sched.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index adea6fc3..3612fc7 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -246,6 +246,7 @@ void tick_nohz_full_kick_all(void)
preempt_disable();
smp_call_function_many(tick_nohz_full_mask,
nohz_full_kick_ipi, NULL, false);
+ tick_nohz_full_kick();
preempt_enable();
}