2011-06-23 20:34:55

by Eric B Munson

[permalink] [raw]
Subject: [PATCH 1/3] events: Add note to update_event_times comment about holding ctx->lock

Signed-off-by: Eric B Munson <[email protected]>
---
kernel/events/core.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9efe710..a054964 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -731,6 +731,7 @@ static u64 perf_event_time(struct perf_event *event)

/*
* Update the total_time_enabled and total_time_running fields for a event.
+ * The caller of this function needs to hold the ctx->lock.
*/
static void update_event_times(struct perf_event *event)
{
--
1.7.4.1


2011-06-23 20:34:54

by Eric B Munson

[permalink] [raw]
Subject: [PATCH 2/3] events: Move lockless timer calculation into helper function

Take the timer calculation from perf_output_read and move it to a helper
function for any place that needs timer values but cannot take the ctx->lock.

Signed-off-by: Eric B Munson <[email protected]>
---
kernel/events/core.c | 22 +++++++++++++++-------
1 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index a054964..9e9a7fa 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3359,6 +3359,18 @@ static int perf_event_index(struct perf_event *event)
return event->hw.idx + 1 - PERF_EVENT_INDEX_OFFSET;
}

+static void calc_timer_values(struct perf_event *event,
+ u64 *running,
+ u64 *enabled)
+{
+ u64 now, ctx_time;
+
+ now = perf_clock();
+ ctx_time = event->shadow_ctx_time + now;
+ *enabled = ctx_time - event->tstamp_enabled;
+ *running = ctx_time - event->tstamp_running;
+}
+
/*
* Callers need to ensure there can be no nesting of this function, otherwise
* the seqlock logic goes bad. We can not serialize this because the arch
@@ -4250,7 +4262,7 @@ static void perf_output_read_group(struct perf_output_handle *handle,
static void perf_output_read(struct perf_output_handle *handle,
struct perf_event *event)
{
- u64 enabled = 0, running = 0, now, ctx_time;
+ u64 enabled = 0, running = 0;
u64 read_format = event->attr.read_format;

/*
@@ -4262,12 +4274,8 @@ static void perf_output_read(struct perf_output_handle *handle,
* because of locking issue as we are called in
* NMI context
*/
- if (read_format & PERF_FORMAT_TOTAL_TIMES) {
- now = perf_clock();
- ctx_time = event->shadow_ctx_time + now;
- enabled = ctx_time - event->tstamp_enabled;
- running = ctx_time - event->tstamp_running;
- }
+ if (read_format & PERF_FORMAT_TOTAL_TIMES)
+ calc_timer_values(event, &enabled, &running);

if (event->attr.read_format & PERF_FORMAT_GROUP)
perf_output_read_group(handle, event, enabled, running);
--
1.7.4.1

2011-06-23 20:35:38

by Eric B Munson

[permalink] [raw]
Subject: [PATCH 3/3] events: Ensure that timers are updated without requiring read() call

The event tracing infrastructure exposes two timers which should be updated
each time the value of the counter is updated. Currently, these counters are
only updated when userspace calls read() on the fd associated with an event.
This means that counters which are read via the mmap'd page exclusively never
have their timers updated. This patch adds ensures that the timers are updated
each time the values in the mmap'd page are updated.

Signed-off-by: Eric B Munson <[email protected]>
---
kernel/events/core.c | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9e9a7fa..e3be175 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3382,6 +3382,18 @@ void perf_event_update_userpage(struct perf_event *event)
struct perf_buffer *buffer;

rcu_read_lock();
+ /*
+ * compute total_time_enabled, total_time_running
+ * based on snapshot values taken when the event
+ * was last scheduled in.
+ *
+ * we cannot simply called update_context_time()
+ * because of locking issue as we are called in
+ * NMI context
+ */
+ calc_timer_values(event,
+ &event->total_time_enabled,
+ &event->total_time_running);
buffer = rcu_dereference(event->buffer);
if (!buffer)
goto unlock;
--
1.7.4.1

2011-06-24 09:45:12

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/3] events: Ensure that timers are updated without requiring read() call

On Thu, 2011-06-23 at 16:34 -0400, Eric B Munson wrote:
> The event tracing infrastructure exposes two timers which should be updated
> each time the value of the counter is updated. Currently, these counters are
> only updated when userspace calls read() on the fd associated with an event.
> This means that counters which are read via the mmap'd page exclusively never
> have their timers updated. This patch adds ensures that the timers are updated
> each time the values in the mmap'd page are updated.
>
> Signed-off-by: Eric B Munson <[email protected]>
> ---
> kernel/events/core.c | 12 ++++++++++++
> 1 files changed, 12 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 9e9a7fa..e3be175 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3382,6 +3382,18 @@ void perf_event_update_userpage(struct perf_event *event)
> struct perf_buffer *buffer;
>
> rcu_read_lock();
> + /*
> + * compute total_time_enabled, total_time_running
> + * based on snapshot values taken when the event
> + * was last scheduled in.
> + *
> + * we cannot simply called update_context_time()
> + * because of locking issue as we are called in

s/are/can be/

> + * NMI context
> + */
> + calc_timer_values(event,
> + &event->total_time_enabled,
> + &event->total_time_running);

I'm not sure writing those from NMI context is a sane thing to do, best
is to compute the values into a local variable and use that variable
below.

Took the first two patches.

> buffer = rcu_dereference(event->buffer);
> if (!buffer)
> goto unlock;

2011-06-24 09:47:14

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/3] events: Ensure that timers are updated without requiring read() call

On Fri, 2011-06-24 at 11:44 +0200, Peter Zijlstra wrote:
> > + calc_timer_values(event,
> > + &event->total_time_enabled,
> > + &event->total_time_running);
>
> I'm not sure writing those from NMI context is a sane thing to do, best
> is to compute the values into a local variable and use that variable
> below.

To clarify, on 32bit architectures the NMI might come in the middle of
writing the two words of one of those, writing them again from the NMI
handler will result in overlapping writes, which might lead to some
weird end-results.

2011-06-24 12:44:25

by Eric B Munson

[permalink] [raw]
Subject: Re: [PATCH 3/3] events: Ensure that timers are updated without requiring read() call

On Fri, 24 Jun 2011, Peter Zijlstra wrote:

> On Thu, 2011-06-23 at 16:34 -0400, Eric B Munson wrote:
> > The event tracing infrastructure exposes two timers which should be updated
> > each time the value of the counter is updated. Currently, these counters are
> > only updated when userspace calls read() on the fd associated with an event.
> > This means that counters which are read via the mmap'd page exclusively never
> > have their timers updated. This patch adds ensures that the timers are updated
> > each time the values in the mmap'd page are updated.
> >
> > Signed-off-by: Eric B Munson <[email protected]>
> > ---
> > kernel/events/core.c | 12 ++++++++++++
> > 1 files changed, 12 insertions(+), 0 deletions(-)
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 9e9a7fa..e3be175 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -3382,6 +3382,18 @@ void perf_event_update_userpage(struct perf_event *event)
> > struct perf_buffer *buffer;
> >
> > rcu_read_lock();
> > + /*
> > + * compute total_time_enabled, total_time_running
> > + * based on snapshot values taken when the event
> > + * was last scheduled in.
> > + *
> > + * we cannot simply called update_context_time()
> > + * because of locking issue as we are called in
>
> s/are/can be/
>
> > + * NMI context
> > + */
> > + calc_timer_values(event,
> > + &event->total_time_enabled,
> > + &event->total_time_running);
>
> I'm not sure writing those from NMI context is a sane thing to do, best
> is to compute the values into a local variable and use that variable
> below.
>
> Took the first two patches.
>

Thanks, I will send out an updated patch once I get it tested.

Eric


Attachments:
(No filename) (1.70 kB)
signature.asc (490.00 B)
Digital signature
Download all attachments

2011-06-24 12:49:26

by Eric B Munson

[permalink] [raw]
Subject: Re: [PATCH 3/3] events: Ensure that timers are updated without requiring read() call

On Fri, 24 Jun 2011, Peter Zijlstra wrote:

> On Thu, 2011-06-23 at 16:34 -0400, Eric B Munson wrote:
> > The event tracing infrastructure exposes two timers which should be updated
> > each time the value of the counter is updated. Currently, these counters are
> > only updated when userspace calls read() on the fd associated with an event.
> > This means that counters which are read via the mmap'd page exclusively never
> > have their timers updated. This patch adds ensures that the timers are updated
> > each time the values in the mmap'd page are updated.
> >
> > Signed-off-by: Eric B Munson <[email protected]>
> > ---
> > kernel/events/core.c | 12 ++++++++++++
> > 1 files changed, 12 insertions(+), 0 deletions(-)
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 9e9a7fa..e3be175 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -3382,6 +3382,18 @@ void perf_event_update_userpage(struct perf_event *event)
> > struct perf_buffer *buffer;
> >
> > rcu_read_lock();
> > + /*
> > + * compute total_time_enabled, total_time_running
> > + * based on snapshot values taken when the event
> > + * was last scheduled in.
> > + *
> > + * we cannot simply called update_context_time()
> > + * because of locking issue as we are called in
>
> s/are/can be/
>
> > + * NMI context
> > + */
> > + calc_timer_values(event,
> > + &event->total_time_enabled,
> > + &event->total_time_running);
>
> I'm not sure writing those from NMI context is a sane thing to do, best
> is to compute the values into a local variable and use that variable
> below.
>
> Took the first two patches.
>

Now that I think about it, this will just mask the problem. I have a test
program uses the mmap'd user space page to access event counters (it never
calls read()). In this case, the timer values in the event will never be
updated. It will display "properly" but the structure won't ever be correct.
Given that, how can we keep the event values current?

Eric


Attachments:
(No filename) (1.98 kB)
signature.asc (490.00 B)
Digital signature
Download all attachments

2011-06-27 09:37:39

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/3] events: Ensure that timers are updated without requiring read() call

On Fri, 2011-06-24 at 08:49 -0400, Eric B Munson wrote:

> Now that I think about it, this will just mask the problem. I have a test
> program uses the mmap'd user space page to access event counters (it never
> calls read()). In this case, the timer values in the event will never be
> updated. It will display "properly" but the structure won't ever be correct.
> Given that, how can we keep the event values current?

Well the idea is that you do a userspace read of the counter, on x86
that would be using rdpmc and use the provided event count as base. See
the comment in struct perf_event_mmap_page.

Currently we don't have rdpmc support for x86, but it shouldn't be hard.
We should poke at CR4 in our CPU_STARTING callback, and fix up the mess
called perf_event_index() to deal with the strange and wonderful
encoding rdpmc needs (might want an event->pmu->index callback or so).

Currently all this only works on PowerPC.