2015-02-05 23:55:44

by Shaohua Li

Subject: [PATCH 1/2 --resend] perf: update shadow timestamp before add event

The last post appears to have been lost, so I am reposting to check whether
there are any comments.

Update the shadow timestamp before starting the event, because .add might
use the timestamp.

Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
---
kernel/events/core.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 19efcf1..04d8b48 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1769,6 +1769,10 @@ event_sched_in(struct perf_event *event,

 	perf_pmu_disable(event->pmu);

+	event->tstamp_running += tstamp - event->tstamp_stopped;
+
+	perf_set_shadow_time(event, ctx, tstamp);
+
 	if (event->pmu->add(event, PERF_EF_START)) {
 		event->state = PERF_EVENT_STATE_INACTIVE;
 		event->oncpu = -1;
@@ -1776,10 +1780,6 @@ event_sched_in(struct perf_event *event,
 		goto out;
 	}

-	event->tstamp_running += tstamp - event->tstamp_stopped;
-
-	perf_set_shadow_time(event, ctx, tstamp);
-
 	if (!is_software_event(event))
 		cpuctx->active_oncpu++;
 	ctx->nr_active++;
--
1.8.1
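
To illustrate why the ordering in the patch above matters, here is a minimal user-space sketch. It is not kernel code: struct toy_event, toy_add() and the sched_in_* helpers are hypothetical stand-ins, and the shadow time is simplified to the raw timestamp. It only shows that an .add callback that samples the timestamp sees a stale value when the update happens after the call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct perf_event. */
struct toy_event {
	uint64_t tstamp_running;
	uint64_t tstamp_stopped;
	uint64_t shadow_time;	/* simplified: just the last timestamp set */
	uint64_t seen_by_add;	/* timestamp observed inside the add callback */
};

/* Stand-in for pmu->add(): it samples the shadow time, as .add might. */
static int toy_add(struct toy_event *ev)
{
	ev->seen_by_add = ev->shadow_time;
	return 0;
}

/* Mirrors the two statements the patch moves. */
static void toy_update_times(struct toy_event *ev, uint64_t tstamp)
{
	assert(tstamp >= ev->tstamp_stopped);
	ev->tstamp_running += tstamp - ev->tstamp_stopped;
	ev->shadow_time = tstamp;
}

/* Old order: call add first, update the timestamps afterwards. */
static uint64_t sched_in_old(struct toy_event *ev, uint64_t tstamp)
{
	toy_add(ev);
	toy_update_times(ev, tstamp);
	return ev->seen_by_add;
}

/* New order: update the timestamps first, then call add. */
static uint64_t sched_in_new(struct toy_event *ev, uint64_t tstamp)
{
	toy_update_times(ev, tstamp);
	toy_add(ev);
	return ev->seen_by_add;
}
```

With both orders the event ends up with the same tstamp_running; only the value visible inside the add callback differs.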


2015-02-05 23:55:39

by Shaohua Li

Subject: [PATCH 2/2 --resend] perf: update userspace page info for software event

For hardware events, the userspace page of the event gets updated on
context switch, so if we read the time in the page, we get up-to-date info.
For software events, this update is currently missing. This patch makes the
behavior consistent.

With this patch, we can implement clock_gettime(THREAD_CPUTIME) with
PERF_COUNT_SW_DUMMY in userspace as suggested by Andy and Peter. The code
looks like this:

	if (pc->cap_user_time) {
		do {
			seq = pc->lock;
			barrier();

			running = pc->time_running;
			cyc = rdtsc();
			time_mult = pc->time_mult;
			time_shift = pc->time_shift;
			time_offset = pc->time_offset;

			barrier();
		} while (pc->lock != seq);

		quot = (cyc >> time_shift);
		rem = cyc & ((1 << time_shift) - 1);
		delta = time_offset + quot * time_mult +
			((rem * time_mult) >> time_shift);

		running += delta;
		return running;
	}

I tried it on a busy system; updating the userspace page has no noticeable
overhead.

Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
---
kernel/events/core.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 04d8b48..98105cf 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5950,6 +5950,7 @@ static int perf_swevent_add(struct perf_event *event, int flags)
 	}

 	hlist_add_head_rcu(&event->hlist_entry, head);
+	perf_event_update_userpage(event);

 	return 0;
 }
@@ -6419,6 +6420,7 @@ static int cpu_clock_event_add(struct perf_event *event, int flags)
 {
 	if (flags & PERF_EF_START)
 		cpu_clock_event_start(event, flags);
+	perf_event_update_userpage(event);

 	return 0;
 }
@@ -6493,6 +6495,7 @@ static int task_clock_event_add(struct perf_event *event, int flags)
 {
 	if (flags & PERF_EF_START)
 		task_clock_event_start(event, flags);
+	perf_event_update_userpage(event);

 	return 0;
 }
--
1.8.1
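
For readers who want to experiment with the read loop from the changelog above, here is a self-contained sketch. It makes several assumptions: struct toy_page is a hypothetical subset of the mmap'ed page with only the fields the changelog names, the cycle count is passed in as a parameter rather than read with rdtsc() inside the loop, and barrier() is defined as a plain compiler barrier. It is not the kernel's or perf tool's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the mmap'ed perf page; names follow the changelog. */
struct toy_page {
	volatile uint32_t lock;	/* sequence counter bumped around updates */
	uint64_t time_running;
	uint32_t time_mult;
	uint16_t time_shift;
	uint64_t time_offset;
};

/* Compiler barrier only (GCC/Clang syntax); an assumption for this sketch. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Returns running time extended by the time derived from the cycle count. */
static uint64_t toy_thread_time(const struct toy_page *pc, uint64_t cyc)
{
	uint64_t running, time_offset, quot, rem, delta;
	uint32_t seq, time_mult;
	uint16_t time_shift;

	/* Retry until the sequence counter is unchanged across the reads. */
	do {
		seq = pc->lock;
		barrier();

		running = pc->time_running;
		time_mult = pc->time_mult;
		time_shift = pc->time_shift;
		time_offset = pc->time_offset;

		barrier();
	} while (pc->lock != seq);

	assert(time_shift < 64);

	/* Split the multiply so the intermediate products stay in 64 bits. */
	quot = cyc >> time_shift;
	rem = cyc & (((uint64_t)1 << time_shift) - 1);
	delta = time_offset + quot * time_mult +
		((rem * time_mult) >> time_shift);

	return running + delta;
}
```

The mask here uses a 64-bit constant so that shift values of 31 or more do not overflow an int; the changelog snippet uses a plain 1, which is fine for small shifts.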

2015-02-11 11:15:42

by Peter Zijlstra

Subject: Re: [PATCH 2/2 --resend] perf: update userspace page info for software event

On Thu, Feb 05, 2015 at 03:55:32PM -0800, Shaohua Li wrote:
> For hardware event, the userspace page of the event gets updated in
> context switch, so if we read time in the page, we get updated info. For
> software event, this is missed currently. This patch makes the behavior
> consistency.
>
> With this patch, we can implement clock_gettime(THREAD_CPUTIME) with
> PERF_COUNT_SW_DUMMY in userspace as suggested by Andy and Peter. Code
> likes this:
>
> 	if (pc->cap_user_time) {
> 		do {
> 			seq = pc->lock;
> 			barrier();
>
> 			running = pc->time_running;
> 			cyc = rdtsc();
> 			time_mult = pc->time_mult;
> 			time_shift = pc->time_shift;
> 			time_offset = pc->time_offset;
>
> 			barrier();
> 		} while (pc->lock != seq);
>
> 		quot = (cyc >> time_shift);
> 		rem = cyc & ((1 << time_shift) - 1);
> 		delta = time_offset + quot * time_mult +
> 			((rem * time_mult) >> time_shift);

You could maybe use:

static inline u64 mul_u64_u32_shr(u64 a, u32 mul, unsigned int shift)
{
	return (u64)(((unsigned __int128)a * mul) >> shift);
}

And save yourself a mult instruction if you have suitable (64bit)
hardware and a recent GCC.

> 		running += delta;
> 		return running;
> 	}
>

Thanks for poking me. Applied.
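
As a rough check of the suggestion above, the sketch below puts the 128-bit helper next to the quot/rem form from the changelog. It uses user-space types instead of u64/u32, split_mul_shr() is a hypothetical name for the changelog arithmetic, and unsigned __int128 is assumed to be available (GCC or Clang on a 64-bit target). The two forms agree as long as the split form's intermediate products fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* The helper from the mail, spelled with <stdint.h> types. */
static inline uint64_t mul_u64_u32_shr(uint64_t a, uint32_t mul,
				       unsigned int shift)
{
	return (uint64_t)(((unsigned __int128)a * mul) >> shift);
}

/* Hypothetical name for the quot/rem arithmetic in the changelog. */
static inline uint64_t split_mul_shr(uint64_t cyc, uint32_t mul,
				     unsigned int shift)
{
	assert(shift < 64);

	uint64_t quot = cyc >> shift;
	uint64_t rem = cyc & (((uint64_t)1 << shift) - 1);

	/* rem < 2^shift, so rem * mul stays in 64 bits for shift <= 32. */
	return quot * mul + ((rem * mul) >> shift);
}
```

Both compute floor(cyc * mul / 2^shift); the 128-bit version does it with a single widening multiply instead of two multiplies plus the mask and add.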

Subject: [tip:perf/core] perf: Update shadow timestamp before add event

Commit-ID: 72f669c0086fbbbbebc92ce7390125722c4c0ec5
Gitweb: http://git.kernel.org/tip/72f669c0086fbbbbebc92ce7390125722c4c0ec5
Author: Shaohua Li <[email protected]>
AuthorDate: Thu, 5 Feb 2015 15:55:31 -0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 18 Feb 2015 17:01:44 +0100

perf: Update shadow timestamp before add event

Update the shadow timestamp before start event, because .add might
use the timestamp.

Signed-off-by: Shaohua Li <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul Mackerras <[email protected]>
Link: http://lkml.kernel.org/r/9cd0276d6a047cb7c2885994f25e3a1f7c8c28af.1423180257.git.shli@fb.com
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/events/core.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 13209a9..e580e0f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1881,6 +1881,10 @@ event_sched_in(struct perf_event *event,

 	perf_pmu_disable(event->pmu);

+	event->tstamp_running += tstamp - event->tstamp_stopped;
+
+	perf_set_shadow_time(event, ctx, tstamp);
+
 	if (event->pmu->add(event, PERF_EF_START)) {
 		event->state = PERF_EVENT_STATE_INACTIVE;
 		event->oncpu = -1;
@@ -1888,10 +1892,6 @@ event_sched_in(struct perf_event *event,
 		goto out;
 	}

-	event->tstamp_running += tstamp - event->tstamp_stopped;
-
-	perf_set_shadow_time(event, ctx, tstamp);
-
 	if (!is_software_event(event))
 		cpuctx->active_oncpu++;
 	if (!ctx->nr_active++)

Subject: [tip:perf/core] perf: Update userspace page info for software event

Commit-ID: 6a694a607a97d58c042fb7fbd60ef1caea26950c
Gitweb: http://git.kernel.org/tip/6a694a607a97d58c042fb7fbd60ef1caea26950c
Author: Shaohua Li <[email protected]>
AuthorDate: Thu, 5 Feb 2015 15:55:32 -0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 18 Feb 2015 17:01:45 +0100

perf: Update userspace page info for software event

For hardware events, the userspace page of the event gets updated in
context switches, so if we read the timestamp in the page, we get
fresh info.

For software events, this is missing currently. This patch makes the
behavior consistent.

With this patch, we can implement clock_gettime(THREAD_CPUTIME) with
PERF_COUNT_SW_DUMMY in userspace as suggested by Andy and Peter. Code
like this:

	if (pc->cap_user_time) {
		do {
			seq = pc->lock;
			barrier();

			running = pc->time_running;
			cyc = rdtsc();
			time_mult = pc->time_mult;
			time_shift = pc->time_shift;
			time_offset = pc->time_offset;

			barrier();
		} while (pc->lock != seq);

		quot = (cyc >> time_shift);
		rem = cyc & ((1 << time_shift) - 1);
		delta = time_offset + quot * time_mult +
			((rem * time_mult) >> time_shift);

		running += delta;
		return running;
	}

I tried it on a busy system; the userspace page updating doesn't
have noticeable overhead.

Signed-off-by: Shaohua Li <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: http://lkml.kernel.org/r/aa2dd2e4f1e9f2225758be5ba00f14d6909a8ce1.1423180257.git.shli@fb.com
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/events/core.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index e580e0f..fef45b4 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6123,6 +6123,7 @@ static int perf_swevent_add(struct perf_event *event, int flags)
 	}

 	hlist_add_head_rcu(&event->hlist_entry, head);
+	perf_event_update_userpage(event);

 	return 0;
 }
@@ -6592,6 +6593,7 @@ static int cpu_clock_event_add(struct perf_event *event, int flags)
 {
 	if (flags & PERF_EF_START)
 		cpu_clock_event_start(event, flags);
+	perf_event_update_userpage(event);

 	return 0;
 }
@@ -6666,6 +6668,7 @@ static int task_clock_event_add(struct perf_event *event, int flags)
 {
 	if (flags & PERF_EF_START)
 		task_clock_event_start(event, flags);
+	perf_event_update_userpage(event);

 	return 0;
 }