2009-12-09 20:40:12

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH] perf sched: Add max delay time snapshot

When we have a maximum latency reported for a task, we need a
convenient way to find the matching location to the raw traces
or to perf sched map that shows where the task has been eventually
scheduled in. This gives a pointer to retrieve the events that occured
during this max latency.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Xiao Guangrong <[email protected]>
---
tools/perf/builtin-sched.c | 16 ++++++++++------
1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index b12b23a..7cca7c1 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -140,6 +140,7 @@ struct work_atoms {
struct thread *thread;
struct rb_node node;
u64 max_lat;
+ u64 max_lat_at;
u64 total_lat;
u64 nb_atoms;
u64 total_runtime;
@@ -1013,8 +1014,10 @@ add_sched_in_event(struct work_atoms *atoms, u64 timestamp)

delta = atom->sched_in_time - atom->wake_up_time;
atoms->total_lat += delta;
- if (delta > atoms->max_lat)
+ if (delta > atoms->max_lat) {
atoms->max_lat = delta;
+ atoms->max_lat_at = timestamp;
+ }
atoms->nb_atoms++;
}

@@ -1210,10 +1213,11 @@ static void output_lat_thread(struct work_atoms *work_list)

avg = work_list->total_lat / work_list->nb_atoms;

- printf("|%11.3f ms |%9llu | avg:%9.3f ms | max:%9.3f ms |\n",
+ printf("|%11.3f ms |%9llu | avg:%9.3f ms | max:%9.3f ms | max at: %9.6f s\n",
(double)work_list->total_runtime / 1e6,
work_list->nb_atoms, (double)avg / 1e6,
- (double)work_list->max_lat / 1e6);
+ (double)work_list->max_lat / 1e6,
+ (double)work_list->max_lat_at / 1e9);
}

static int pid_cmp(struct work_atoms *l, struct work_atoms *r)
@@ -1704,9 +1708,9 @@ static void __cmd_lat(void)
read_events();
sort_lat();

- printf("\n -----------------------------------------------------------------------------------------\n");
- printf(" Task | Runtime ms | Switches | Average delay ms | Maximum delay ms |\n");
- printf(" -----------------------------------------------------------------------------------------\n");
+ printf("\n ---------------------------------------------------------------------------------------------------------------\n");
+ printf(" Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | Maximum delay at |\n");
+ printf(" ---------------------------------------------------------------------------------------------------------------\n");

next = rb_first(&sorted_atom_root);

--
1.6.2.3


2009-12-10 03:26:52

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH] perf sched: Add max delay time snapshot



Frederic Weisbecker wrote:
> When we have a maximum latency reported for a task, we need a
> convenient way to find the matching location to the raw traces
> or to perf sched map that shows where the task has been eventually
> scheduled in. This gives a pointer to retrieve the events that occured
> during this max latency.
>

Then, we can cooperate with ftrace's data to know what the cpu is
doing at that time.

Reviewed-by: Xiao Guangrong <[email protected]>

Thanks,
Xiao

2009-12-10 07:24:00

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] perf sched: Add max delay time snapshot


* Xiao Guangrong <[email protected]> wrote:

> Frederic Weisbecker wrote:
>
> > When we have a maximum latency reported for a task, we need a
> > convenient way to find the matching location to the raw traces or to
> > perf sched map that shows where the task has been eventually
> > scheduled in. This gives a pointer to retrieve the events that
> > occured during this max latency.
>
> Then, we can cooperate with ftrace's data to know what the cpu is
> doing at that time.

What do you mean by mixing it with ftrace data? These events ought to be
a full replacement for the sched and wakeup tracers. In the long run we
want a single stream of events and phase out most of the pretty-printing
ftrace plugins.

Ingo

2009-12-10 07:51:05

by Frederic Weisbecker

[permalink] [raw]
Subject: [tip:perf/urgent] perf sched: Add max delay time snapshot

Commit-ID: 3786310afe738070be31c439b8deeaeb69b9735d
Gitweb: http://git.kernel.org/tip/3786310afe738070be31c439b8deeaeb69b9735d
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Wed, 9 Dec 2009 21:40:08 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 10 Dec 2009 08:30:26 +0100

perf sched: Add max delay time snapshot

When we have a maximum latency reported for a task, we need a
convenient way to find the matching location to the raw traces
or to perf sched map that shows where the task has been
eventually scheduled in. This gives a pointer to retrieve the
events that occured during this max latency.

Signed-off-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Xiao Guangrong <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Paul Mackerras <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
tools/perf/builtin-sched.c | 16 ++++++++++------
1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index b12b23a..7cca7c1 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -140,6 +140,7 @@ struct work_atoms {
struct thread *thread;
struct rb_node node;
u64 max_lat;
+ u64 max_lat_at;
u64 total_lat;
u64 nb_atoms;
u64 total_runtime;
@@ -1013,8 +1014,10 @@ add_sched_in_event(struct work_atoms *atoms, u64 timestamp)

delta = atom->sched_in_time - atom->wake_up_time;
atoms->total_lat += delta;
- if (delta > atoms->max_lat)
+ if (delta > atoms->max_lat) {
atoms->max_lat = delta;
+ atoms->max_lat_at = timestamp;
+ }
atoms->nb_atoms++;
}

@@ -1210,10 +1213,11 @@ static void output_lat_thread(struct work_atoms *work_list)

avg = work_list->total_lat / work_list->nb_atoms;

- printf("|%11.3f ms |%9llu | avg:%9.3f ms | max:%9.3f ms |\n",
+ printf("|%11.3f ms |%9llu | avg:%9.3f ms | max:%9.3f ms | max at: %9.6f s\n",
(double)work_list->total_runtime / 1e6,
work_list->nb_atoms, (double)avg / 1e6,
- (double)work_list->max_lat / 1e6);
+ (double)work_list->max_lat / 1e6,
+ (double)work_list->max_lat_at / 1e9);
}

static int pid_cmp(struct work_atoms *l, struct work_atoms *r)
@@ -1704,9 +1708,9 @@ static void __cmd_lat(void)
read_events();
sort_lat();

- printf("\n -----------------------------------------------------------------------------------------\n");
- printf(" Task | Runtime ms | Switches | Average delay ms | Maximum delay ms |\n");
- printf(" -----------------------------------------------------------------------------------------\n");
+ printf("\n ---------------------------------------------------------------------------------------------------------------\n");
+ printf(" Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | Maximum delay at |\n");
+ printf(" ---------------------------------------------------------------------------------------------------------------\n");

next = rb_first(&sorted_atom_root);

2009-12-10 08:17:23

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH] perf sched: Add max delay time snapshot



Ingo Molnar wrote:
> * Xiao Guangrong <[email protected]> wrote:
>
>> Frederic Weisbecker wrote:
>>
>>> When we have a maximum latency reported for a task, we need a
>>> convenient way to find the matching location to the raw traces or to
>>> perf sched map that shows where the task has been eventually
>>> scheduled in. This gives a pointer to retrieve the events that
>>> occured during this max latency.
>> Then, we can cooperate with ftrace's data to know what the cpu is
>> doing at that time.
>
> What do you mean by mixing it with ftrace data? These events ought to be
> a full replacement for the sched and wakeup tracers. In the long run we
> want a single stream of events and phase out most of the pretty-printing
> ftrace plugins.

Hi Ingo,

I think sometimes perf tool cooperate with ftrace can do more
useful things, take this case for example:

By 'perf sched latency' we can get the schedule latency time,
if the time is abnormal, then we can run this command and enable
function tracer.

After running, 'perf sched latency' can show us the timestamps
when the maximum latency(the worst case) occurs, then we can find
what the cpu is doing at this timestamps by reading function
tracer's output.

Thanks,
Xiao

2009-12-10 08:27:35

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] perf sched: Add max delay time snapshot


* Xiao Guangrong <[email protected]> wrote:

>
>
> Ingo Molnar wrote:
> > * Xiao Guangrong <[email protected]> wrote:
> >
> >> Frederic Weisbecker wrote:
> >>
> >>> When we have a maximum latency reported for a task, we need a
> >>> convenient way to find the matching location to the raw traces or to
> >>> perf sched map that shows where the task has been eventually
> >>> scheduled in. This gives a pointer to retrieve the events that
> >>> occured during this max latency.
> >> Then, we can cooperate with ftrace's data to know what the cpu is
> >> doing at that time.
> >
> > What do you mean by mixing it with ftrace data? These events ought to be
> > a full replacement for the sched and wakeup tracers. In the long run we
> > want a single stream of events and phase out most of the pretty-printing
> > ftrace plugins.
>
> Hi Ingo,
>
> I think sometimes perf tool cooperate with ftrace can do more useful
> things, take this case for example:
>
> By 'perf sched latency' we can get the schedule latency time, if the
> time is abnormal, then we can run this command and enable function
> tracer.
>
> After running, 'perf sched latency' can show us the timestamps when
> the maximum latency(the worst case) occurs, then we can find what the
> cpu is doing at this timestamps by reading function tracer's output.

Actually, i think the natural solution here is not any ugly interaction
between two largely disjunct sets of APIs, but a new feature: to turn
the function tracer into an event.

That would allow perf sched to also record function traces if so
desired. And it would also allow a whole lot of other things - mixing
function tracer events and other events.

As a starter we could create a new function tracer event. A crude
prototype hack is attached below - via that it should already be
possible to 'count' function calls via:

perf stat -a --repeat 3 -e context-switches sleep 1

( obviously the real patch would introduce PERF_COUNT_SW_FUNCTION_CALLS,
but you get the point. )

Ingo

---
include/trace/events/function.h | 33 +++++++++++++++++++++++++++++++++
kernel/trace/ftrace.c | 15 +++++++++++++++
2 files changed, 48 insertions(+)

Index: linux/include/trace/events/function.h
===================================================================
--- /dev/null
+++ linux/include/trace/events/function.h
@@ -0,0 +1,33 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM function
+
+#if !defined(_TRACE_FUNCTION_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_FUNCTION_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(function_call,
+
+ TP_PROTO(unsigned long ip, unsigned long parent_ip),
+
+ TP_ARGS(ip, parent_ip),
+
+ TP_STRUCT__entry(
+ __field( u64, ip )
+ __field( u64, parent_ip )
+ ),
+
+ TP_fast_assign(
+ __entry->ip = ip;
+ __entry->parent_ip = parent_ip;
+ ),
+
+ TP_printk("IP: %016Lx, parent IP: %016Lx",
+ __entry->ip,
+ __entry->parent_ip)
+);
+
+#endif /* _TRACE_FUNCTION_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
Index: linux/kernel/trace/ftrace.c
===================================================================
--- linux.orig/kernel/trace/ftrace.c
+++ linux/kernel/trace/ftrace.c
@@ -2769,9 +2769,24 @@ void __init ftrace_init(void)

#else

+#define CREATE_TRACE_POINTS
+#include <trace/events/function.h>
+
+static void perf_ftrace_func(unsigned long ip, unsigned long parent_ip)
+{
+ struct pt_regs *regs = task_pt_regs(current);
+
+ perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, 1, regs, 0);
+}
+
static int __init ftrace_nodyn_init(void)
{
ftrace_enabled = 1;
+
+ printk("enabling function tracer test\n");
+
+ ftrace_trace_function = perf_ftrace_func;
+
return 0;
}
device_initcall(ftrace_nodyn_init);

2009-12-10 08:47:49

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH] perf sched: Add max delay time snapshot



Ingo Molnar wrote:

> Actually, i think the natural solution here is not any ugly interaction
> between two largely disjunct sets of APIs, but a new feature: to turn
> the function tracer into an event.
>
> That would allow perf sched to also record function traces if so
> desired. And it would also allow a whole lot of other things - mixing
> function tracer events and other events.

It's a useful feature! :-)

Thanks,
Xiao