2009-12-15 11:18:53

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior

Hi,

We introduce 'perf timer' in this patchset, it can analyze timer
latency and timer function handle time, the usage and result is
like below:

# perf timer record
# perf timer lat --print-lat --print-handle
-------------------------------------------------------------------------------------------------------
| Timer | TYPE | Avg-latency | Max-latency | Max-latency-at-TS |Max-lat-at-Task |
|0xf7ad1f5c |hrtimer |996068.500 ns|1607650 ns|10270128658526 |init |
|0xf7903f04 |timer |0.625 HZ|2 HZ|10270344082394 |swapper |
|0xf787a05c |hrtimer |200239.500 ns|359929 ns|10269316024808 |main |
|main :[ PROF]|itimer |0.000 HZ|0 HZ|10237021270557 |main |
|main :[VIRTUAL]|itimer |0.000 HZ|0 HZ|10257314773501 |main |

......

-------------------------------------------------------------------------------------------------------
| Timer | TYPE | Avg-handle (ms)|Max-handle(ms)| Max-handle-at-TS(s)|Max-lat-at-func |
|0xf7ad1f5c |hrtimer |0.025 |0.025 |10270.129 |0xc016b5b0 |
|0xf7903f04 |timer |0.009 |0.011 |10260.342 |0xc0159240 |
|0xf787a05c |hrtimer |0.031 |0.062 |10267.018 |0xc014cc40 |

And, in current code, it'll complain with below message when we use
'perf timer lat':

# ./perf timer lat
Warning: unknown op '{'
Warning: Error: expected type 5 but read 1
Warning: failed to read event print fmt for hrtimer_start
Warning: unknown op '{'
Warning: Error: expected type 5 but read 1
Warning: failed to read event print fmt for hrtimer_expire_entry

It's because perf parse "hrtimer_start" and "hrtimer_expire_entry" fail,
but it not hurt using.

See: http://lkml.org/lkml/2009/10/12/726

include/trace/events/timer.h | 8
tools/perf/Documentation/perf-timer.txt | 40 +
tools/perf/Makefile | 1
tools/perf/builtin-sched.c | 3
tools/perf/builtin-timer.c | 954 ++++++++++++++++++++++++++++++++
tools/perf/builtin.h | 1
tools/perf/command-list.txt | 1
tools/perf/perf.c | 1
tools/perf/util/trace-event-parse.c | 25
tools/perf/util/trace-event.h | 4
10 files changed, 1030 insertions(+), 8 deletions(-)


2009-12-15 11:20:57

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH 1/4] trace_event: record task' real_timer in itimer_state tracepoint

When itimer is ITIMER_REAL, we should cooperate with hrtimer's
tracepoit to get it's latency, this patch can let us get itimer's
hrtimer easy

Signed-off-by: Xiao Guangrong <[email protected]>
---
include/trace/events/timer.h | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index 9496b96..13ec15a 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -285,6 +285,7 @@ TRACE_EVENT(itimer_state,

TP_STRUCT__entry(
__field( int, which )
+ __field( void *, timer )
__field( cputime_t, expires )
__field( long, value_sec )
__field( long, value_usec )
@@ -294,6 +295,8 @@ TRACE_EVENT(itimer_state,

TP_fast_assign(
__entry->which = which;
+ __entry->timer = which == ITIMER_REAL ?
+ &current->signal->real_timer : NULL;
__entry->expires = expires;
__entry->value_sec = value->it_value.tv_sec;
__entry->value_usec = value->it_value.tv_usec;
@@ -301,8 +304,9 @@ TRACE_EVENT(itimer_state,
__entry->interval_usec = value->it_interval.tv_usec;
),

- TP_printk("which=%d expires=%llu it_value=%ld.%ld it_interval=%ld.%ld",
- __entry->which, (unsigned long long)__entry->expires,
+ TP_printk("which=%d timer=%p expires=%llu it_value=%ld.%ld it_interval="
+ "%ld.%ld", __entry->which, __entry->timer,
+ (unsigned long long)__entry->expires,
__entry->value_sec, __entry->value_usec,
__entry->interval_sec, __entry->interval_usec)
);
--
1.6.1.2

2009-12-15 11:22:15

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH 2/4] perf_event: fix getting point

The point that got by raw_field_ptr() is point to @data offset not
tracepoint saved.

Introduce raw_field_arry() to get arry data is tracepoint save
arry to it.

Signed-off-by: Xiao Guangrong <[email protected]>
---
tools/perf/builtin-sched.c | 3 +--
tools/perf/util/trace-event-parse.c | 25 ++++++++++++++++++++++---
tools/perf/util/trace-event.h | 3 +++
3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index d67f274..8df982e 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -630,8 +630,7 @@ static void test_calibrations(void)

#define FILL_ARRAY(ptr, array, event, data) \
do { \
- void *__array = raw_field_ptr(event, #array, data); \
- memcpy(ptr.array, __array, sizeof(ptr.array)); \
+ raw_field_arry(event, #array, data, ptr.array, sizeof(ptr.array));\
} while(0)

#define FILL_COMMON_FIELDS(ptr, event, data) \
diff --git a/tools/perf/util/trace-event-parse.c b/tools/perf/util/trace-event-parse.c
index c5c32be..b0ee64f 100644
--- a/tools/perf/util/trace-event-parse.c
+++ b/tools/perf/util/trace-event-parse.c
@@ -840,7 +840,7 @@ static int event_read_fields(struct event *event, struct format_field **fields)
field->flags |= FIELD_IS_ARRAY;

type = read_token(&token);
- while (strcmp(token, "]") != 0) {
+ while (strcmp(token, "]") != 0) {
if (last_type == EVENT_ITEM &&
type == EVENT_ITEM)
len = 2;
@@ -1919,13 +1919,32 @@ raw_field_value(struct event *event, const char *name, void *data)

void *raw_field_ptr(struct event *event, const char *name, void *data)
{
+ void *ptr;
+ unsigned long long value;
+
+ value = raw_field_value(event, name, data);
+
+ if (!value)
+ return NULL;
+
+ memcpy(&ptr, &value, sizeof(ptr));
+ return ptr;
+}
+
+unsigned long long
+raw_field_arry(struct event *event, const char *name, void *data,
+ void *arry, int arry_size)
+{
+ int len;
struct format_field *field;

field = find_any_field(event, name);
if (!field)
- return NULL;
+ return 0ULL;

- return data + field->offset;
+ len = min(arry_size, field->size);
+ memcpy(arry, data + field->offset, len);
+ return len;
}

static int get_common_info(const char *type, int *offset, int *size)
diff --git a/tools/perf/util/trace-event.h b/tools/perf/util/trace-event.h
index 81698d5..3f88010 100644
--- a/tools/perf/util/trace-event.h
+++ b/tools/perf/util/trace-event.h
@@ -255,6 +255,9 @@ unsigned long long read_size(void *ptr, int size);
unsigned long long
raw_field_value(struct event *event, const char *name, void *data);
void *raw_field_ptr(struct event *event, const char *name, void *data);
+unsigned long long
+raw_field_arry(struct event *event, const char *name, void *data,
+ void *arry, int arry_size);
unsigned long long eval_flag(const char *flag);

int read_tracing_data(int fd, struct perf_event_attr *pattrs, int nb_events);
--
1.6.1.2

2009-12-15 11:23:26

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH 3/4] perf/timer: add document

Add document for 'perf timer'

Signed-off-by: Xiao Guangrong <[email protected]>
---
tools/perf/Documentation/perf-timer.txt | 40 +++++++++++++++++++++++++++++++
1 files changed, 40 insertions(+), 0 deletions(-)
create mode 100644 tools/perf/Documentation/perf-timer.txt

diff --git a/tools/perf/Documentation/perf-timer.txt b/tools/perf/Documentation/perf-timer.txt
new file mode 100644
index 0000000..fbff936
--- /dev/null
+++ b/tools/perf/Documentation/perf-timer.txt
@@ -0,0 +1,40 @@
+perf-timer(1)
+==============
+
+NAME
+----
+perf-timer - Tool to trace/measure kernel timer properties
+
+SYNOPSIS
+--------
+[verse]
+'perf timer' {record|latency} [<options>]
+
+DESCRIPTION
+-----------
+There are two variants of perf timer:
+
+ 'perf timer record <command>' to record the timer events
+ of an arbitrary workload.
+
+ 'perf timer latency' to report kernel timer latency and
+ timer function handle time.
+
+OPTIONS
+-------
+-D::
+--dump-raw-trace=::
+ Display verbose dump of the sched data.
+
+-i <file>::
+--input=<file>::
+ Select the input file (default: perf.data)
+
+-s <key[,key2...]>::
+--sort=<key[,key2...]>::
+ Sort the output (default: timer, type, avg-timer-latency,
+ max-timer-latency, avg-handle-time, max-handle-time)
+
+SEE ALSO
+--------
+linkperf:perf-record[1]
--
1.6.1.2

2009-12-15 11:24:53

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH 4/4] perf/timer: 'perf timer' core code

It's the core code of 'perf timer', we can use it to analyse timer's
behavior:
# perf timer record
# perf timer lat --print-lat --print-handle
-------------------------------------------------------------------------------------------------------
| Timer | TYPE | Avg-latency | Max-latency | Max-latency-at-TS |Max-lat-at-Task |
|0xf7ad1f5c |hrtimer |996068.500 ns|1607650 ns|10270128658526 |init |
|0xf7903f04 |timer |0.625 HZ|2 HZ|10270344082394 |swapper |
|0xf787a05c |hrtimer |200239.500 ns|359929 ns|10269316024808 |main |
|main :[ PROF]|itimer |0.000 HZ|0 HZ|10237021270557 |main |
|main :[VIRTUAL]|itimer |0.000 HZ|0 HZ|10257314773501 |main |

......

-------------------------------------------------------------------------------------------------------
| Timer | TYPE | Avg-handle (ms)|Max-handle(ms)| Max-handle-at-TS(s)|Max-lat-at-func |
|0xf7ad1f5c |hrtimer |0.025 |0.025 |10270.129 |0xc016b5b0 |
|0xf7903f04 |timer |0.009 |0.011 |10260.342 |0xc0159240 |
|0xf787a05c |hrtimer |0.031 |0.062 |10267.018 |0xc014cc40 |

Signed-off-by: Xiao Guangrong <[email protected]>
---
tools/perf/Makefile | 1 +
tools/perf/builtin-timer.c | 954 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/builtin.h | 1 +
tools/perf/command-list.txt | 1 +
tools/perf/perf.c | 1 +
5 files changed, 958 insertions(+), 0 deletions(-)
create mode 100644 tools/perf/builtin-timer.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index a4cb792..797ab53 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -439,6 +439,7 @@ BUILTIN_OBJS += builtin-top.o
BUILTIN_OBJS += builtin-trace.o
BUILTIN_OBJS += builtin-probe.o
BUILTIN_OBJS += builtin-kmem.o
+BUILTIN_OBJS += builtin-timer.o

PERFLIBS = $(LIB_FILE)

diff --git a/tools/perf/builtin-timer.c b/tools/perf/builtin-timer.c
new file mode 100644
index 0000000..d35315f
--- /dev/null
+++ b/tools/perf/builtin-timer.c
@@ -0,0 +1,954 @@
+#include "builtin.h"
+#include "perf.h"
+
+#include "util/util.h"
+#include "util/cache.h"
+#include "util/symbol.h"
+#include "util/thread.h"
+#include "util/header.h"
+#include "util/parse-options.h"
+#include "util/trace-event.h"
+#include "util/debug.h"
+#include "util/session.h"
+
+#include <sys/prctl.h>
+#include <semaphore.h>
+#include <pthread.h>
+#include <math.h>
+
+#ifdef TDEBUG
+static char const *timer_state_string[] = {
+ "TIMER_INIT",
+ "TIMER_STARE",
+ "TIMER_EXPIRE_ENTRY",
+ "TIMER_EXPIRE_EXIT",
+};
+
+#define dprintf(format, args...) printf(format, ## args)
+#else
+#define dprintf(format, args...)
+#endif
+
+#define SORT_KEY \
+ "timer, type, avg-timer-latency, max-timer-latency," \
+ " avg-handle-time, max-handle-time"
+
+static char const *input_name = "perf.data";
+static char default_sort_order[] = SORT_KEY;
+
+static char *sort_order = default_sort_order;
+static int profile_cpu = -1;
+static int print_lat;
+static int print_handle;
+static int bug_nr;
+static struct perf_session *timer_session;
+
+static const struct option timer_options[] = {
+ OPT_STRING('i', "input", &input_name, "file",
+ "input file name"),
+ OPT_BOOLEAN('v', "verbose", &verbose,
+ "be more verbose (show symbol address, etc)"),
+ OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
+ "dump raw trace in ASCII"),
+ OPT_END()
+};
+
+static const char * const timer_usage[] = {
+ "perf timer [<options>] {record|latency}",
+ NULL
+};
+
+static const struct option latency_options[] = {
+ OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
+ "sort by key(s): "SORT_KEY),
+ OPT_BOOLEAN('v', "verbose", &verbose,
+ "be more verbose (show symbol address, etc)"),
+ OPT_INTEGER('C', "CPU", &profile_cpu,
+ "CPU to profile on"),
+ OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
+ "dump raw trace in ASCII"),
+ OPT_BOOLEAN(0, "print-lat", &print_lat, "show timer latency"),
+ OPT_BOOLEAN(0, "print-handle", &print_handle,
+ "show timer function handle time"),
+ OPT_END()
+};
+
+static const char * const latency_usage[] = {
+ "perf timer latency [<options>]",
+ NULL
+};
+
+static const char *record_args[] = {
+ "record",
+ "-a",
+ "-R",
+ "-M",
+ "-f",
+ "-m", "1024",
+ "-c", "1",
+
+ /* timer */
+ "-e", "timer:timer_start",
+ "-e", "timer:timer_expire_entry",
+ "-e", "timer:timer_expire_exit",
+
+ /* hrtimer */
+ "-e", "timer:hrtimer_start",
+ "-e", "timer:hrtimer_expire_entry",
+ "-e", "timer:hrtimer_expire_exit",
+
+ /* itimer **/
+ "-e", "timer:itimer_state",
+ "-e", "timer:itimer_expire",
+};
+
+static LIST_HEAD(default_cmp);
+
+enum timer_type {
+ TIMER,
+ HRTIMER,
+ ITIMER,
+};
+
+static char const *timer_type_string[] = {
+ "timer",
+ "hrtimer",
+ "itimer",
+};
+
+enum timer_state {
+ TIMER_INIT,
+ TIMER_START,
+ TIMER_EXPIRE_ENTRY,
+ TIMER_EXPIRE_EXIT,
+};
+
+struct timer_info {
+ struct rb_node node;
+
+ void *timer;
+ int itimer_type;
+ void *function;
+ enum timer_type type;
+ enum timer_state state;
+ int expire_nr;
+ int bug;
+
+ struct timer_info *itimer_hrtimer;
+
+ u64 timer_last_lat;
+ u64 timer_expire_ts;
+ u64 timer_total_lat;
+ double timer_avg_lat;
+ u64 timer_max_lat;
+ u64 timer_max_lat_at_ts;
+ struct thread *timer_max_lat_at_thread;
+
+ u64 handler_entry_ts;
+ u64 handler_total_time;
+ double handler_avg_time;
+ u64 handler_max_time;
+ u64 handler_max_time_at_ts;
+ void *handler_max_time_at_func;
+};
+
+#define MAX_ITIMER_TYPE (0xFFFF)
+typedef int (*sort_fn_t)(struct timer_info *, struct timer_info *);
+
+struct sort_dimension {
+ const char *name;
+ sort_fn_t cmp;
+ struct list_head list;
+};
+
+#define CMP_FUNCTION(value) \
+static int cmp_##value(struct timer_info *v1, struct timer_info *v2) \
+{ \
+ if (v1->value < v2->value) \
+ return -1; \
+ if (v1->value > v2->value) \
+ return 1; \
+ return 0; \
+}
+
+#define DECLARE_SORT_DIMENSION(sort_name, field) \
+CMP_FUNCTION(field) \
+static struct sort_dimension sort_##field = { \
+ .name = sort_name, \
+ .cmp = cmp_##field, \
+}
+
+DECLARE_SORT_DIMENSION("timer", timer);
+DECLARE_SORT_DIMENSION("type", type);
+DECLARE_SORT_DIMENSION("avg-timer-latency", timer_avg_lat);
+DECLARE_SORT_DIMENSION("max-timer-latency", timer_max_lat);
+DECLARE_SORT_DIMENSION("avg-handle-time", handler_avg_time);
+DECLARE_SORT_DIMENSION("max-handle-time", handler_max_time);
+
+static struct sort_dimension *available_sorts[] = {
+ &sort_timer,
+ &sort_type,
+ &sort_timer_avg_lat,
+ &sort_timer_max_lat,
+ &sort_handler_avg_time,
+ &sort_handler_max_time,
+};
+
+#define NB_AVAILABLE_SORTS (int)(sizeof(available_sorts) / sizeof(struct sort_dimension *))
+static int sort_dimension__add(const char *tok, struct list_head *list)
+{
+ int i;
+
+ for (i = 0; i < NB_AVAILABLE_SORTS; i++) {
+ if (!strcmp(available_sorts[i]->name, tok)) {
+ list_add_tail(&available_sorts[i]->list, list);
+
+ return 0;
+ }
+ }
+
+ return -1;
+}
+
+static LIST_HEAD(sort_list);
+
+static void setup_sorting(void)
+{
+ char *tmp, *tok, *str = strdup(sort_order);
+
+ for (tok = strtok_r(str, ", ", &tmp);
+ tok; tok = strtok_r(NULL, ", ", &tmp)) {
+ if (sort_dimension__add(tok, &sort_list) < 0) {
+ error("Unknown --sort key: `%s'", tok);
+ usage_with_options(latency_usage, latency_options);
+ }
+ }
+
+ free(str);
+
+ sort_dimension__add("timer", &default_cmp);
+ sort_dimension__add("itimer-type", &default_cmp);
+}
+
+static struct rb_root timer_root;
+
+static int timer_key_cmp(struct list_head *list, struct timer_info *l,
+ struct timer_info *r)
+{
+ struct sort_dimension *sort;
+ int ret = 0;
+
+ BUG_ON(list_empty(list));
+
+ list_for_each_entry(sort, list, list) {
+ ret = sort->cmp(l, r);
+ if (ret)
+ return ret;
+ }
+
+ return ret;
+}
+
+static void timer_traversal(struct rb_root *root,
+ void (*fun)(struct timer_info *))
+{
+ struct rb_node *next;
+
+ next = rb_first(root);
+
+ while (next) {
+ struct timer_info *timer;
+
+ timer = rb_entry(next, struct timer_info, node);
+ fun(timer);
+ next = rb_next(next);
+ }
+}
+
+static struct timer_info *
+__timer_search(struct rb_root *root, struct timer_info *timer,
+ struct list_head *_sort_list)
+{
+ struct rb_node *node = root->rb_node;
+
+ while (node) {
+ struct timer_info *timer_info;
+ int cmp;
+
+ timer_info = container_of(node, struct timer_info, node);
+
+ cmp = timer_key_cmp(_sort_list, timer, timer_info);
+ if (cmp > 0)
+ node = node->rb_left;
+ else if (cmp < 0)
+ node = node->rb_right;
+ else
+ return timer_info;
+ }
+ return NULL;
+}
+
+static struct timer_info *
+timer_search(struct timer_info *timer)
+{
+ return __timer_search(&timer_root, timer, &default_cmp);
+}
+
+static void
+__timer_insert(struct rb_root *root, struct timer_info *data,
+ struct list_head *_sort_list)
+{
+ struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+ while (*new) {
+ struct timer_info *this;
+ int cmp;
+
+ this = container_of(*new, struct timer_info, node);
+ parent = *new;
+
+ cmp = timer_key_cmp(_sort_list, data, this);
+
+ if (cmp > 0)
+ new = &((*new)->rb_left);
+ else
+ new = &((*new)->rb_right);
+ }
+
+ rb_link_node(&data->node, parent, new);
+ rb_insert_color(&data->node, root);
+}
+
+static void timer_insert(struct timer_info *timer_info)
+{
+ dprintf("timer %p insert...\n", timer_info->timer);
+ __timer_insert(&timer_root, timer_info, &default_cmp);
+}
+
+static struct timer_info
+*__timer_find(void *timer, enum timer_type type, int itimer_type)
+{
+ struct timer_info *find = NULL;
+ struct timer_info timer_info = {
+ .timer = timer,
+ .itimer_type = itimer_type,
+ };
+
+ find = timer_search(&timer_info);
+ if (find && find->type != type) {
+
+ dprintf("find timer[%p], but type[%s] is not we expect[%s],"
+ "set to initializtion state.\n", find->timer,
+ timer_type_string[find->type], timer_type_string[type]);
+
+ find->type = type;
+ find->bug++;
+ find->state = TIMER_INIT;
+ }
+
+ dprintf("find timer[%p] %s\n", timer, find ? "success" : "fail");
+ return find;
+}
+
+static struct timer_info *
+timer_findnew(void *timer, enum timer_type type, int itimer_type)
+{
+ struct timer_info *find;
+
+ find = __timer_find(timer, type, itimer_type);
+ if (find)
+ return find;
+
+ find = zalloc(sizeof(*find));
+ if (!find)
+ die("No memory");
+
+ find->type = type;
+ find->timer = timer;
+ find->itimer_type = itimer_type;
+ timer_insert(find);
+
+ return find;
+}
+
+static struct rb_root timer_result;
+
+static void sort_result(void)
+{
+ struct rb_node *node;
+
+ for (;;) {
+ struct timer_info *data;
+ node = rb_first(&timer_root);
+ if (!node)
+ break;
+
+ rb_erase(node, &timer_root);
+ data = rb_entry(node, struct timer_info, node);
+ data->timer_avg_lat = (double)data->timer_total_lat /
+ (double)data->expire_nr;
+ data->handler_avg_time = (double)data->handler_total_time /
+ (double)data->expire_nr;
+ if (data->expire_nr)
+ __timer_insert(&timer_result, data, &sort_list);
+ }
+}
+
+static int timer_check_set_state(struct timer_info *timer_info,
+ enum timer_state new_state)
+{
+ enum timer_state current_state = timer_info->state;
+ int ret = 1;
+
+ /*
+ * if current state is TIMER_INIT, it means it's the first time we
+ * catched, no need check.
+ */
+ if (current_state == TIMER_INIT) {
+ dprintf("timer[%p] initialize, set state to %s\n",
+ timer_info->timer, timer_state_string[new_state]);
+ ret = 0;
+ }
+
+ timer_info->state = new_state;
+ return ret;
+}
+
+#define FILL_RAM_FIELD_PTR(event, field, data) \
+ field = raw_field_ptr(event, #field, data)
+#define FILLL_RAW_FIELD_VALUE(event, field, data) \
+ field = (typeof(field))raw_field_value(event, #field, data)
+
+static void *get_timer(enum timer_type type, struct event *event, void *data)
+{
+ if (type == HRTIMER) {
+ void *hrtimer = NULL;
+
+ FILL_RAM_FIELD_PTR(event, hrtimer, data);
+ return hrtimer;
+ } else {
+ void *timer = NULL;
+
+ FILL_RAM_FIELD_PTR(event, timer, data);
+ return timer;
+ }
+}
+
+static void
+timer_hrtimer_start(enum timer_type type, void *data, struct event *event)
+{
+ void *timer, *function = NULL;
+ u64 expires;
+ struct timer_info *timer_info;
+
+ timer = get_timer(type, event, data);
+
+ FILL_RAM_FIELD_PTR(event, function, data);
+ FILLL_RAW_FIELD_VALUE(event, expires, data);
+
+ dprintf("timer-start: Timer[%p], type:%s, function:%p, expires:%llu\n",
+ timer, timer_type_string[type], function, expires);
+
+ timer_info = timer_findnew(timer, type, MAX_ITIMER_TYPE);
+ timer_check_set_state(timer_info, TIMER_START);
+ timer_info->function = function;
+ timer_info->timer_expire_ts = expires;
+}
+
+static void
+timer_hrtimer_expires_entry(enum timer_type type, void *data,
+ struct event *event, u64 timestamp,
+ struct thread *thread)
+{
+ void *timer;
+ int ret;
+ u64 now;
+ s64 delta;
+ struct timer_info *timer_info;
+
+ FILLL_RAW_FIELD_VALUE(event, now, data);
+ timer = get_timer(type, event, data);
+ timer_info = timer_findnew(timer, type, MAX_ITIMER_TYPE);
+ timer_info->handler_entry_ts = timestamp;
+ ret = timer_check_set_state(timer_info, TIMER_EXPIRE_ENTRY);
+
+ dprintf("timer-expires-entry: Timer[%p], type:%s, now:%llu, ts:%llu\n",
+ timer, timer_type_string[type], now, timestamp);
+ if (ret == 0)
+ return;
+
+ delta = (s64)(now - timer_info->timer_expire_ts);
+ if (delta < 0) {
+ timer_info->bug++;
+ dprintf("timer-expires-entry: Timer[%p], type:%s,"
+ " timer latency < 0, bug=%d.\n", timer,
+ timer_type_string[type], timer_info->bug);
+ return ;
+ }
+
+ timer_info->expire_nr++;
+ timer_info->timer_last_lat = delta;
+ timer_info->timer_total_lat += delta;
+ if (timer_info->timer_max_lat <= (u64)delta) {
+ timer_info->timer_max_lat = delta;
+ timer_info->timer_max_lat_at_ts = timestamp;
+ timer_info->timer_max_lat_at_thread = thread;
+ }
+ return;
+}
+
+static void
+timer_hrtimer_expires_exit(enum timer_type type, void *data,
+ struct event *event, u64 timestamp,
+ struct thread *thread __used)
+{
+ struct timer_info *timer_info;
+ s64 delta;
+ int ret;
+ void *timer;
+
+ timer = get_timer(type, event, data);
+ timer_info = timer_findnew(timer, type, MAX_ITIMER_TYPE);
+
+ dprintf("timer expires exit: Timer[%p], type:%s, ts:%llu\n",
+ timer, timer_type_string[type], timestamp);
+
+ ret = timer_check_set_state(timer_info, TIMER_EXPIRE_EXIT);
+ if (ret == 0)
+ return;
+
+ delta = timestamp - timer_info->handler_entry_ts;
+ if (delta < 0) {
+ timer_info->bug++;
+ dprintf("timer expires exit: Timer[%p], type:%s, handle time"
+ " < 0, expires-ts:%llu, current-ts:%llu, bug=%d.\n",
+ timer, timer_type_string[type],
+ timer_info->handler_entry_ts, timestamp,
+ timer_info->bug);
+ return;
+ }
+
+ timer_info->handler_total_time += delta;
+ if (timer_info->handler_max_time < (u64)delta) {
+ timer_info->handler_max_time = delta;
+ timer_info->handler_max_time_at_ts = timestamp;
+ timer_info->handler_max_time_at_func = timer_info->function;
+ }
+
+ return;
+}
+
+static void
+timer_start_handler(void *data, struct event *event, int this_cpu __used,
+ u64 timestamp __used, struct thread *thread __used)
+{
+ return timer_hrtimer_start(TIMER, data, event);
+}
+
+static void
+timer_expire_entry_handler(void *data, struct event *event, int this_cpu __used,
+ u64 timestamp, struct thread *thread)
+{
+ return timer_hrtimer_expires_entry(TIMER, data, event, timestamp, thread);
+}
+
+static void
+timer_expire_exit_handler(void *data, struct event *event, int this_cpu __used,
+ u64 timestamp, struct thread *thread)
+{
+ return timer_hrtimer_expires_exit(TIMER, data, event, timestamp, thread);
+}
+
+static void
+hrtimer_start_handler(void *data, struct event *event, int this_cpu __used,
+ u64 timestamp __used, struct thread *thread __used)
+{
+ return timer_hrtimer_start(HRTIMER, data, event);
+}
+
+static void
+hrtimer_expire_entry_handler(void *data, struct event *event, int this_cpu __used,
+ u64 timestamp, struct thread *thread)
+{
+ return timer_hrtimer_expires_entry(HRTIMER, data, event, timestamp, thread);
+}
+
+static void
+hrtimer_expire_exit_handler(void *data, struct event *event, int this_cpu __used,
+ u64 timestamp, struct thread *thread)
+{
+ return timer_hrtimer_expires_exit(HRTIMER, data, event, timestamp, thread);
+}
+
+static void
+itimer_state_handler(void *data, struct event *event, int this_cpu __used,
+ u64 timestamp __used, struct thread *thread)
+{
+ u64 value_sec, value_usec, expires;
+ struct timer_info *timer_info;
+ void *timer = NULL;
+ int which;
+
+ FILLL_RAW_FIELD_VALUE(event, value_sec, data);
+ FILLL_RAW_FIELD_VALUE(event, value_usec, data);
+ FILLL_RAW_FIELD_VALUE(event, expires, data);
+ FILLL_RAW_FIELD_VALUE(event, which, data);
+ FILL_RAM_FIELD_PTR(event, timer, data);
+
+ timer_info = timer_findnew(thread, ITIMER, which);
+
+ /* itimer canceled, we skip this event */
+ if (!value_sec && !value_usec)
+ return ;
+
+ dprintf("timer-start: Timer[%p], type:%s, task:%s, expires:%llu",
+ timer_info->timer, "itimer", thread->comm, expires);
+
+ /* itimer started */
+ timer_info->timer_expire_ts = expires;
+ timer_check_set_state(timer_info, TIMER_START);
+ if (which == ITIMER_REAL) {
+ timer = timer_findnew(timer, HRTIMER, MAX_ITIMER_TYPE);
+ timer_info->itimer_hrtimer = timer;
+ dprintf(" hrtimer:%p\n", timer);
+ } else {
+ dprintf("\n");
+ }
+}
+
+static void
+itimer_expire_handler(void *data, struct event *event, int this_cpu __used,
+ u64 timestamp, struct thread *thread)
+{
+ int which;
+ u64 now;
+ pid_t pid;
+ struct thread *itimer_thread;
+ struct timer_info *timer_info;
+ s64 delta = 0;
+ int ret;
+
+ FILLL_RAW_FIELD_VALUE(event, which, data);
+ FILLL_RAW_FIELD_VALUE(event, now, data);
+ FILLL_RAW_FIELD_VALUE(event, pid, data);
+
+ itimer_thread = perf_session__findnew(timer_session, pid);
+ timer_info = timer_findnew(itimer_thread, ITIMER, which);
+
+ dprintf("timer-expires-entry: Timer[%p], type:%s, ts:%llu\n",
+ timer_info->timer, "itimer", timestamp);
+
+ ret = timer_check_set_state(timer_info, TIMER_EXPIRE_ENTRY);
+ if (ret == 0)
+ return;
+
+ if (which == ITIMER_REAL) {
+ if (timer_info->itimer_hrtimer->state == TIMER_INIT)
+ return;
+ delta = timer_info->itimer_hrtimer->timer_last_lat;
+ } else
+ delta = now - timer_info->timer_expire_ts;
+
+ if (delta < 0) {
+ timer_info->bug++;
+ dprintf("timer-expires-entry: Timer[%p], type:%s, handle time < 0,"
+ "bug=%d.\n",
+ timer_info->timer, "itimer", timer_info->bug);
+ return;
+ }
+
+ timer_info->expire_nr++;
+ timer_info->timer_total_lat += delta;
+ if (timer_info->timer_max_lat <= (u64)delta) {
+ timer_info->timer_max_lat = delta;
+ timer_info->timer_max_lat_at_ts = timestamp;
+ timer_info->timer_max_lat_at_thread = thread;
+ }
+
+ return;
+}
+
+struct event_handler_unit {
+ const char *event_name;
+ void (*handler) (void *data, struct event *event, int this_cpu,
+ u64 timestamp, struct thread *thread);
+} timer_handler[] = {
+ /* timer */
+ { "timer_start", timer_start_handler },
+ { "timer_expire_entry", timer_expire_entry_handler },
+ { "timer_expire_exit", timer_expire_exit_handler },
+
+ /* hrtimer */
+ { "hrtimer_start", hrtimer_start_handler },
+ { "hrtimer_expire_entry", hrtimer_expire_entry_handler },
+ { "hrtimer_expire_exit", hrtimer_expire_exit_handler},
+
+ /* itimer */
+ { "itimer_state", itimer_state_handler },
+ { "itimer_expire", itimer_expire_handler },
+};
+
+static void
+process_raw_event(event_t *raw_event __used, struct perf_session *session __used,
+ void *data, int cpu, u64 timestamp, struct thread *thread)
+{
+ struct event *event;
+ int type;
+ int i;
+ int handler_item = ARRAY_SIZE(timer_handler);
+
+ type = trace_parse_common_type(data);
+ event = trace_find_event(type);
+
+ for (i = 0; i < handler_item; i++)
+ if (!strcmp(timer_handler[i].event_name, event->name))
+ timer_handler[i].handler(data, event, cpu, timestamp, thread);
+}
+
+static int process_sample_event(event_t *event, struct perf_session *session)
+{
+ struct sample_data data;
+ struct thread *thread;
+
+ if (!(session->sample_type & PERF_SAMPLE_RAW))
+ return 0;
+
+ memset(&data, 0, sizeof(data));
+ data.time = -1;
+ data.cpu = -1;
+ data.period = -1;
+
+ event__parse_sample(event, session->sample_type, &data);
+
+ dump_printf("(IP, %d): %d/%d: %p period: %Ld\n",
+ event->header.misc,
+ data.pid, data.tid,
+ (void *)(long)data.ip,
+ (long long)data.period);
+
+ thread = perf_session__findnew(session, data.pid);
+ if (thread == NULL) {
+ pr_debug("problem processing %d event, skipping it.\n",
+ event->header.type);
+ return -1;
+ }
+
+ dump_printf(" ... thread: %s:%d\n", thread->comm, thread->pid);
+
+ if (profile_cpu != -1 && profile_cpu != (int)data.cpu)
+ return 0;
+
+ process_raw_event(event, session, data.raw_data, data.cpu, data.time, thread);
+ return 0;
+}
+
+static int sample_type_check(struct perf_session *session)
+{
+ if (!(session->sample_type & PERF_SAMPLE_RAW)) {
+ fprintf(stderr,
+ "No trace sample to read. Did you call perf record "
+ "without -R?");
+ return -1;
+ }
+
+ return 0;
+}
+
+static struct perf_event_ops event_ops = {
+ .process_sample_event = process_sample_event,
+ .process_comm_event = event__process_comm,
+ .sample_type_check = sample_type_check,
+};
+
+static int read_events(void)
+{
+ int err;
+ timer_session = perf_session__new(input_name, O_RDONLY, 0, NULL);
+
+ if (timer_session == NULL)
+ return -ENOMEM;
+
+ err = perf_session__process_events(timer_session, &event_ops);
+ perf_session__delete(timer_session);
+ return err;
+}
+
+static inline const char *get_itimer_type(int which)
+{
+ switch (which) {
+ case ITIMER_REAL:
+ return "REAL";
+ case ITIMER_VIRTUAL:
+ return "VIRTUAL";
+ case ITIMER_PROF:
+ return "PROF";
+ default:
+ return "Unknow";
+ }
+}
+
+static inline const char *get_lat_unit(struct timer_info *timer_info)
+{
+ switch (timer_info->type) {
+ case TIMER:
+ return "HZ";
+ case HRTIMER:
+ return "ns";
+ case ITIMER:
+ if (timer_info->itimer_type == ITIMER_REAL)
+ return "ns";
+ return "HZ";
+ default:
+ return "ERR";
+ }
+}
+
+static void __print_timer_lat(struct timer_info *timer_info)
+{
+ bug_nr += timer_info->bug;
+
+ if (timer_info->type == ITIMER) {
+ printf("|%-10.10s:[%7s]", ((struct thread *)(timer_info->timer))->comm,
+ get_itimer_type(timer_info->itimer_type));
+ } else
+ printf("|%-20p", timer_info->timer);
+
+ printf("|%-10s", timer_type_string[timer_info->type]);
+ printf("|%-12.3f %3s", timer_info->timer_avg_lat, get_lat_unit(timer_info));
+ printf("|%-10llu %3s", timer_info->timer_max_lat, get_lat_unit(timer_info));
+ printf("|%-20llu", timer_info->timer_max_lat_at_ts);
+ printf("|%-16.16s|\n", timer_info->timer_max_lat_at_thread->comm);
+}
+
+static void print_timer_lat(void)
+{
+ printf("\n");
+ printf("-------------------------------------------------------------------------------------------------------\n");
+ printf("| Timer | TYPE | Avg-latency | Max-latency | Max-latency-at-TS |Max-lat-at-Task |\n");
+ timer_traversal(&timer_result, __print_timer_lat);
+}
+
+static void __print_handle_time(struct timer_info *timer_info)
+{
+ void *function;
+
+ bug_nr += timer_info->bug;
+
+ if (timer_info->type == ITIMER)
+ return;
+
+ function = timer_info->handler_max_time_at_func;
+
+ printf("|%-20p", timer_info->timer);
+ printf("|%-10s", timer_type_string[timer_info->type]);
+ printf("|%-16.3f", timer_info->handler_avg_time / 1e6);
+ printf("|%-14.3f", (double)timer_info->handler_max_time / 1e6);
+ printf("|%-20.3f", (double)timer_info->handler_max_time_at_ts / 1e9);
+ if (!function) {
+ printf("|%-16s|\n", "Not - Catch");
+ return;
+ }
+
+ printf("|%-16p|\n", function);
+}
+
+static void print_handle_time(void)
+{
+ printf("\n");
+ printf("-------------------------------------------------------------------------------------------------------\n");
+ printf("| Timer | TYPE | Avg-handle (ms)|Max-handle(ms)| Max-handle-at-TS(s)|Max-lat-at-func |\n");
+ timer_traversal(&timer_result, __print_handle_time);
+}
+
+static void __print_bug_timer(struct timer_info *timer_info)
+{
+ if (!timer_info->bug)
+ return;
+
+ if (timer_info->type == ITIMER) {
+ printf("|%-10.10s:[%7s]", ((struct thread *)(timer_info->timer))->comm,
+ get_itimer_type(timer_info->itimer_type));
+ } else
+ printf("|%-20p", timer_info->timer);
+
+ printf("|%-10s", timer_type_string[timer_info->type]);
+ printf("|%-10d|\n", timer_info->bug);
+}
+
+static void print_bug_timer(void)
+{
+ if (bug_nr) {
+ printf("\nWarning: detect %d bug(s):\n", bug_nr);
+ printf("--------------------------------------------\n");
+ printf("| Timer | TYPE | Bug |\n");
+ timer_traversal(&timer_result, __print_bug_timer);
+ }
+}
+
+static void print_result(void)
+{
+ if (!print_lat && !print_handle)
+ print_lat = 1;
+
+ if (print_lat)
+ print_timer_lat();
+
+ if (print_handle) {
+ bug_nr = 0;
+ print_handle_time();
+ }
+
+ print_bug_timer();
+}
+
+static void __cmd_lat(void)
+{
+ setup_pager();
+ read_events();
+ sort_result();
+ print_result();
+}
+
+static int __cmd_record(int argc, const char **argv)
+{
+ unsigned int rec_argc, i, j;
+ const char **rec_argv;
+
+ rec_argc = ARRAY_SIZE(record_args) + argc - 1;
+ rec_argv = calloc(rec_argc + 1, sizeof(char *));
+
+ for (i = 0; i < ARRAY_SIZE(record_args); i++)
+ rec_argv[i] = strdup(record_args[i]);
+
+ for (j = 1; j < (unsigned int)argc; j++, i++)
+ rec_argv[i] = argv[j];
+
+ BUG_ON(i != rec_argc);
+
+ return cmd_record(i, rec_argv, NULL);
+}
+
+int cmd_timer(int argc, const char **argv, const char *prefix __used)
+{
+ argc = parse_options(argc, argv, timer_options, timer_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ if (!argc)
+ usage_with_options(timer_usage, timer_options);
+
+ symbol__init(0);
+ if (!strncmp(argv[0], "rec", 3)) {
+ return __cmd_record(argc, argv);
+ } else if (!strncmp(argv[0], "lat", 3)) {
+ if (argc > 1) {
+ argc = parse_options(argc, argv, latency_options, latency_usage, 0);
+ if (argc)
+ usage_with_options(latency_usage, latency_options);
+ }
+ setup_sorting();
+ __cmd_lat();
+ } else {
+ usage_with_options(timer_usage, timer_options);
+ }
+
+ return 0;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index a3d8bf6..afbe9b1 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -29,5 +29,6 @@ extern int cmd_trace(int argc, const char **argv, const char *prefix);
extern int cmd_version(int argc, const char **argv, const char *prefix);
extern int cmd_probe(int argc, const char **argv, const char *prefix);
extern int cmd_kmem(int argc, const char **argv, const char *prefix);
+extern int cmd_timer(int argc, const char **argv, const char *prefix);

#endif
diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt
index 02b09ea..d49bc55 100644
--- a/tools/perf/command-list.txt
+++ b/tools/perf/command-list.txt
@@ -15,3 +15,4 @@ perf-top mainporcelain common
perf-trace mainporcelain common
perf-probe mainporcelain common
perf-kmem mainporcelain common
+perf-timer mainporcelain common
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index cf64049..b43268a 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -300,6 +300,7 @@ static void handle_internal_command(int argc, const char **argv)
{ "sched", cmd_sched, 0 },
{ "probe", cmd_probe, 0 },
{ "kmem", cmd_kmem, 0 },
+ { "timer", cmd_timer, 0 },
};
unsigned int i;
static const char ext[] = STRIP_EXTENSION;
--
1.6.1.2

2009-12-15 13:59:08

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 2/4] perf_event: fix getting point

On Tue, Dec 15, 2009 at 07:20:26PM +0800, Xiao Guangrong wrote:
> The point that got by raw_field_ptr() is point to @data offset not
> tracepoint saved.
>
> Introduce raw_field_arry() to get arry data is tracepoint save
> arry to it.


Could you please rename it into raw_field_array()?
I feel like...a kind of loss there :)



>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> tools/perf/builtin-sched.c | 3 +--
> tools/perf/util/trace-event-parse.c | 25 ++++++++++++++++++++++---
> tools/perf/util/trace-event.h | 3 +++
> 3 files changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> index d67f274..8df982e 100644
> --- a/tools/perf/builtin-sched.c
> +++ b/tools/perf/builtin-sched.c
> @@ -630,8 +630,7 @@ static void test_calibrations(void)
>
> #define FILL_ARRAY(ptr, array, event, data) \
> do { \
> - void *__array = raw_field_ptr(event, #array, data); \
> - memcpy(ptr.array, __array, sizeof(ptr.array)); \
> + raw_field_arry(event, #array, data, ptr.array, sizeof(ptr.array));\
> } while(0)
>
> #define FILL_COMMON_FIELDS(ptr, event, data) \
> diff --git a/tools/perf/util/trace-event-parse.c b/tools/perf/util/trace-event-parse.c
> index c5c32be..b0ee64f 100644
> --- a/tools/perf/util/trace-event-parse.c
> +++ b/tools/perf/util/trace-event-parse.c
> @@ -840,7 +840,7 @@ static int event_read_fields(struct event *event, struct format_field **fields)
> field->flags |= FIELD_IS_ARRAY;
>
> type = read_token(&token);
> - while (strcmp(token, "]") != 0) {
> + while (strcmp(token, "]") != 0) {
> if (last_type == EVENT_ITEM &&
> type == EVENT_ITEM)
> len = 2;
> @@ -1919,13 +1919,32 @@ raw_field_value(struct event *event, const char *name, void *data)
>
> void *raw_field_ptr(struct event *event, const char *name, void *data)
> {
> + void *ptr;
> + unsigned long long value;
> +
> + value = raw_field_value(event, name, data);
> +
> + if (!value)
> + return NULL;
> +
> + memcpy(&ptr, &value, sizeof(ptr));



Could you perhaps just do

ptr = (void *)value; ?

This memcpy suggest we are dereferencing some field while
actually just copy a pointer.

Thanks!

2009-12-15 14:15:43

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior

On Tue, Dec 15, 2009 at 07:17:03PM +0800, Xiao Guangrong wrote:
> Hi,
>
> We introduce 'perf timer' in this patchset, it can analyze timer
> latency and timer function handle time, the usage and result is
> like below:
>
> # perf timer record
> # perf timer lat --print-lat --print-handle
> -------------------------------------------------------------------------------------------------------
> | Timer | TYPE | Avg-latency | Max-latency | Max-latency-at-TS |Max-lat-at-Task |
> |0xf7ad1f5c |hrtimer |996068.500 ns|1607650 ns|10270128658526 |init |
> |0xf7903f04 |timer |0.625 HZ|2 HZ|10270344082394 |swapper |
> |0xf787a05c |hrtimer |200239.500 ns|359929 ns|10269316024808 |main |
> |main :[ PROF]|itimer |0.000 HZ|0 HZ|10237021270557 |main |
> |main :[VIRTUAL]|itimer |0.000 HZ|0 HZ|10257314773501 |main |



Cool! This is really a good work and a good idea.

Just have some neats in mind. hrtimers and timers don't have the same latency and
granularity requirements.

As you show it, timers have an HZ granularity and hrtimers are about nanoseconds,
and mixing them up in the same array of latency report is too messy.
They don't have the same granularity/latency scope so they should
be reported separately.

Otherwise it's like mixing slothes and flies latency results after
a fly swat attack test ;-)

2009-12-15 14:23:35

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior

On Tue, Dec 15, 2009 at 07:17:03PM +0800, Xiao Guangrong wrote:
> Hi,
>
> We introduce 'perf timer' in this patchset, it can analyze timer
> latency and timer function handle time, the usage and result is
> like below:
>
> # perf timer record
> # perf timer lat --print-lat --print-handle
> -------------------------------------------------------------------------------------------------------
> | Timer | TYPE | Avg-latency | Max-latency | Max-latency-at-TS |Max-lat-at-Task |
> |0xf7ad1f5c |hrtimer |996068.500 ns|1607650 ns|10270128658526 |init |
> |0xf7903f04 |timer |0.625 HZ|2 HZ|10270344082394 |swapper |
> |0xf787a05c |hrtimer |200239.500 ns|359929 ns|10269316024808 |main |
> |main :[ PROF]|itimer |0.000 HZ|0 HZ|10237021270557 |main |
> |main :[VIRTUAL]|itimer |0.000 HZ|0 HZ|10257314773501 |main |
>
> ......
>
> -------------------------------------------------------------------------------------------------------
> | Timer | TYPE | Avg-handle (ms)|Max-handle(ms)| Max-handle-at-TS(s)|Max-lat-at-func |
> |0xf7ad1f5c |hrtimer |0.025 |0.025 |10270.129 |0xc016b5b0 |
> |0xf7903f04 |timer |0.009 |0.011 |10260.342 |0xc0159240 |
> |0xf787a05c |hrtimer |0.031 |0.062 |10267.018 |0xc014cc40 |
>
> And, in current code, it'll complain with below message when we use
> 'perf timer lat':
>
> # ./perf timer lat
> Warning: unknown op '{'
> Warning: Error: expected type 5 but read 1
> Warning: failed to read event print fmt for hrtimer_start
> Warning: unknown op '{'
> Warning: Error: expected type 5 but read 1
> Warning: failed to read event print fmt for hrtimer_expire_entry


Oh and indeed we have some event format complexities to solve
there.

cat /debug/tracing/events/timer/hrtimer_start/format

print fmt: "hrtimer=%p function=%pf expires=%llu softexpires=%llu", REC->hrtimer, REC->function, (unsigned long
long)(((ktime_t) { .tv64 = REC->expires }).tv64), (unsigned long long)(((ktime_t) { .tv64 = REC->softexpires
}).tv64)

We should try to simplify this, may be we should just expose
__entry->expires as is.

2009-12-15 17:45:11

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 4/4] perf/timer: 'perf timer' core code


On Tue, 15 Dec 2009, Xiao Guangrong wrote:

> It's the core code of 'perf timer', we can use it to analyse timer's
> behavior:
> # perf timer record
> # perf timer lat --print-lat --print-handle
> -------------------------------------------------------------------------------------------------------
> | Timer | TYPE | Avg-latency | Max-latency | Max-latency-at-TS |Max-lat-at-Task |
> |0xf7ad1f5c |hrtimer |996068.500 ns|1607650 ns|10270128658526 |init |
> |0xf7903f04 |timer |0.625 HZ|2 HZ|10270344082394 |swapper |
> |0xf787a05c |hrtimer |200239.500 ns|359929 ns|10269316024808 |main |
> |main :[ PROF]|itimer |0.000 HZ|0 HZ|10237021270557 |main |
> |main :[VIRTUAL]|itimer |0.000 HZ|0 HZ|10257314773501 |main |
>
> ......
>
> -------------------------------------------------------------------------------------------------------
> | Timer | TYPE | Avg-handle (ms)|Max-handle(ms)| Max-handle-at-TS(s)|Max-lat-at-func |
> |0xf7ad1f5c |hrtimer |0.025 |0.025 |10270.129 |0xc016b5b0 |
> |0xf7903f04 |timer |0.009 |0.011 |10260.342 |0xc0159240 |
> |0xf787a05c |hrtimer |0.031 |0.062 |10267.018 |0xc014cc40 |

The output is confusing in several aspects:

1) Different time units:

Please use consistent time units for everything. micro seconds
are fine and we definitely do not care about nanosecond
fractions.

2) Timer description:

Why do we have hex addresses and process names intermingled ? Why
don't we print the process/thread name which owns the timer
always ? [PROF/VIRTUAL] is not a property of the Timer, it
belongs into type.

3) Max-lat-at-Task:

What does this field tell ?

4) *handle:

That should be a more descriptive name, e.g. function runtime

5) Max-lat-at-func:

Is this the callback function which ran the longest time ? Why
is it named latency ? Why is it not decoded into a symbol ?

Btw, fitting the output into 80chars allows to use the tool on a non
graphical terminal as well.

Also there are other metrics of timers which are interesting and
should be retrieved by such a tool:

number of activated timers
number of canceled timers
number of expired timers

in the form of simple statistics.

The canceled timers are interesting also in a list, so we can see
which timers are canceled after which time and how long before the
expiry.

> +#ifdef TDEBUG
> +static char const *timer_state_string[] = {
> + "TIMER_INIT",
> + "TIMER_STARE",

If I _stare_ longer, I might understand it :)

> +#define SORT_KEY \
> + "timer, type, avg-timer-latency, max-timer-latency," \
> + " avg-handle-time, max-handle-time"

The code has at least two places where these strings are used. They
should be synced.

> +static char const *input_name = "perf.data";
> +static char default_sort_order[] = SORT_KEY;
> +
> +static char *sort_order = default_sort_order;
> +static int profile_cpu = -1;
> +static int print_lat;
> +static int print_handle;
> +static int bug_nr;
> +static struct perf_session *timer_session;
> +
> +static const struct option timer_options[] = {
> + OPT_STRING('i', "input", &input_name, "file",
> + "input file name"),
> + OPT_BOOLEAN('v', "verbose", &verbose,
> + "be more verbose (show symbol address, etc)"),
> + OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
> + "dump raw trace in ASCII"),
> + OPT_END()
> +};
> +
> +static const char * const timer_usage[] = {
> + "perf timer [<options>] {record|latency}",
> + NULL
> +};

Your example above uses "perf timer lat". What's correct ?

> +static const struct option latency_options[] = {
> + OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
> + "sort by key(s): "SORT_KEY),

Do we really need a sort order ? A single sort key should be
sufficient.

I'd prefer to have a selector instead which lets me filter timer
types. If I debug hrtimers then I have no interest in itimers or timer
list timers.

> +enum timer_type {
> + TIMER,
> + HRTIMER,
> + ITIMER,
> +};

Hmm. I don't think that's correct. You should split the itimer type
into the various subimplementations.

> +static char const *timer_type_string[] = {
> + "timer",
> + "hrtimer",
> + "itimer",
> +};

Ditto.

> +struct sort_dimension {
> + const char *name;
> + sort_fn_t cmp;
> + struct list_head list;
> +};
> +
> +#define CMP_FUNCTION(value) \

Please use something like "field". value confused me for a while, as I
expected a real value.

> +static int cmp_##value(struct timer_info *v1, struct timer_info *v2) \
> +{ \
> + if (v1->value < v2->value) \
> + return -1; \
> + if (v1->value > v2->value) \
> + return 1; \
> + return 0; \
> +}
> +
> +#define DECLARE_SORT_DIMENSION(sort_name, field) \

That should be DEFINE_... You are defining the function, not declaring
it.

> +CMP_FUNCTION(field) \
> +static struct sort_dimension sort_##field = { \
> + .name = sort_name, \
> + .cmp = cmp_##field, \
> +}
> +
> +DECLARE_SORT_DIMENSION("timer", timer);
> +DECLARE_SORT_DIMENSION("type", type);
> +DECLARE_SORT_DIMENSION("avg-timer-latency", timer_avg_lat);
> +DECLARE_SORT_DIMENSION("max-timer-latency", timer_max_lat);
> +DECLARE_SORT_DIMENSION("avg-handle-time", handler_avg_time);
> +DECLARE_SORT_DIMENSION("max-handle-time", handler_max_time);
> +
> +static struct sort_dimension *available_sorts[] = {
> + &sort_timer,
> + &sort_type,
> + &sort_timer_avg_lat,
> + &sort_timer_max_lat,
> + &sort_handler_avg_time,
> + &sort_handler_max_time,
> +};

This indirection horrible.

#define SORT(field,_name) \
{ .func = cmp_##field, \
.name = _name, }

static struct sort_key sorts[] = {
SORT(timer, "timer"),
....
};

> +#define NB_AVAILABLE_SORTS (int)(sizeof(available_sorts) / sizeof(struct sort_dimension *))

Please use NR_ as abbreviation for number. Also the size calculation
for arrays is:

ARRAY_SIZE(available_sorts)

> +static int sort_dimension__add(const char *tok, struct list_head *list)

No double underscores please.

> +{
> + int i;
> +
> + for (i = 0; i < NB_AVAILABLE_SORTS; i++) {
> + if (!strcmp(available_sorts[i]->name, tok)) {
> + list_add_tail(&available_sorts[i]->list, list);
> +
> + return 0;
> + }
> + }
> +
> + return -1;
> +}
> +
> +static LIST_HEAD(sort_list);
> +
> +static void setup_sorting(void)
> +{
> + char *tmp, *tok, *str = strdup(sort_order);

Please hand sort_order in as an argument.

> + for (tok = strtok_r(str, ", ", &tmp);
> + tok; tok = strtok_r(NULL, ", ", &tmp)) {
> + if (sort_dimension__add(tok, &sort_list) < 0) {
> + error("Unknown --sort key: `%s'", tok);
> + usage_with_options(latency_usage, latency_options);
> + }
> + }
> +
> + free(str);
> +
> + sort_dimension__add("timer", &default_cmp);
> + sort_dimension__add("itimer-type", &default_cmp);

That obviously does not match any of the sort types.

> +}
> +
> +static struct rb_root timer_root;
> +
> +static int timer_key_cmp(struct list_head *list, struct timer_info *l,
> + struct timer_info *r)
> +{
> + struct sort_dimension *sort;
> + int ret = 0;
> +
> + BUG_ON(list_empty(list));
> +
> + list_for_each_entry(sort, list, list) {
> + ret = sort->cmp(l, r);
> + if (ret)
> + return ret;
> + }
> +
> + return ret;
> +}
> +
> +static void timer_traversal(struct rb_root *root,
> + void (*fun)(struct timer_info *))
> +{
> + struct rb_node *next;
> +
> + next = rb_first(root);
> +
> + while (next) {
> + struct timer_info *timer;
> +
> + timer = rb_entry(next, struct timer_info, node);
> + fun(timer);
> + next = rb_next(next);
> + }
> +}
> +
> +static struct timer_info *
> +__timer_search(struct rb_root *root, struct timer_info *timer,
> + struct list_head *_sort_list)
> +{
> + struct rb_node *node = root->rb_node;
> +
> + while (node) {
> + struct timer_info *timer_info;
> + int cmp;
> +
> + timer_info = container_of(node, struct timer_info, node);
> +
> + cmp = timer_key_cmp(_sort_list, timer, timer_info);
> + if (cmp > 0)
> + node = node->rb_left;
> + else if (cmp < 0)
> + node = node->rb_right;
> + else
> + return timer_info;

This looks more than odd. You search for a timer in the list by
using the compare functions which are used for sorting ?

How should that work ?

> + }
> + return NULL;
> +}
> +
> +static struct timer_info *
> +timer_search(struct timer_info *timer)
> +{
> + return __timer_search(&timer_root, timer, &default_cmp);
> +}
> +
> +static void
> +__timer_insert(struct rb_root *root, struct timer_info *data,
> + struct list_head *_sort_list)
> +{
> + struct rb_node **new = &(root->rb_node), *parent = NULL;
> +
> + while (*new) {
> + struct timer_info *this;
> + int cmp;
> +
> + this = container_of(*new, struct timer_info, node);
> + parent = *new;
> +
> + cmp = timer_key_cmp(_sort_list, data, this);
> +
> + if (cmp > 0)
> + new = &((*new)->rb_left);
> + else
> + new = &((*new)->rb_right);
> + }
> +
> + rb_link_node(&data->node, parent, new);
> + rb_insert_color(&data->node, root);
> +}
> +
> +static void timer_insert(struct timer_info *timer_info)
> +{
> + dprintf("timer %p insert...\n", timer_info->timer);
> + __timer_insert(&timer_root, timer_info, &default_cmp);
> +}
> +
> +static struct timer_info
> +*__timer_find(void *timer, enum timer_type type, int itimer_type)

As I said before, that's wrong. Enumerate the itimer subtypes in the
timer types and you get rid of all that extra itimer_type business.

> +{
> + struct timer_info *find = NULL;
> + struct timer_info timer_info = {
> + .timer = timer,
> + .itimer_type = itimer_type,
> + };
> +
> + find = timer_search(&timer_info);
> + if (find && find->type != type) {
> +
> + dprintf("find timer[%p], but type[%s] is not we expect[%s],"
> + "set to initializtion state.\n", find->timer,
> + timer_type_string[find->type], timer_type_string[type]);
> +
> + find->type = type;
> + find->bug++;
> + find->state = TIMER_INIT;

Why does a timer_search fail ? And why is fixing up the type if it
is not matching a good idea ?

> + }
> +
> + dprintf("find timer[%p] %s\n", timer, find ? "success" : "fail");
> + return find;
> +}
> +
> +static struct timer_info *
> +timer_findnew(void *timer, enum timer_type type, int itimer_type)
> +{
> + struct timer_info *find;
> +
> + find = __timer_find(timer, type, itimer_type);
> + if (find)
> + return find;
> +
> + find = zalloc(sizeof(*find));
> + if (!find)
> + die("No memory");
> +
> + find->type = type;
> + find->timer = timer;
> + find->itimer_type = itimer_type;
> + timer_insert(find);
> +
> + return find;
> +}
> +
> +static struct rb_root timer_result;
> +
> +static void sort_result(void)
> +{
> + struct rb_node *node;
> +
> + for (;;) {
> + struct timer_info *data;
> + node = rb_first(&timer_root);
> + if (!node)
> + break;
> +
> + rb_erase(node, &timer_root);
> + data = rb_entry(node, struct timer_info, node);
> + data->timer_avg_lat = (double)data->timer_total_lat /
> + (double)data->expire_nr;
> + data->handler_avg_time = (double)data->handler_total_time /
> + (double)data->expire_nr;
> + if (data->expire_nr)
> + __timer_insert(&timer_result, data, &sort_list);
> + }
> +}
> +
> +static int timer_check_set_state(struct timer_info *timer_info,
> + enum timer_state new_state)
> +{
> + enum timer_state current_state = timer_info->state;
> + int ret = 1;
> +
> + /*
> + * if current state is TIMER_INIT, it means it's the first time we
> + * catched, no need check.

Check whatfor ?

> + */
> + if (current_state == TIMER_INIT) {
> + dprintf("timer[%p] initialize, set state to %s\n",
> + timer_info->timer, timer_state_string[new_state]);
> + ret = 0;
> + }
> +
> + timer_info->state = new_state;
> + return ret;
> +}
> +
> +#define FILL_RAM_FIELD_PTR(event, field, data) \
> + field = raw_field_ptr(event, #field, data)
> +#define FILLL_RAW_FIELD_VALUE(event, field, data) \
> + field = (typeof(field))raw_field_value(event, #field, data)

I can neither figure out what FILL_RAM_FIELD means nor why
FILLL_RAW_FIELD needs 3 L.

> +static void *get_timer(enum timer_type type, struct event *event, void *data)
> +{
> + if (type == HRTIMER) {
> + void *hrtimer = NULL;
> +
> + FILL_RAM_FIELD_PTR(event, hrtimer, data);
> + return hrtimer;

Shudder.

return raw_field_ptr(event, "hrtimer", data);

> + } else {
> + void *timer = NULL;
> +
> + FILL_RAM_FIELD_PTR(event, timer, data);
> + return timer;
> + }
> +}
> +
> +static void
> +timer_hrtimer_start(enum timer_type type, void *data, struct event *event)
> +{
> + void *timer, *function = NULL;
> + u64 expires;
> + struct timer_info *timer_info;
> +
> + timer = get_timer(type, event, data);
> +
> + FILL_RAM_FIELD_PTR(event, function, data);
> + FILLL_RAW_FIELD_VALUE(event, expires, data);
> +
> + dprintf("timer-start: Timer[%p], type:%s, function:%p, expires:%llu\n",
> + timer, timer_type_string[type], function, expires);
> +
> + timer_info = timer_findnew(timer, type, MAX_ITIMER_TYPE);
> + timer_check_set_state(timer_info, TIMER_START);
> + timer_info->function = function;
> + timer_info->timer_expire_ts = expires;
> +}
> +
> +static void
> +timer_hrtimer_expires_entry(enum timer_type type, void *data,
> + struct event *event, u64 timestamp,
> + struct thread *thread)
> +{
> + void *timer;
> + int ret;
> + u64 now;
> + s64 delta;
> + struct timer_info *timer_info;
> +
> + FILLL_RAW_FIELD_VALUE(event, now, data);
> + timer = get_timer(type, event, data);
> + timer_info = timer_findnew(timer, type, MAX_ITIMER_TYPE);
> + timer_info->handler_entry_ts = timestamp;
> + ret = timer_check_set_state(timer_info, TIMER_EXPIRE_ENTRY);
> +
> + dprintf("timer-expires-entry: Timer[%p], type:%s, now:%llu, ts:%llu\n",
> + timer, timer_type_string[type], now, timestamp);
> + if (ret == 0)
> + return;
> +
> + delta = (s64)(now - timer_info->timer_expire_ts);
> + if (delta < 0) {
> + timer_info->bug++;
> + dprintf("timer-expires-entry: Timer[%p], type:%s,"
> + " timer latency < 0, bug=%d.\n", timer,
> + timer_type_string[type], timer_info->bug);
> + return ;
> + }
> +
> + timer_info->expire_nr++;
> + timer_info->timer_last_lat = delta;
> + timer_info->timer_total_lat += delta;
> + if (timer_info->timer_max_lat <= (u64)delta) {

Why can't we just make timer_max_lat s64 so we get rid of those
silly typecasts ?

> + timer_info->timer_max_lat = delta;
> + timer_info->timer_max_lat_at_ts = timestamp;
> + timer_info->timer_max_lat_at_thread = thread;

What does us the information in which thread context the latency
happend show ? Actually nothing. All it tells us that thread X was
on the CPU at that point. That's useless.

> +static void
> +itimer_state_handler(void *data, struct event *event, int this_cpu __used,
> + u64 timestamp __used, struct thread *thread)
> +{
> + u64 value_sec, value_usec, expires;
> + struct timer_info *timer_info;
> + void *timer = NULL;
> + int which;
> +
> + FILLL_RAW_FIELD_VALUE(event, value_sec, data);
> + FILLL_RAW_FIELD_VALUE(event, value_usec, data);
> + FILLL_RAW_FIELD_VALUE(event, expires, data);
> + FILLL_RAW_FIELD_VALUE(event, which, data);
> + FILL_RAM_FIELD_PTR(event, timer, data);

This is complete obfuscated, while

value_sec = get_value(data, event, "value_sec");

is obvious.

> + timer_info = timer_findnew(thread, ITIMER, which);
> +
> + /* itimer canceled, we skip this event */
> + if (!value_sec && !value_usec)
> + return ;

You throw away valuable information here about canceled timers.

> + dprintf("timer-start: Timer[%p], type:%s, task:%s, expires:%llu",
> + timer_info->timer, "itimer", thread->comm, expires);
> +
> + /* itimer started */
> + timer_info->timer_expire_ts = expires;
> + timer_check_set_state(timer_info, TIMER_START);
> + if (which == ITIMER_REAL) {
> + timer = timer_findnew(timer, HRTIMER, MAX_ITIMER_TYPE);
> + timer_info->itimer_hrtimer = timer;
> + dprintf(" hrtimer:%p\n", timer);
> + } else {
> + dprintf("\n");
> + }
> +}
> +
> +static void
> +itimer_expire_handler(void *data, struct event *event, int this_cpu __used,
> + u64 timestamp, struct thread *thread)
> +{
> + int which;
> + u64 now;
> + pid_t pid;
> + struct thread *itimer_thread;
> + struct timer_info *timer_info;
> + s64 delta = 0;

No need to initialize that.

> + int ret;

Please group identical types

int ret, which;

> +struct event_handler_unit {
> + const char *event_name;
> + void (*handler) (void *data, struct event *event, int this_cpu,
> + u64 timestamp, struct thread *thread);
> +} timer_handler[] = {
> + /* timer */
> + { "timer_start", timer_start_handler },
> + { "timer_expire_entry", timer_expire_entry_handler },
> + { "timer_expire_exit", timer_expire_exit_handler },
> +
> + /* hrtimer */
> + { "hrtimer_start", hrtimer_start_handler },
> + { "hrtimer_expire_entry", hrtimer_expire_entry_handler },
> + { "hrtimer_expire_exit", hrtimer_expire_exit_handler},
> +
> + /* itimer */
> + { "itimer_state", itimer_state_handler },
> + { "itimer_expire", itimer_expire_handler },
> +};
> +
> +static void
> +process_raw_event(event_t *raw_event __used, struct perf_session *session __used,
> + void *data, int cpu, u64 timestamp, struct thread *thread)
> +{
> + struct event *event;
> + int type;
> + int i;
> + int handler_item = ARRAY_SIZE(timer_handler);

See above.

> + type = trace_parse_common_type(data);
> + event = trace_find_event(type);
> +
> + for (i = 0; i < handler_item; i++)
> + if (!strcmp(timer_handler[i].event_name, event->name))
> + timer_handler[i].handler(data, event, cpu, timestamp, thread);
> +}
> +
> +static int read_events(void)
> +{
> + int err;

New line between variables and code.

> + timer_session = perf_session__new(input_name, O_RDONLY, 0, NULL);
> +
> + if (timer_session == NULL)

if (!timer_session)

> + return -ENOMEM;
> +
> + err = perf_session__process_events(timer_session, &event_ops);
> + perf_session__delete(timer_session);
> + return err;
> +}
> +

Thanks,

tglx

2009-12-16 01:05:39

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 2/4] perf_event: fix getting point



Frederic Weisbecker wrote:
> On Tue, Dec 15, 2009 at 07:20:26PM +0800, Xiao Guangrong wrote:
>> The point that got by raw_field_ptr() is point to @data offset not
>> tracepoint saved.
>>
>> Introduce raw_field_arry() to get arry data is tracepoint save
>> arry to it.
>
>
> Could you please rename it into raw_field_array()?
> I feel like...a kind of loss there :)
>

Ah, it's my typo, i will fix it.

>
>
>> Signed-off-by: Xiao Guangrong <[email protected]>
>> ---
>> tools/perf/builtin-sched.c | 3 +--
>> tools/perf/util/trace-event-parse.c | 25 ++++++++++++++++++++++---
>> tools/perf/util/trace-event.h | 3 +++
>> 3 files changed, 26 insertions(+), 5 deletions(-)
>>
>> diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
>> index d67f274..8df982e 100644
>> --- a/tools/perf/builtin-sched.c
>> +++ b/tools/perf/builtin-sched.c
>> @@ -630,8 +630,7 @@ static void test_calibrations(void)
>>
>> #define FILL_ARRAY(ptr, array, event, data) \
>> do { \
>> - void *__array = raw_field_ptr(event, #array, data); \
>> - memcpy(ptr.array, __array, sizeof(ptr.array)); \
>> + raw_field_arry(event, #array, data, ptr.array, sizeof(ptr.array));\
>> } while(0)
>>
>> #define FILL_COMMON_FIELDS(ptr, event, data) \
>> diff --git a/tools/perf/util/trace-event-parse.c b/tools/perf/util/trace-event-parse.c
>> index c5c32be..b0ee64f 100644
>> --- a/tools/perf/util/trace-event-parse.c
>> +++ b/tools/perf/util/trace-event-parse.c
>> @@ -840,7 +840,7 @@ static int event_read_fields(struct event *event, struct format_field **fields)
>> field->flags |= FIELD_IS_ARRAY;
>>
>> type = read_token(&token);
>> - while (strcmp(token, "]") != 0) {
>> + while (strcmp(token, "]") != 0) {
>> if (last_type == EVENT_ITEM &&
>> type == EVENT_ITEM)
>> len = 2;
>> @@ -1919,13 +1919,32 @@ raw_field_value(struct event *event, const char *name, void *data)
>>
>> void *raw_field_ptr(struct event *event, const char *name, void *data)
>> {
>> + void *ptr;
>> + unsigned long long value;
>> +
>> + value = raw_field_value(event, name, data);
>> +
>> + if (!value)
>> + return NULL;
>> +
>> + memcpy(&ptr, &value, sizeof(ptr));
>
>
>
> Could you perhaps just do
>
> ptr = (void *)value; ?
>

We can't do it in x86_32:
error: cast to pointer from integer of different size

Thanks,
Xiao

2009-12-16 01:21:14

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior



Frederic Weisbecker wrote:
> On Tue, Dec 15, 2009 at 07:17:03PM +0800, Xiao Guangrong wrote:
>> Hi,
>>
>> We introduce 'perf timer' in this patchset, it can analyze timer
>> latency and timer function handle time, the usage and result is
>> like below:
>>
>> # perf timer record
>> # perf timer lat --print-lat --print-handle
>> -------------------------------------------------------------------------------------------------------
>> | Timer | TYPE | Avg-latency | Max-latency | Max-latency-at-TS |Max-lat-at-Task |
>> |0xf7ad1f5c |hrtimer |996068.500 ns|1607650 ns|10270128658526 |init |
>> |0xf7903f04 |timer |0.625 HZ|2 HZ|10270344082394 |swapper |
>> |0xf787a05c |hrtimer |200239.500 ns|359929 ns|10269316024808 |main |
>> |main :[ PROF]|itimer |0.000 HZ|0 HZ|10237021270557 |main |
>> |main :[VIRTUAL]|itimer |0.000 HZ|0 HZ|10257314773501 |main |
>
>
>
> Cool! This is really a good work and a good idea.
>
> Just have some neats in mind. hrtimers and timers don't have the same latency and
> granularity requirements.
>
> As you show it, timers have an HZ granularity and hrtimers are about nanoseconds,
> and mixing them up in the same array of latency report is too messy.
> They don't have the same granularity/latency scope so they should
> be reported separately.
>

Yeah, it has different unit and looks ugly :-(

but, the problem is we can't get HZ in userspace now, i'll export HZ by proc or other
way and rectify 'perf timer' output in my next work

Thanks,
Xiao

2009-12-16 01:22:28

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 2/4] perf_event: fix getting point

On Wed, Dec 16, 2009 at 09:03:47AM +0800, Xiao Guangrong wrote:
> >> @@ -1919,13 +1919,32 @@ raw_field_value(struct event *event, const char *name, void *data)
> >>
> >> void *raw_field_ptr(struct event *event, const char *name, void *data)
> >> {
> >> + void *ptr;
> >> + unsigned long long value;
> >> +
> >> + value = raw_field_value(event, name, data);
> >> +
> >> + if (!value)
> >> + return NULL;
> >> +
> >> + memcpy(&ptr, &value, sizeof(ptr));
> >
> >
> >
> > Could you perhaps just do
> >
> > ptr = (void *)value; ?
> >
>
> We can't do it in x86_32:
> error: cast to pointer from integer of different size


You can just do:

ptr = (void *)(unsigned long)value;

2009-12-16 01:34:13

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 2/4] perf_event: fix getting point



Frederic Weisbecker wrote:

>> We can't do it in x86_32:
>> error: cast to pointer from integer of different size
>
>
> You can just do:
>
> ptr = (void *)(unsigned long)value;
>

Yeah, i forget this, it work well both in 32/64 machine.

Thanks,
Xiao

2009-12-16 05:58:22

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 4/4] perf/timer: 'perf timer' core code

Hi Thomas,

Sorry for many mistakes(typos and bad ideas) in this patch, i'll cook it
more and be careful later. Thanks very much.

Thomas Gleixner wrote:
> The output is confusing in several aspects:
>
> 1) Different time units:
>
> Please use consistent time units for everything. micro seconds
> are fine and we definitely do not care about nanosecond
> fractions.

OK, i'll change ns to ms, and for timer, the unit is HZ, do we have the
way to read kernel HZ in userspace? if not, i'll export HZ by proc/debugfs
or other way

>
> 2) Timer description:
>
> Why do we have hex addresses and process names intermingled ? Why
> don't we print the process/thread name which owns the timer
> always ? [PROF/VIRTUAL] is not a property of the Timer, it
> belongs into type.

Um, but not every timer has it's owner task, for example, if we start
a timer in interrupt handle function, this timer in not owns any tasks.
And itimer is started by userspace task so we can get it's owner, that
why i print hex address for timer/hrtimer, and print task name for itimer.

>
> 3) Max-lat-at-Task:
>
> What does this field tell ?

It means that which task in running when the maximum latency occurs.
but, as you noticed, this is useless, i'll remove it in next version
patch.

>
> 4) *handle:
>
> That should be a more descriptive name, e.g. function runtime
>

OK, will fix

> 5) Max-lat-at-func:
>
> Is this the callback function which ran the longest time ? Why
> is it named latency ? Why is it not decoded into a symbol ?

it's my typo, i'll export it using right/better way

>
> Btw, fitting the output into 80chars allows to use the tool on a non
> graphical terminal as well.
>

OK, will fix

> Also there are other metrics of timers which are interesting and
> should be retrieved by such a tool:
>
> number of activated timers
> number of canceled timers
> number of expired timers
>
> in the form of simple statistics.
>

OK. will support it

> The canceled timers are interesting also in a list, so we can see
> which timers are canceled after which time and how long before the
> expiry.
>

Um, i'll cook timer tracepoints to get the time when timer canceled,
and support this function.

>> +static const char * const timer_usage[] = {
>> + "perf timer [<options>] {record|latency}",
>> + NULL
>> +};
>
> Your example above uses "perf timer lat". What's correct ?
>

Actually, we only compare frontal 3 characters:
strncmp(argv[0], "lat", 3)

And 'perf sched' and other commands also use this way.

>> +static const struct option latency_options[] = {
>> + OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
>> + "sort by key(s): "SORT_KEY),
>
> Do we really need a sort order ? A single sort key should be
> sufficient.
>

i think it's necessary.

for example, if we interest in timer's max latency, then we can
use '-s max-timer-latency' to sort it.

And, if it has many timer with the some max latency, then we can
use '-s max-timer-latency,avg-timer-latency' to sort

> I'd prefer to have a selector instead which lets me filter timer
> types. If I debug hrtimers then I have no interest in itimers or timer
> list timers.
>

OK, will support this filter

>> +static LIST_HEAD(sort_list);
>> +
>> +static void setup_sorting(void)
>> +{
>> + char *tmp, *tok, *str = strdup(sort_order);
>
> Please hand sort_order in as an argument.
>

Sorry for my stupid question:
'sort_order' is a global variable and setup_sorting() only called
one time, why need hand sort_order in as an argument?

>> +static struct timer_info *
>> +__timer_search(struct rb_root *root, struct timer_info *timer,
>> + struct list_head *_sort_list)
>> +{
>> + struct rb_node *node = root->rb_node;
>> +
>> + while (node) {
>> + struct timer_info *timer_info;
>> + int cmp;
>> +
>> + timer_info = container_of(node, struct timer_info, node);
>> +
>> + cmp = timer_key_cmp(_sort_list, timer, timer_info);
>> + if (cmp > 0)
>> + node = node->rb_left;
>> + else if (cmp < 0)
>> + node = node->rb_right;
>> + else
>> + return timer_info;
>
> This looks more than odd. You search for a timer in the list by
> using the compare functions which are used for sorting ?
>
> How should that work ?
>

We put/get timer in a rb-tree base on the specify order, for example:
we default use this order:

sort_dimension__add("timer", &default_cmp);
sort_dimension__add("itimer-type", &default_cmp);

if timer_info->timer is bigger, we put it to left child, littler to right
child, if the timer_info->timer is the same, then we compare
timer_info->itimer_type.

>> +{
>> + struct timer_info *find = NULL;
>> + struct timer_info timer_info = {
>> + .timer = timer,
>> + .itimer_type = itimer_type,
>> + };
>> +
>> + find = timer_search(&timer_info);
>> + if (find && find->type != type) {
>> +
>> + dprintf("find timer[%p], but type[%s] is not we expect[%s],"
>> + "set to initializtion state.\n", find->timer,
>> + timer_type_string[find->type], timer_type_string[type]);
>> +
>> + find->type = type;
>> + find->bug++;
>> + find->state = TIMER_INIT;
>
> Why does a timer_search fail ? And why is fixing up the type if it
> is not matching a good idea ?
>

We search timer base on timer_info->timer and timer_info->itimer_type(not timer_info->type),
if we find the timer's type is changed(for example, the timer is "ITIMER" before, and change
to "HRTIMER" later), is should a bug. this case is hardly to happen but should catch it.

>> +static void *get_timer(enum timer_type type, struct event *event, void *data)
>> +{
>> + if (type == HRTIMER) {
>> + void *hrtimer = NULL;
>> +
>> + FILL_RAM_FIELD_PTR(event, hrtimer, data);
>> + return hrtimer;
>
> Shudder.
>
> return raw_field_ptr(event, "hrtimer", data);
>

Yeah, it's a clear way.

>> +static void
>> +itimer_state_handler(void *data, struct event *event, int this_cpu __used,
>> + u64 timestamp __used, struct thread *thread)
>> +{
>> + u64 value_sec, value_usec, expires;
>> + struct timer_info *timer_info;
>> + void *timer = NULL;
>> + int which;
>> +
>> + FILLL_RAW_FIELD_VALUE(event, value_sec, data);
>> + FILLL_RAW_FIELD_VALUE(event, value_usec, data);
>> + FILLL_RAW_FIELD_VALUE(event, expires, data);
>> + FILLL_RAW_FIELD_VALUE(event, which, data);
>> + FILL_RAM_FIELD_PTR(event, timer, data);
>
> This is complete obfuscated, while
>
> value_sec = get_value(data, event, "value_sec");
>
> is obvious.
>

Sorry, i cannot get this. As i understand:

#define FILL_RAW_FIELD_VALUE(event, field, data) \
field = (typeof(field))raw_field_value(event, #field, data)

After FILL_RAW_FIELD_VALUE(event, value_sec, data) expanded, it's:
value_sec = raw_field_value(event, "value_sec", data)

Why it's wrong? :-(

>> + timer_info = timer_findnew(thread, ITIMER, which);
>> +
>> + /* itimer canceled, we skip this event */
>> + if (!value_sec && !value_usec)
>> + return ;
>
> You throw away valuable information here about canceled timers.
>

We are not catch *_cancel event in this patch, i'll catch it to support
'number of canceled timers' in the next version.

Thanks,
Xiao

2009-12-16 07:33:13

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior


* Xiao Guangrong <[email protected]> wrote:

>
>
> Frederic Weisbecker wrote:
> > On Tue, Dec 15, 2009 at 07:17:03PM +0800, Xiao Guangrong wrote:
> >> Hi,
> >>
> >> We introduce 'perf timer' in this patchset, it can analyze timer
> >> latency and timer function handle time, the usage and result is
> >> like below:
> >>
> >> # perf timer record
> >> # perf timer lat --print-lat --print-handle
> >> -------------------------------------------------------------------------------------------------------
> >> | Timer | TYPE | Avg-latency | Max-latency | Max-latency-at-TS |Max-lat-at-Task |
> >> |0xf7ad1f5c |hrtimer |996068.500 ns|1607650 ns|10270128658526 |init |
> >> |0xf7903f04 |timer |0.625 HZ|2 HZ|10270344082394 |swapper |
> >> |0xf787a05c |hrtimer |200239.500 ns|359929 ns|10269316024808 |main |
> >> |main :[ PROF]|itimer |0.000 HZ|0 HZ|10237021270557 |main |
> >> |main :[VIRTUAL]|itimer |0.000 HZ|0 HZ|10257314773501 |main |
> >
> >
> >
> > Cool! This is really a good work and a good idea.
> >
> > Just have some neats in mind. hrtimers and timers don't have the same latency and
> > granularity requirements.
> >
> > As you show it, timers have an HZ granularity and hrtimers are about nanoseconds,
> > and mixing them up in the same array of latency report is too messy.
> > They don't have the same granularity/latency scope so they should
> > be reported separately.
> >
>
> Yeah, it has different unit and looks ugly :-(
>
> but, the problem is we can't get HZ in userspace now, i'll export HZ by proc
> or other way and rectify 'perf timer' output in my next work

We could export HZ as part of the ftrace event format file, in
/debug/tracing/events/timers/ or so. Not sure what the best way would be to do
it (without causing runtime overhead in event processing), but we should
reduce our reliance on /proc and export such instrumentation information via a
single, coherent interface.

There's a couple of other 'system settings' values that would be useful to
expose in such a way - for example the number of online CPUs. (in fact that
could be exposed as a set of online/offline events plus a way to read the
current count as well)

Ingo

2009-12-16 07:42:45

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior



Ingo Molnar wrote:

>>>
>> Yeah, it has different unit and looks ugly :-(
>>
>> but, the problem is we can't get HZ in userspace now, i'll export HZ by proc
>> or other way and rectify 'perf timer' output in my next work
>
> We could export HZ as part of the ftrace event format file, in
> /debug/tracing/events/timers/ or so. Not sure what the best way would be to do
> it (without causing runtime overhead in event processing), but we should
> reduce our reliance on /proc and export such instrumentation information via a
> single, coherent interface.
>

Hi ingo,

Um, if we export this value in tracepoint format, it should add for every timer's
tracepoint.

> There's a couple of other 'system settings' values that would be useful to
> expose in such a way - for example the number of online CPUs. (in fact that
> could be exposed as a set of online/offline events plus a way to read the
> current count as well)
>


How about export those values in debugfs/tracing/*? perf tracepoint self depend on
this filesystem.

Thanks,
Xiao

2009-12-16 07:46:34

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/4] perf_event: introduce 'perf timer' to analyze timer's behavior


* Xiao Guangrong <[email protected]> wrote:

>
>
> Ingo Molnar wrote:
>
> >>>
> >> Yeah, it has different unit and looks ugly :-(
> >>
> >> but, the problem is we can't get HZ in userspace now, i'll export HZ by proc
> >> or other way and rectify 'perf timer' output in my next work
> >
> > We could export HZ as part of the ftrace event format file, in
> > /debug/tracing/events/timers/ or so. Not sure what the best way would be to do
> > it (without causing runtime overhead in event processing), but we should
> > reduce our reliance on /proc and export such instrumentation information via a
> > single, coherent interface.
>
> Hi ingo,
>
> Um, if we export this value in tracepoint format, it should add for every
> timer's tracepoint.

Not if it's a special timer tracepoint 'HZ' dedicated for this. (We might even
get events through it on dynticks kernels, or on dynamic-sched-tick systems
[in the future] when the scheduler HZ value gets modified - or so.)

> > There's a couple of other 'system settings' values that would be useful to
> > expose in such a way - for example the number of online CPUs. (in fact
> > that could be exposed as a set of online/offline events plus a way to read
> > the current count as well)
>
> How about export those values in debugfs/tracing/*? perf tracepoint self
> depend on this filesystem.

No - in the long run most of those APIs will go away, and what remains are the
ftrace event format descriptors in essence. (Which we then might move to
/proc/events/ or so.)

Ingo

2009-12-16 16:00:18

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 4/4] perf/timer: 'perf timer' core code

Xiao,

On Wed, 16 Dec 2009, Xiao Guangrong wrote:
>
> Sorry for many mistakes(typos and bad ideas) in this patch, i'll cook it
> more and be careful later. Thanks very much.

Nothing to be sorry about. That's why we review code.

> Thomas Gleixner wrote:
> > The output is confusing in several aspects:
> >
> > 2) Timer description:
> >
> > Why do we have hex addresses and process names intermingled ? Why
> > don't we print the process/thread name which owns the timer
> > always ? [PROF/VIRTUAL] is not a property of the Timer, it
> > belongs into type.
>
> Um, but not every timer has it's owner task, for example, if we start
> a timer in interrupt handle function, this timer in not owns any tasks.
> And itimer is started by userspace task so we can get it's owner, that
> why i print hex address for timer/hrtimer, and print task name for itimer.

Well, lot's of timers have an owner task. At least all user space
related ones. And if the timer is rearmed in interrupt context, then
this does not change the owner at all.

> >> +static LIST_HEAD(sort_list);
> >> +
> >> +static void setup_sorting(void)
> >> +{
> >> + char *tmp, *tok, *str = strdup(sort_order);
> >
> > Please hand sort_order in as an argument.
> >
>
> Sorry for my stupid question:
> 'sort_order' is a global variable and setup_sorting() only called
> one time, why need hand sort_order in as an argument?

Fair enough.

> > How should that work ?
> >
>
> We put/get timer in a rb-tree base on the specify order, for example:
> we default use this order:
>
> sort_dimension__add("timer", &default_cmp);
> sort_dimension__add("itimer-type", &default_cmp);
>
> if timer_info->timer is bigger, we put it to left child, littler to right
> child, if the timer_info->timer is the same, then we compare
> timer_info->itimer_type.

Hmm, I wonder whether a hash table would be more efficient for the
recording side.

> >> +{
> >> + struct timer_info *find = NULL;
> >> + struct timer_info timer_info = {
> >> + .timer = timer,
> >> + .itimer_type = itimer_type,
> >> + };
> >> +
> >> + find = timer_search(&timer_info);
> >> + if (find && find->type != type) {
> >> +
> >> + dprintf("find timer[%p], but type[%s] is not we expect[%s],"
> >> + "set to initializtion state.\n", find->timer,
> >> + timer_type_string[find->type], timer_type_string[type]);
> >> +
> >> + find->type = type;
> >> + find->bug++;
> >> + find->state = TIMER_INIT;
> >
> > Why does a timer_search fail ? And why is fixing up the type if it
> > is not matching a good idea ?
> >

> We search timer base on timer_info->timer and
> timer_info->itimer_type(not timer_info->type), if we find the
> timer's type is changed(for example, the timer is "ITIMER" before,
> and change to "HRTIMER" later), is should a bug. this case is hardly
> to happen but should catch it.

No, it's not a bug at all. You _can_ have a hrtimer and a timer_list
timer at the same address in a trace. There are two ways to make that
happen:

1) kmalloc'ed memory contains a timer_list. timer operation is done,
memory is kfreed. Now another kmalloc gets the just freed memory
and has a hrtimer at the same address which was used by the
timer_list before.

2) timer_list and hrtimer are also allocated on stack. There is no
guarantee that they are at different addresses.

> >> +static void *get_timer(enum timer_type type, struct event *event, void *data)
> >> +{
> >> + if (type == HRTIMER) {
> >> + void *hrtimer = NULL;
> >> +
> >> + FILL_RAM_FIELD_PTR(event, hrtimer, data);
> >> + return hrtimer;
> >
> > Shudder.
> >
> > return raw_field_ptr(event, "hrtimer", data);
> >
>
> Yeah, it's a clear way.
>
> >> +static void
> >> +itimer_state_handler(void *data, struct event *event, int this_cpu __used,
> >> + u64 timestamp __used, struct thread *thread)
> >> +{
> >> + u64 value_sec, value_usec, expires;
> >> + struct timer_info *timer_info;
> >> + void *timer = NULL;
> >> + int which;
> >> +
> >> + FILLL_RAW_FIELD_VALUE(event, value_sec, data);
> >> + FILLL_RAW_FIELD_VALUE(event, value_usec, data);
> >> + FILLL_RAW_FIELD_VALUE(event, expires, data);
> >> + FILLL_RAW_FIELD_VALUE(event, which, data);
> >> + FILL_RAM_FIELD_PTR(event, timer, data);
> >
> > This is complete obfuscated, while
> >
> > value_sec = get_value(data, event, "value_sec");
> >
> > is obvious.
> >
>
> Sorry, i cannot get this. As i understand:
>
> #define FILL_RAW_FIELD_VALUE(event, field, data) \
> field = (typeof(field))raw_field_value(event, #field, data)
>
> After FILL_RAW_FIELD_VALUE(event, value_sec, data) expanded, it's:
> value_sec = raw_field_value(event, "value_sec", data)
>
> Why it's wrong? :-(

Simply because the macro hides the fact that this is an assignment of
a value to a variable. That makes the code harder to read.

FILLL_RAW_FIELD_VALUE(event, value_sec, data);
vs.
value_sec = get_value(data, event, "value_sec");

The latter is fast to parse and entirely clear.

Btw, you agreed above that the open coded call of raw_field_value()
is clearer than the macro magic. :)

Thanks,

tglx

2009-12-17 07:27:58

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 4/4] perf/timer: 'perf timer' core code

Hi Thomas,

Thomas Gleixner wrote:
>
> Nothing to be sorry about. That's why we review code.
>

Thanks.

>> Um, but not every timer has it's owner task, for example, if we start
>> a timer in interrupt handle function, this timer in not owns any tasks.
>> And itimer is started by userspace task so we can get it's owner, that
>> why i print hex address for timer/hrtimer, and print task name for itimer.
>
> Well, lot's of timers have an owner task. At least all user space
> related ones. And if the timer is rearmed in interrupt context, then
> this does not change the owner at all.
>

Sorry, i'm confused, for example, has below sequence:

Task1 running----->| (interrupt)
|------------- start timerT(start timerT in interrupt handler)

......
( After a while, schedule to another task, and interruption coming )
......

Task2 running----->| (interrupt)
|------------- start timerT again

Then, which task is the timerT owner?

Am I missed something?

>>> How should that work ?
>>>
>> We put/get timer in a rb-tree base on the specify order, for example:
>> we default use this order:
>>
>> sort_dimension__add("timer", &default_cmp);
>> sort_dimension__add("itimer-type", &default_cmp);
>>
>> if timer_info->timer is bigger, we put it to left child, littler to right
>> child, if the timer_info->timer is the same, then we compare
>> timer_info->itimer_type.
>
> Hmm, I wonder whether a hash table would be more efficient for the
> recording side.
>

Um. i'll record it address your way.

>
>> We search timer base on timer_info->timer and
>> timer_info->itimer_type(not timer_info->type), if we find the
>> timer's type is changed(for example, the timer is "ITIMER" before,
>> and change to "HRTIMER" later), is should a bug. this case is hardly
>> to happen but should catch it.
>
> No, it's not a bug at all. You _can_ have a hrtimer and a timer_list
> timer at the same address in a trace. There are two ways to make that
> happen:
>
> 1) kmalloc'ed memory contains a timer_list. timer operation is done,
> memory is kfreed. Now another kmalloc gets the just freed memory
> and has a hrtimer at the same address which was used by the
> timer_list before.
>
> 2) timer_list and hrtimer are also allocated on stack. There is no
> guarantee that they are at different addresses.
>

Yeah, my mistake.

> Simply because the macro hides the fact that this is an assignment of
> a value to a variable. That makes the code harder to read.
>
> FILLL_RAW_FIELD_VALUE(event, value_sec, data);
> vs.
> value_sec = get_value(data, event, "value_sec");
>
> The latter is fast to parse and entirely clear.
>

Yeah.

> Btw, you agreed above that the open coded call of raw_field_value()
> is clearer than the macro magic. :)
>

Sorry, i misunderstand your mean before.

Thanks,
Xiao

2009-12-22 13:01:52

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v2 0/5] perf_event: introduce 'perf timer' to analyze timer's behavior

Hi,

We introduce 'perf timer' in this patchset, it can report
number of activated/expired/canceled timers, timer latency
and timer function runtime.

Changlog v1->v2:
- export HZ in timer's tracepoint, we can use it to get
TIMER/ITIMER_VIRTUAL/ITIMER_PROF exact latency as Ingo's
suggestion

- rename 'perf timer latency' to 'perf timer report',
because it not only show timer latency but also show timer
function runtime, timer activated/canceled/expired statistics
and canceled timer list(TODO)

Below changes are all from Thomas's suggestion:
- fix a lot of typos and bad ideas

- use hash table instead of RB tree to record timer

- let output information more readable

- report number of activated/canceled/expired timers

- support to filter timer types:
'--type' can do it

TODO:
- Show canceled timer list that is suggested by Thomas

- Any suggestion are welcome.

===============================================
Usage:
- record timer events:
#perf timer record
#^C

- show timer statistics:
#perf timer report

Statistics
==========
Activated timers number:
HRTIMER: 37199 TIMER: 828 ITIMER_PROF: 2 ITIMER_VIRTUAL: 2 ITIMER_REAL: 18

Expired timers number:
HRTIMER: 34912 TIMER: 617 ITIMER_PROF: 1 ITIMER_VIRTUAL: 1 ITIMER_REAL: 1

Canceled timers number:
HRTIMER: 37200 TIMER: 829 ITIMER_PROF: 1 ITIMER_VIRTUAL: 1 ITIMER_REAL: 17

- show timer latency:
#perf timer report --print-lat
Timer Latency List
[ Abbreviations: Avg = average; lat = latency; ts = timestamp ]
-------------------------------------------------------------------------
Timer | TYPE | Avg-lat ms| Max-lat ms| Max-lat-ts s
-------------------------------------------------------------------------
smbd |HRTIMER |1.563 |1.563 |40666.379
irqbalance |HRTIMER |0.842 |2.484 |40670.231
......
vi |HRTIMER |0.123 |0.123 |40637.065
events/1 |TIMER |168.420 |1008.000 |40677.006
events/0 |TIMER |87.759 |1008.000 |40618.009
......
main |ITIMER_PROF |0.000 |0.000 |40661.023
main |ITIMER_VIRTUAL |0.000 |0.000 |40642.829
main |ITIMER_REAL |0.133 |0.133 |40622.731


- show timer function runtime:
#perf timer report --print-runtime
Timer Function Runtime List
[ Abbreviations: Avg = average; func = function; rt = runtime; ts = timestamp ]
-------------------------------------------------------------------------------
Timer | Type |Avg-rt ms|Max-rt ms| Max-rt-func | Max-rt-ts s
-------------------------------------------------------------------------------
smbd |HRTIMER|0.020 |0.020 |hrtimer_wakeup |40666.379
irqbalance |HRTIMER|0.012 |0.020 |hrtimer_wakeup |40670.231
......
events/1 |TIMER |0.009 |0.017 |delayed_work_timer_fn |40646.090
events/0 |TIMER |0.009 |0.091 |delayed_work_timer_fn |40651.105

- you can use '--print-lat' and '--print-runtime' at the same time:
#perf timer report --print-runtime --print-lat

Timer Latency List
[ Abbreviations: Avg = average; lat = latency; ts = timestamp ]
-------------------------------------------------------------------------
Timer | TYPE | Avg-lat ms| Max-lat ms| Max-lat-ts s
-------------------------------------------------------------------------
smbd |HRTIMER |1.563 |1.563 |40666.379
irqbalance |HRTIMER |0.842 |2.484 |40670.231
......
Timer Function Runtime List
[ Abbreviations: Avg = average; func = function; rt = runtime; ts = timestamp ]
-------------------------------------------------------------------------------
Timer | Type |Avg-rt ms|Max-rt ms| Max-rt-func | Max-rt-ts s
-------------------------------------------------------------------------------
smbd |HRTIMER|0.020 |0.020 |hrtimer_wakeup |40666.379
irqbalance |HRTIMER|0.012 |0.020 |hrtimer_wakeup |40670.231

- Select timer type which you are interesting:
#perf timer report --print-lat --type timer
Timer Latency List
[ Abbreviations: Avg = average; lat = latency; ts = timestamp ]
-------------------------------------------------------------------------
Timer | TYPE | Avg-lat ms| Max-lat ms| Max-lat-ts s
-------------------------------------------------------------------------
events/1 |TIMER |168.420 |1008.000 |40677.006
events/0 |TIMER |87.759 |1008.000 |40618.009
events/1 |TIMER |57.000 |153.000 |40634.571

You can specify more types, such as: '--type timer,hrtimer'

2009-12-22 13:03:49

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v2 1/5] perf_event: fix getting point

The point that got by raw_field_ptr() is point to @data offset not
tracepoint saved.

Introduce raw_field_array() to get array data

Changlog v1->v2:
- fix the typo 'arry' -> 'array' that is pointed out by Frederic Weisbecker

- use '(void *)(unsigned long)value' instead of 'memcpy()' as
Frederic Weisbecker's suggestion

Signed-off-by: Xiao Guangrong <[email protected]>
---
tools/perf/builtin-sched.c | 3 +--
tools/perf/util/trace-event-parse.c | 23 ++++++++++++++++++++---
tools/perf/util/trace-event.h | 3 +++
3 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 80209df..c083e88 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -630,8 +630,7 @@ static void test_calibrations(void)

#define FILL_ARRAY(ptr, array, event, data) \
do { \
- void *__array = raw_field_ptr(event, #array, data); \
- memcpy(ptr.array, __array, sizeof(ptr.array)); \
+ raw_field_array(event, #array, data, ptr.array, sizeof(ptr.array));\
} while(0)

#define FILL_COMMON_FIELDS(ptr, event, data) \
diff --git a/tools/perf/util/trace-event-parse.c b/tools/perf/util/trace-event-parse.c
index c5c32be..037bbe3 100644
--- a/tools/perf/util/trace-event-parse.c
+++ b/tools/perf/util/trace-event-parse.c
@@ -840,7 +840,7 @@ static int event_read_fields(struct event *event, struct format_field **fields)
field->flags |= FIELD_IS_ARRAY;

type = read_token(&token);
- while (strcmp(token, "]") != 0) {
+ while (strcmp(token, "]") != 0) {
if (last_type == EVENT_ITEM &&
type == EVENT_ITEM)
len = 2;
@@ -1919,13 +1919,30 @@ raw_field_value(struct event *event, const char *name, void *data)

void *raw_field_ptr(struct event *event, const char *name, void *data)
{
+ unsigned long long value;
+
+ value = raw_field_value(event, name, data);
+
+ if (!value)
+ return NULL;
+
+ return (void *)(unsigned long)value;
+}
+
+unsigned long long
+raw_field_array(struct event *event, const char *name, void *data,
+ void *array, int array_size)
+{
+ int len;
struct format_field *field;

field = find_any_field(event, name);
if (!field)
- return NULL;
+ return 0ULL;

- return data + field->offset;
+ len = min(array_size, field->size);
+ memcpy(array, data + field->offset, len);
+ return len;
}

static int get_common_info(const char *type, int *offset, int *size)
diff --git a/tools/perf/util/trace-event.h b/tools/perf/util/trace-event.h
index 6ad4056..c3b5c39 100644
--- a/tools/perf/util/trace-event.h
+++ b/tools/perf/util/trace-event.h
@@ -255,6 +255,9 @@ unsigned long long read_size(void *ptr, int size);
unsigned long long
raw_field_value(struct event *event, const char *name, void *data);
void *raw_field_ptr(struct event *event, const char *name, void *data);
+unsigned long long
+raw_field_array(struct event *event, const char *name, void *data,
+ void *array, int array_size);
unsigned long long eval_flag(const char *flag);

int read_tracing_data(int fd, struct perf_event_attr *pattrs, int nb_events);
--
1.6.1.2

2009-12-22 13:05:19

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v2 2/5]: trace_event: export HZ in timer's tracepoint format

Export HZ in timer's tracepoint, we can use it to get
TIMER/ITIMER_VIRTUAL/ITIMER_PROF exact latency and it's
suggested by Ingo

Signed-off-by: Xiao Guangrong <[email protected]>
---
include/trace/events/timer.h | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index 13ec15a..7749ae5 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -74,14 +74,17 @@ TRACE_EVENT(timer_expire_entry,
TP_STRUCT__entry(
__field( void *, timer )
__field( unsigned long, now )
+ __field( int, hz )
),

TP_fast_assign(
__entry->timer = timer;
__entry->now = jiffies;
+ __entry->hz = HZ;
),

- TP_printk("timer=%p now=%lu", __entry->timer, __entry->now)
+ TP_printk("timer=%p now=%lu HZ=%d", __entry->timer, __entry->now,
+ __entry->hz)
);

/**
--
1.6.1.2

2009-12-22 13:06:29

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v2 3/5] trace_event: record task' real_timer in itimer_state tracepoint

When itimer is ITIMER_REAL, we should cooperate with hrtimer's
tracepoit to get it's latency, this patch can let us get itimer's
hrtimer easy

Signed-off-by: Xiao Guangrong <[email protected]>
---
include/trace/events/timer.h | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index 9496b96..13ec15a 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -285,6 +285,7 @@ TRACE_EVENT(itimer_state,

TP_STRUCT__entry(
__field( int, which )
+ __field( void *, timer )
__field( cputime_t, expires )
__field( long, value_sec )
__field( long, value_usec )
@@ -294,6 +295,8 @@ TRACE_EVENT(itimer_state,

TP_fast_assign(
__entry->which = which;
+ __entry->timer = which == ITIMER_REAL ?
+ &current->signal->real_timer : NULL;
__entry->expires = expires;
__entry->value_sec = value->it_value.tv_sec;
__entry->value_usec = value->it_value.tv_usec;
@@ -301,8 +304,9 @@ TRACE_EVENT(itimer_state,
__entry->interval_usec = value->it_interval.tv_usec;
),

- TP_printk("which=%d expires=%llu it_value=%ld.%ld it_interval=%ld.%ld",
- __entry->which, (unsigned long long)__entry->expires,
+ TP_printk("which=%d timer=%p expires=%llu it_value=%ld.%ld it_interval="
+ "%ld.%ld", __entry->which, __entry->timer,
+ (unsigned long long)__entry->expires,
__entry->value_sec, __entry->value_usec,
__entry->interval_sec, __entry->interval_usec)
);
--
1.6.1.2

2009-12-22 13:07:54

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v2 4/5] perf/timer: add document for 'perf timer'

Add document for 'perf timer', it let people know how to use it

Signed-off-by: Xiao Guangrong <[email protected]>
---
tools/perf/Documentation/perf-timer.txt | 50 +++++++++++++++++++++++++++++++
1 files changed, 50 insertions(+), 0 deletions(-)
create mode 100644 tools/perf/Documentation/perf-timer.txt

diff --git a/tools/perf/Documentation/perf-timer.txt b/tools/perf/Documentation/perf-timer.txt
new file mode 100644
index 0000000..5bd26d4
--- /dev/null
+++ b/tools/perf/Documentation/perf-timer.txt
@@ -0,0 +1,50 @@
+perf-timer(1)
+==============
+
+NAME
+----
+perf-timer - Tool to trace/measure timer properties
+
+SYNOPSIS
+--------
+[verse]
+'perf timer' {record|report} [<options>]
+
+DESCRIPTION
+-----------
+There are two variants of perf timer:
+
+ 'perf timer record <command>' to record the timer events
+ of an arbitrary workload.
+
+ 'perf timer report' to report number of activated/expired/canceled
+ timers, timer latency and timer function runtime.
+
+OPTIONS
+-------
+-D::
+--dump-raw-trace=::
+ Display verbose dump of the sched data.
+
+-i <file>::
+--input=<file>::
+ Select the input file (default: perf.data)
+
+-s <key[,key2...]>::
+--sort=<key[,key2...]>::
+ Sort the output (default: type, avg-lat, max-lat,
+ avg-runtime, max-runtime)
+
+--print-lat::
+ Print timer latency
+
+--print-runtime::
+ Print timer function runtime
+
+--type::
+ Timer type selector, you can specify one or more types
+ (timer, hrtimer, itimer-real, itimer-virtual, itimer-prof)
+
+SEE ALSO
+--------
+linkperf:perf-record[1]
--
1.6.1.2

2009-12-22 13:10:37

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v2 5/5] perf/timer: add 'perf timer' core code

It's the core code of 'perf timer', it can report number of
activated/expired/canceled timers, timer latency and timer function
runtime

Sample output:
- Record timer event:

#perf timer record
^C

- Show timer latency:
#perf timer report --print-lat
Timer Latency List
[ Abbreviations: Avg = average; lat = latency; ts = timestamp ]
-------------------------------------------------------------------------
Timer | TYPE | Avg-lat ms| Max-lat ms| Max-lat-ts s
-------------------------------------------------------------------------
swapper |HRTIMER |0.180 |2.404 |42722.586
swapper |TIMER |2.000 |2.000 |42722.586
events/0 |TIMER |2.000 |2.000 |42723.002
events/0 |TIMER |0.000 |0.000 |42723.641
sshd |TIMER |0.000 |0.000 |42723.641

Statistics
==========
Activated timers number:
HRTIMER: 105 TIMER: 11 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0

Expired timers number:
HRTIMER: 94 TIMER: 12 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0

Canceled timers number:
HRTIMER: 105 TIMER: 12 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0

- Show timer function runtime:
#perf timer report --print-runtime
Timer Function Runtime List
[ Abbreviations: Avg = average; func = function; rt = runtime; ts = timestamp ]
-------------------------------------------------------------------------------
Timer | Type |Avg-rt ms|Max-rt ms| Max-rt-func | Max-rt-ts s
-------------------------------------------------------------------------------
swapper |HRTIMER|0.010 |0.026 |tick_sched_timer |42722.586
0xc15f2b70 |HRTIMER|0.009 |0.009 |Not - Catch |42722.000
swapper |TIMER |0.058 |0.111 |tcp_write_timer |42721.930
events/0 |TIMER |0.007 |0.010 |delayed_work_timer_fn |42723.002
swapper |TIMER |0.037 |0.037 |dev_watchdog |42723.002
events/0 |TIMER |0.028 |0.028 |delayed_work_timer_fn |42723.641
swapper |TIMER |0.023 |0.023 |pcnet32_watchdog |42722.001
events/1 |TIMER |0.014 |0.014 |Not - Catch |42723.003
events/0 |TIMER |0.010 |0.010 |Not - Catch |42722.001
sshd |TIMER |0.009 |0.009 |delayed_work_timer_fn |42723.641
swapper |TIMER |0.004 |0.004 |Not - Catch |42721.638
events/1 |TIMER |0.003 |0.003 |Not - Catch |42723.003

Statistics
==========
Activated timers number:
HRTIMER: 105 TIMER: 11 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0

Expired timers number:
HRTIMER: 94 TIMER: 12 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0

Canceled timers number:
HRTIMER: 105 TIMER: 12 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0

- Specify types which you are interesting and choose one or more of
"timer, hrtimer, itimer-real, itimer-virtual, itimer-prof"

Timer Latency List
[ Abbreviations: Avg = average; lat = latency; ts = timestamp ]
-------------------------------------------------------------------------
Timer | TYPE | Avg-lat ms| Max-lat ms| Max-lat-ts s
-------------------------------------------------------------------------
swapper |TIMER |2.000 |2.000 |42722.586
events/0 |TIMER |2.000 |2.000 |42723.002
events/0 |TIMER |0.000 |0.000 |42723.641
sshd |TIMER |0.000 |0.000 |42723.641

Changlog v1->v2:
- rename 'perf timer latency' to 'perf timer report',
because it not only show timer latency but also show timer
function runtime, timer activated/canceled/expired statistics
and canceled timer list(TODO)

Below changes are all from Thomas's suggestion:
- fix a lot of typos and bad ideas

- use hash table instead of RB tree to record timer

- let output information more readable

- report number of activated/canceled/expired timers

- support to filter timer types:
'--type' can do it

TODO:
- Show canceled timer list that is suggested by Thomas

- Any suggestion are welcome.

Signed-off-by: Xiao Guangrong <[email protected]>
---
tools/perf/Makefile | 1 +
tools/perf/builtin-timer.c | 1000 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/builtin.h | 1 +
tools/perf/command-list.txt | 1 +
tools/perf/perf.c | 1 +
5 files changed, 1004 insertions(+), 0 deletions(-)
create mode 100644 tools/perf/builtin-timer.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 4390d22..9b39528 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -440,6 +440,7 @@ BUILTIN_OBJS += builtin-top.o
BUILTIN_OBJS += builtin-trace.o
BUILTIN_OBJS += builtin-probe.o
BUILTIN_OBJS += builtin-kmem.o
+BUILTIN_OBJS += builtin-timer.o

PERFLIBS = $(LIB_FILE)

diff --git a/tools/perf/builtin-timer.c b/tools/perf/builtin-timer.c
new file mode 100644
index 0000000..66afa05
--- /dev/null
+++ b/tools/perf/builtin-timer.c
@@ -0,0 +1,1000 @@
+#include "builtin.h"
+#include "perf.h"
+
+#include "util/util.h"
+#include "util/cache.h"
+#include "util/symbol.h"
+#include "util/thread.h"
+#include "util/header.h"
+#include "util/parse-options.h"
+#include "util/trace-event.h"
+#include "util/debug.h"
+#include "util/session.h"
+
+#include <sys/prctl.h>
+#include <semaphore.h>
+#include <pthread.h>
+#include <math.h>
+
+#include "../../include/linux/hash.h"
+
+#define SORT_KEY "type, avg-lat, max-lat, avg-runtime, max-runtime"
+
+static char const *input_name = "perf.data";
+static const char *sort_order = SORT_KEY;
+static int print_lat, print_runtime, profile_cpu = -1;
+static struct perf_session *timer_session;
+static int bug_nr;
+
+enum timer_type {
+ REAL_ITIMER,
+ VIRTUAL_ITIMER,
+ PROF_ITIMER,
+ TIMER,
+ HRTIMER,
+
+ MAX_TIMER_TYPE,
+};
+
+static char const *timer_type_str[] = {
+ "ITIMER_REAL",
+ "ITIMER_VIRTUAL",
+ "ITIMER_PROF",
+ "TIMER",
+ "HRTIMER",
+};
+
+enum stat_type {
+ ACTIVATED,
+ EXPIRED,
+ CANCELED,
+
+ MAX_STATS,
+};
+
+static u32 stats[MAX_STATS][MAX_TIMER_TYPE];
+static inline void
+stats_update(enum timer_type timer_type, enum stat_type stat_type)
+{
+ stats[stat_type][timer_type]++;
+}
+
+static int hz;
+static inline void update_hz(int nhz)
+{
+ if (!hz)
+ hz = nhz;
+}
+
+static const struct option timer_options[] = {
+ OPT_STRING('i', "input", &input_name, "file",
+ "input file name"),
+ OPT_BOOLEAN('v', "verbose", &verbose,
+ "be more verbose (show symbol address, etc)"),
+ OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
+ "dump raw trace in ASCII"),
+ OPT_END()
+};
+
+static const char *const timer_usage[] = {
+ "perf timer [<options>] {record|report}",
+ NULL
+};
+
+static int filter_type;
+static int parse_type_opt(const struct option *opt __used,
+ const char *arg, int unset __used)
+{
+ char *tok, *str = strdup(arg);
+ int ret = 0;
+
+ if (!str)
+ die("strdup");
+
+ while ((tok = strsep(&str, ",")) != NULL) {
+
+ if (!strcmp(tok, "timer"))
+ filter_type |= 1 << TIMER;
+ else if (!strcmp(tok, "hrtimer"))
+ filter_type |= 1 << HRTIMER;
+ else if (!strcmp(tok, "itimer-real"))
+ filter_type |= 1 << REAL_ITIMER;
+ else if (!strcmp(tok, "itimer-virtual"))
+ filter_type |= 1 << VIRTUAL_ITIMER;
+ else if (!strcmp(tok, "itimer-prof"))
+ filter_type |= 1 << PROF_ITIMER;
+ else {
+ printf("Unknow --type: %s\n", tok);
+ ret = -1;
+ break;
+ }
+ }
+
+ free(str);
+ return ret;
+}
+
+static const struct option report_options[] = {
+ OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
+ "sort by key(s): "SORT_KEY),
+ OPT_BOOLEAN('v', "verbose", &verbose,
+ "be more verbose (show symbol address, etc)"),
+ OPT_INTEGER('C', "CPU", &profile_cpu,
+ "CPU to profile on"),
+ OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
+ "dump raw trace in ASCII"),
+ OPT_BOOLEAN(0, "print-lat", &print_lat, "show timer latency"),
+ OPT_BOOLEAN(0, "print-runtime", &print_runtime,
+ "show timer function runtime"),
+ OPT_CALLBACK(0, "type", NULL, "timer types",
+ "timer type selector, specify one or more types(timer, "
+ "hrtimer, itimer-real, itimer-virtual, itimer-prof)",
+ parse_type_opt),
+ OPT_END()
+};
+
+static const char *const report_usage[] = {
+ "perf timer report [<options>]",
+ NULL
+};
+
+struct timer_entry {
+ struct timer_entry *next;
+ struct rb_node node;
+
+ void *timer;
+ enum timer_type type;
+ u64 function;
+ struct thread *timer_thread;
+ int expire_nr;
+ int run_func_nr;
+ int bug;
+
+ struct timer_entry *itimer_hrtimer;
+
+ u64 timer_last_lat;
+ u64 timer_expire_ts;
+ u64 timer_total_lat;
+ double timer_avg_lat;
+ s64 timer_max_lat;
+ u64 timer_max_lat_at_ts;
+
+ u64 func_entry_ts;
+ u64 func_total_runtime;
+ double func_avg_runtime;
+ s64 func_max_runtime;
+ u64 func_max_runtime_at_ts;
+ u64 max_runtime_at_func;
+};
+
+typedef int (*sort_func)(struct timer_entry *, struct timer_entry *);
+struct sort_dimension {
+ const char *name;
+ sort_func cmp;
+ struct list_head list;
+};
+
+#define DEFINE_CMP_FUNCTION(field) \
+static int cmp_##field(struct timer_entry *v1, struct timer_entry *v2) \
+{ \
+ if (v1->field < v2->field) \
+ return -1; \
+ if (v1->field > v2->field) \
+ return 1; \
+ return 0; \
+}
+
+DEFINE_CMP_FUNCTION(type);
+DEFINE_CMP_FUNCTION(timer_avg_lat);
+DEFINE_CMP_FUNCTION(timer_max_lat);
+DEFINE_CMP_FUNCTION(func_avg_runtime);
+DEFINE_CMP_FUNCTION(func_max_runtime);
+
+#define SORT(field, _name) \
+{ .cmp = cmp_##field, \
+ .name = _name, \
+}
+
+static struct sort_dimension sorts[] = {
+ SORT(type, "type"),
+ SORT(timer_avg_lat, "avg-lat"),
+ SORT(timer_max_lat, "max-lat"),
+ SORT(func_avg_runtime, "avg-runtime"),
+ SORT(func_max_runtime, "max-runtime"),
+};
+
+static int sort_dimension_add(const char *tok, struct list_head *list)
+{
+ unsigned int i;
+
+ for (i = 0; i < ARRAY_SIZE(sorts); i++) {
+ if (!strcmp(sorts[i].name, tok)) {
+ list_add_tail(&sorts[i].list, list);
+ return 0;
+ }
+ }
+
+ return -1;
+}
+
+static LIST_HEAD(sort);
+static void setup_sorting(char const *sort_string)
+{
+ char *tmp, *tok, *str = strdup(sort_string);
+
+ for (tok = strtok_r(str, ",", &tmp);
+ tok; tok = strtok_r(NULL, ", ", &tmp)) {
+ if (sort_dimension_add(tok, &sort) < 0) {
+ error("Unknown --sort key: `%s'", tok);
+ usage_with_options(report_usage, report_options);
+ }
+ }
+
+ free(str);
+}
+
+static int timer_key_cmp(struct list_head *list, struct timer_entry *l,
+ struct timer_entry *r)
+{
+ struct sort_dimension *sortp;
+ int ret = 0;
+
+ BUG_ON(list_empty(list));
+
+ list_for_each_entry(sortp, list, list) {
+ ret = sortp->cmp(l, r);
+ if (ret)
+ return ret;
+ }
+
+ return ret;
+}
+
+static void timer_rb_insert(struct rb_root *root, struct timer_entry *timer,
+ struct list_head *sort_list)
+{
+ struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+ while (*new) {
+ struct timer_entry *this;
+ int cmp;
+
+ this = container_of(*new, struct timer_entry, node);
+ parent = *new;
+
+ cmp = timer_key_cmp(sort_list, timer, this);
+
+ if (cmp > 0)
+ new = &((*new)->rb_left);
+ else
+ new = &((*new)->rb_right);
+ }
+
+ rb_link_node(&timer->node, parent, new);
+ rb_insert_color(&timer->node, root);
+}
+
+static void timer_rb_traversal(struct rb_root *root,
+ void (*func)(struct timer_entry *))
+{
+ struct rb_node *next;
+
+ next = rb_first(root);
+
+ while (next) {
+ struct timer_entry *timer;
+
+ timer = rb_entry(next, struct timer_entry, node);
+ func(timer);
+ next = rb_next(next);
+ }
+}
+
+#define TIMER_HASH_BITS 10
+#define TIMER_HASH_SIZE (1UL << TIMER_HASH_BITS)
+
+static struct timer_entry *timer_hash_table[TIMER_HASH_SIZE];
+
+static inline int timer_hash_func(void *timer)
+{
+ return hash_ptr(timer, TIMER_HASH_BITS);
+}
+
+static struct timer_entry *
+timer_hash_findnew(void *timer, enum timer_type type)
+{
+ int hash = timer_hash_func(timer);
+ struct timer_entry **head, *curr, *prev = NULL;
+
+ head = timer_hash_table + hash;
+ curr = *head;
+
+ while (curr) {
+ if (curr->timer == timer && curr->type == type)
+ return curr;
+
+ prev = curr;
+ curr = curr->next;
+ }
+
+ curr = zalloc(sizeof(*curr));
+ if (!curr)
+ die("No memory");
+
+ curr->timer = timer;
+ curr->type = type;
+
+ if (prev)
+ prev->next = curr;
+ else
+ *head = curr;
+
+ return curr;
+}
+
+static inline void *
+get_timer_hrtimer(enum timer_type type, struct event *event, void *data)
+{
+ if (type == HRTIMER)
+ return raw_field_ptr(event, "hrtimer", data);
+
+ return raw_field_ptr(event, "timer", data);
+}
+
+/*
+ * if expiry time is not recorded, it means we not caught the start event,
+ * we can't calculate timer latency
+ */
+static inline bool check_timer_expire_ts(struct timer_entry *timer_entry)
+{
+ return !!timer_entry->timer_expire_ts;
+}
+
+/*
+ * if function entry time is not recorded, it means we not caught the
+ * expire entry event, we can't calculate function runtime
+ */
+static inline bool check_timer_func_entry_ts(struct timer_entry *timer_entry)
+{
+ return !!timer_entry->func_entry_ts;
+}
+
+static void timer_hrtimer_start(enum timer_type type, void *data,
+ struct event *event, struct thread *thread)
+{
+ void *timer;
+ struct timer_entry *timer_entry;
+ u64 expires, function;
+
+ timer = get_timer_hrtimer(type, event, data);
+ function = raw_field_value(event, "function", data);
+ expires = raw_field_value(event, "expires", data);
+
+ stats_update(type, ACTIVATED);
+ timer_entry = timer_hash_findnew(timer, type);
+
+ if (!timer_entry->timer_thread)
+ timer_entry->timer_thread = thread;
+
+ timer_entry->function = function;
+ timer_entry->timer_expire_ts = expires;
+}
+
+static void timer_hrtimer_expires_entry(enum timer_type type, void *data,
+ struct event *event, u64 timestamp)
+{
+ void *timer;
+ struct timer_entry *timer_entry;
+ u64 now;
+ s64 delta;
+
+ timer = get_timer_hrtimer(type, event, data);
+ now = raw_field_value(event, "now", data);
+
+ stats_update(type, EXPIRED);
+ timer_entry = timer_hash_findnew(timer, type);
+ timer_entry->func_entry_ts = timestamp;
+
+ if (!check_timer_expire_ts(timer_entry))
+ return;
+
+ delta = now - timer_entry->timer_expire_ts;
+ if (delta < 0) {
+ timer_entry->bug++;
+ printf("WARNING: timer[%p] type[%s]: latency < 0, expires"
+ "[%llu] now[%llu]", timer_entry->timer,
+ timer_type_str[timer_entry->type],
+ timer_entry->timer_expire_ts, now);
+ return ;
+ }
+
+ timer_entry->expire_nr++;
+ timer_entry->timer_last_lat = delta;
+ timer_entry->timer_total_lat += delta;
+ if (timer_entry->timer_max_lat <= delta) {
+ timer_entry->timer_max_lat = delta;
+ timer_entry->timer_max_lat_at_ts = timestamp;
+ }
+ return;
+}
+
+static void timer_hrtimer_expires_exit(enum timer_type type, void *data,
+ struct event *event, u64 timestamp)
+{
+ void *timer;
+ struct timer_entry *timer_entry;
+ s64 delta;
+
+ timer = get_timer_hrtimer(type, event, data);
+ timer_entry = timer_hash_findnew(timer, type);
+
+ if (!check_timer_func_entry_ts(timer_entry))
+ return;
+
+ delta = timestamp - timer_entry->func_entry_ts;
+ if (delta < 0) {
+ timer_entry->bug++;
+ printf("WARNING: timer[%p] type[%s]: runtime < 0, "
+ "func_entry_ts[%llu] now[%llu]", timer_entry->timer,
+ timer_type_str[timer_entry->type],
+ timer_entry->func_entry_ts, timestamp);
+ return;
+ }
+
+ timer_entry->run_func_nr++;
+ timer_entry->func_total_runtime += delta;
+ if (timer_entry->func_max_runtime <= delta) {
+ timer_entry->func_max_runtime = delta;
+ timer_entry->func_max_runtime_at_ts = timestamp;
+ timer_entry->max_runtime_at_func = timer_entry->function;
+ }
+
+ return;
+}
+
+static void timer_hrtimer_cancel(enum timer_type type)
+{
+ stats_update(type, CANCELED);
+}
+
+static void timer_start_handler(void *data, struct event *event,
+ u64 timestamp __used, struct thread *thread)
+{
+ return timer_hrtimer_start(TIMER, data, event, thread);
+}
+
+static void
+timer_expire_entry_handler(void *data, struct event *event, u64 timestamp,
+ struct thread *thread __used)
+{
+ int nhz;
+
+ nhz = raw_field_value(event, "hz", data);
+ update_hz(nhz);
+ return timer_hrtimer_expires_entry(TIMER, data, event, timestamp);
+}
+
+static void
+timer_expire_exit_handler(void *data, struct event *event, u64 timestamp,
+ struct thread *thread __used)
+{
+ return timer_hrtimer_expires_exit(TIMER, data, event, timestamp);
+}
+
+static void timer_cancel(void *data __used, struct event *event __used,
+ u64 timestamp __used, struct thread *thread __used)
+{
+ return timer_hrtimer_cancel(TIMER);
+}
+
+static void hrtimer_start_handler(void *data, struct event *event,
+ u64 timestamp __used, struct thread *thread)
+{
+ return timer_hrtimer_start(HRTIMER, data, event, thread);
+}
+
+static void
+hrtimer_expire_entry_handler(void *data, struct event *event, u64 timestamp,
+ struct thread *thread __used)
+{
+ return timer_hrtimer_expires_entry(HRTIMER, data, event, timestamp);
+}
+
+static void
+hrtimer_expire_exit_handler(void *data, struct event *event, u64 timestamp,
+ struct thread *thread __used)
+{
+ return timer_hrtimer_expires_exit(HRTIMER, data, event, timestamp);
+}
+
+static void
+hrtimer_cancel(void *data __used, struct event *event __used,
+ u64 timestamp __used, struct thread *thread __used)
+{
+ return timer_hrtimer_cancel(HRTIMER);
+}
+
+static void itimer_state_handler(void *data, struct event *event,
+ u64 timestamp __used, struct thread *thread)
+{
+ struct timer_entry *timer_entry;
+ u64 value_sec, value_usec, expires;
+ int which;
+
+ value_sec = raw_field_value(event, "value_sec", data);
+ value_usec = raw_field_value(event, "value_usec", data);
+ expires = raw_field_value(event, "expires", data);
+ which = raw_field_value(event, "which", data);
+
+ timer_entry = timer_hash_findnew(thread, which);
+
+ /* itimer canceled */
+ if (!value_sec && !value_usec) {
+ stats_update(which, CANCELED);
+ return ;
+ }
+
+ /* itimer started */
+ stats_update(which, ACTIVATED);
+
+ if (which == ITIMER_REAL) {
+ void *hrtimer;
+
+ hrtimer = raw_field_ptr(event, "timer", data);
+ timer_entry->itimer_hrtimer = timer_hash_findnew(hrtimer,
+ HRTIMER);
+ } else
+ timer_entry->timer_expire_ts = expires;
+}
+
+static void
+itimer_expire_handler(void *data, struct event *event, u64 timestamp,
+ struct thread *thread __used)
+{
+ struct thread *itimer_thread;
+ struct timer_entry *timer_entry;
+ u64 now;
+ s64 delta;
+ pid_t pid;
+ int which;
+
+ which = raw_field_value(event, "which", data);
+ now = raw_field_value(event, "now", data);
+ pid = raw_field_value(event, "pid", data);
+
+ itimer_thread = perf_session__findnew(timer_session, pid);
+ timer_entry = timer_hash_findnew(itimer_thread, which);
+
+ stats_update(which, EXPIRED);
+
+ if (which == ITIMER_REAL) {
+ if (!timer_entry->itimer_hrtimer ||
+ !check_timer_expire_ts(timer_entry->itimer_hrtimer))
+ return;
+ delta = timer_entry->itimer_hrtimer->timer_last_lat;
+ } else {
+ if (!check_timer_expire_ts(timer_entry))
+ return;
+ delta = now - timer_entry->timer_expire_ts;
+ }
+
+ if (delta < 0) {
+ timer_entry->bug++;
+ printf("WARNING: timer[%p] type[%s]: latency < 0, expires:"
+ "[%llu] now[%llu]", timer_entry->timer,
+ timer_type_str[timer_entry->type],
+ timer_entry->timer_expire_ts, now);
+ return;
+ }
+
+ timer_entry->expire_nr++;
+ timer_entry->timer_total_lat += delta;
+ if (timer_entry->timer_max_lat <= delta) {
+ timer_entry->timer_max_lat = delta;
+ timer_entry->timer_max_lat_at_ts = timestamp;
+ }
+
+ return;
+}
+
+struct event_handler {
+ const char *event_name;
+ void (*handler) (void *data, struct event *event, u64 timestamp,
+ struct thread *thread);
+} timer_handler[] = {
+ /* timer */
+ { "timer_start", timer_start_handler },
+ { "timer_expire_entry", timer_expire_entry_handler },
+ { "timer_expire_exit", timer_expire_exit_handler },
+ { "timer_cancel", timer_cancel },
+
+ /* hrtimer */
+ { "hrtimer_start", hrtimer_start_handler },
+ { "hrtimer_expire_entry", hrtimer_expire_entry_handler },
+ { "hrtimer_expire_exit", hrtimer_expire_exit_handler },
+ { "hrtimer_cancel", hrtimer_cancel },
+
+ /* itimer */
+ { "itimer_state", itimer_state_handler },
+ { "itimer_expire", itimer_expire_handler },
+};
+
+static void process_raw_event(event_t *raw_event __used,
+ struct perf_session *session __used, void *data,
+ int cpu __used, u64 timestamp,
+ struct thread *thread)
+{
+ struct event *event;
+ unsigned int i;
+ int type;
+
+ type = trace_parse_common_type(data);
+ event = trace_find_event(type);
+
+ for (i = 0; i < ARRAY_SIZE(timer_handler); i++)
+ if (!strcmp(timer_handler[i].event_name, event->name))
+ timer_handler[i].handler(data, event,
+ timestamp, thread);
+}
+
+static int process_sample_event(event_t *event, struct perf_session *session)
+{
+ struct sample_data data;
+ struct thread *thread;
+
+ if (!(session->sample_type & PERF_SAMPLE_RAW))
+ return 0;
+
+ memset(&data, 0, sizeof(data));
+ data.time = -1;
+ data.cpu = -1;
+ data.period = -1;
+
+ event__parse_sample(event, session->sample_type, &data);
+
+ dump_printf("(IP, %d): %d/%d: %p period: %lld\n",
+ event->header.misc,
+ data.pid, data.tid,
+ (void *)(long)data.ip,
+ (long long)data.period);
+
+ thread = perf_session__findnew(session, data.pid);
+ if (thread == NULL) {
+ pr_debug("problem processing %d event, skipping it.\n",
+ event->header.type);
+ return -1;
+ }
+
+ dump_printf(" ... thread: %s:%d\n", thread->comm, thread->pid);
+
+ if (profile_cpu != -1 && profile_cpu != (int)data.cpu)
+ return 0;
+
+ process_raw_event(event, session, data.raw_data, data.cpu,
+ data.time, thread);
+ return 0;
+}
+
+static int sample_type_check(struct perf_session *session)
+{
+ if (!(session->sample_type & PERF_SAMPLE_RAW)) {
+ fprintf(stderr, "No trace sample to read. Did you call "
+ "perf record without -R?");
+ return -1;
+ }
+
+ return 0;
+}
+
+static struct perf_event_ops event_ops = {
+ .process_sample_event = process_sample_event,
+ .process_comm_event = event__process_comm,
+ .sample_type_check = sample_type_check,
+};
+
+static int read_events(void)
+{
+ int err;
+
+ timer_session = perf_session__new(input_name, O_RDONLY, 0);
+ if (!timer_session)
+ return -ENOMEM;
+
+ err = perf_session__process_events(timer_session, &event_ops);
+ return err;
+}
+
+static inline bool timer_hz_unit(struct timer_entry *entry)
+{
+ return entry->type == TIMER || entry->type == VIRTUAL_ITIMER
+ || entry->type == PROF_ITIMER;
+}
+
+static void timer_adjust(struct timer_entry *entry)
+{
+ if (entry->expire_nr) {
+ entry->timer_avg_lat = (double)entry->timer_total_lat /
+ (double)entry->expire_nr;
+ if (timer_hz_unit(entry) && hz) {
+ /* change unit to nanosecond */
+ entry->timer_avg_lat = entry->timer_avg_lat * 1e9 / hz;
+ entry->timer_max_lat = entry->timer_max_lat * 1e9 / hz;
+ }
+ }
+
+ if (entry->run_func_nr)
+ entry->func_avg_runtime = (double)entry->func_total_runtime /
+ (double)entry->run_func_nr;
+}
+
+static struct rb_root timer_result;
+static void sort_result(void)
+{
+ unsigned int i;
+
+ if (!filter_type)
+ filter_type = 0xFF;
+
+ for (i = 0; i < TIMER_HASH_SIZE; i++) {
+ struct timer_entry *entry;
+
+ for (entry = *(timer_hash_table + i); entry;
+ entry = entry->next) {
+ if (!(filter_type & 1 << entry->type))
+ continue;
+
+ bug_nr += entry->bug;
+ timer_adjust(entry);
+ timer_rb_insert(&timer_result, entry, &sort);
+ }
+ }
+}
+
+static inline void
+print_timer_name(struct timer_entry *timer_entry, char *buf, int buf_len)
+{
+ enum timer_type type;
+
+ type = timer_entry->type;
+ if (type != TIMER && type != HRTIMER) {
+ snprintf(buf, buf_len, "%s",
+ ((struct thread *)timer_entry->timer)->comm);
+ return;
+ }
+
+ if (timer_entry->timer_thread)
+ snprintf(buf, buf_len, "%s", timer_entry->timer_thread->comm);
+ else
+ snprintf(buf, buf_len, "%p", timer_entry->timer);
+}
+
+static void __print_timer_lat(struct timer_entry *timer_entry)
+{
+ char buf[20];
+
+ if (!timer_entry->expire_nr)
+ return;
+
+ print_timer_name(timer_entry, buf, sizeof(buf));
+ printf("%-15s", buf);
+
+ printf("|%-15s", timer_type_str[timer_entry->type]);
+ if (timer_hz_unit(timer_entry) && !hz) {
+ printf("|%-10.3f %2s", timer_entry->timer_avg_lat, "HZ");
+ printf("|%-10llu %2s", timer_entry->timer_max_lat, "HZ");
+ } else {
+ printf("|%-13.3f", timer_entry->timer_avg_lat / 1e6);
+ printf("|%-13.3f", timer_entry->timer_max_lat / 1e6);
+ }
+
+ printf("|%-13.3f\n", timer_entry->timer_max_lat_at_ts / 1e9);
+}
+
+static void print_timer_lat(void)
+{
+ printf("\n");
+ printf(" Timer Latency List\n");
+ printf(
+"[ Abbreviations: Avg = average; lat = latency; ts = timestamp ]\n");
+ printf(
+"-------------------------------------------------------------------------\n");
+ printf(
+" Timer | TYPE | Avg-lat ms| Max-lat ms| Max-lat-ts s\n");
+ printf(
+"-------------------------------------------------------------------------\n");
+ timer_rb_traversal(&timer_result, __print_timer_lat);
+}
+
+static void __print_func_runtime(struct timer_entry *timer_entry)
+{
+ u64 function;
+ char buf[20];
+
+ if (!timer_entry->run_func_nr)
+ return;
+
+ print_timer_name(timer_entry, buf, sizeof(buf));
+ printf("%-15s", buf);
+ printf("|%-7s", timer_type_str[timer_entry->type]);
+ printf("|%-9.3f", timer_entry->func_avg_runtime / 1e6);
+ printf("|%-9.3f", timer_entry->func_max_runtime / 1e6);
+ function = timer_entry->max_runtime_at_func;
+ if (function) {
+ struct symbol *sym;
+
+ sym = map_groups__find_function(&timer_session->kmaps,
+ timer_session, function, NULL);
+ if (sym)
+ printf("|%-23.23s", sym->name);
+ else
+ printf("|%-23llx", function);
+ } else
+ printf("|%-23s", "Not - Catch");
+
+ printf("|%-8.3f\n", timer_entry->func_max_runtime_at_ts / 1e9);
+}
+
+static void print_func_runtime(void)
+{
+ printf("\n");
+ printf(" Timer Function Runtime List\n");
+ printf(
+"[ Abbreviations: Avg = average; func = function; rt = runtime; ts = timestamp ]\n");
+
+ printf(
+"-------------------------------------------------------------------------------\n");
+ printf(
+" Timer | Type |Avg-rt ms|Max-rt ms| Max-rt-func | Max-rt-ts s\n");
+ printf(
+"-------------------------------------------------------------------------------\n");
+ timer_rb_traversal(&timer_result, __print_func_runtime);
+}
+
+static void __print_bug_timer(struct timer_entry *timer_entry)
+{
+ char buf[20];
+
+ if (!timer_entry->bug)
+ return;
+
+ print_timer_name(timer_entry, buf, sizeof(buf));
+ printf("%-15s", buf);
+
+ printf("|%-15s", timer_type_str[timer_entry->type]);
+ printf("|%-10d\n", timer_entry->bug);
+}
+
+static void print_bug_timer(void)
+{
+ if (bug_nr) {
+ printf("\nWarning: detect %d bug(s):\n", bug_nr);
+ printf("--------------------------------------------\n");
+ printf(" Timer | Type | Bug \n");
+ printf("--------------------------------------------\n");
+ timer_rb_traversal(&timer_result, __print_bug_timer);
+ }
+}
+
+static void
+__print_stats(enum stat_type stat_type, char *buf, int buf_len)
+{
+ int i;
+
+ for (i = MAX_TIMER_TYPE - 1; i >= 0; i--) {
+ int plen;
+
+ if (!(filter_type & 1 << i))
+ continue;
+ plen = snprintf(buf, buf_len, "%s: %d ", timer_type_str[i],
+ stats[stat_type][i]);
+ buf += plen;
+ buf_len -= plen;
+ }
+}
+
+static void print_stats(void)
+{
+ char buf[128];
+
+ printf("\nStatistics\n==========\n");
+ __print_stats(ACTIVATED, buf, sizeof(buf));
+ printf("Activated timers number:\n %s\n\n", buf);
+ __print_stats(EXPIRED, buf, sizeof(buf));
+ printf("Expired timers number:\n %s\n\n", buf);
+ __print_stats(CANCELED, buf, sizeof(buf));
+ printf("Canceled timers number:\n %s\n\n", buf);
+}
+
+static void print_result(void)
+{
+ if (print_lat)
+ print_timer_lat();
+
+ if (print_runtime)
+ print_func_runtime();
+
+ print_bug_timer();
+ print_stats();
+}
+
+static void __cmd_report(void)
+{
+ setup_pager();
+ read_events();
+ setup_sorting(sort_order);
+ sort_result();
+ print_result();
+ perf_session__delete(timer_session);
+}
+
+static const char *record_args[] = {
+ "record",
+ "-a",
+ "-R",
+ "-M",
+ "-f",
+ "-m", "1024",
+ "-c", "1",
+
+ /* timer */
+ "-e", "timer:timer_start",
+ "-e", "timer:timer_expire_entry",
+ "-e", "timer:timer_expire_exit",
+ "-e", "timer:timer_cancel",
+
+ /* hrtimer */
+ "-e", "timer:hrtimer_start",
+ "-e", "timer:hrtimer_expire_entry",
+ "-e", "timer:hrtimer_expire_exit",
+ "-e", "timer:hrtimer_cancel",
+
+ /* itimer **/
+ "-e", "timer:itimer_state",
+ "-e", "timer:itimer_expire",
+};
+
+static int __cmd_record(int argc, const char **argv)
+{
+ unsigned int rec_argc, i, j;
+ const char **rec_argv;
+
+ rec_argc = ARRAY_SIZE(record_args) + argc - 1;
+ rec_argv = calloc(rec_argc + 1, sizeof(char *));
+
+ for (i = 0; i < ARRAY_SIZE(record_args); i++)
+ rec_argv[i] = strdup(record_args[i]);
+
+ for (j = 1; j < (unsigned int)argc; j++, i++)
+ rec_argv[i] = argv[j];
+
+ BUG_ON(i != rec_argc);
+
+ return cmd_record(i, rec_argv, NULL);
+}
+
+int cmd_timer(int argc, const char **argv, const char *prefix __used)
+{
+ argc = parse_options(argc, argv, timer_options, timer_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ if (!argc)
+ usage_with_options(timer_usage, timer_options);
+
+ symbol__init();
+ if (!strncmp(argv[0], "rec", 3))
+ return __cmd_record(argc, argv);
+
+ if (!strcmp(argv[0], "report")) {
+ if (argc > 1) {
+ argc = parse_options(argc, argv, report_options,
+ report_usage, 0);
+ if (argc)
+ usage_with_options(report_usage,
+ report_options);
+ }
+ __cmd_report();
+
+ } else
+ usage_with_options(timer_usage, timer_options);
+
+ return 0;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index 18035b1..e294128 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -30,5 +30,6 @@ extern int cmd_trace(int argc, const char **argv, const char *prefix);
extern int cmd_version(int argc, const char **argv, const char *prefix);
extern int cmd_probe(int argc, const char **argv, const char *prefix);
extern int cmd_kmem(int argc, const char **argv, const char *prefix);
+extern int cmd_timer(int argc, const char **argv, const char *prefix);

#endif
diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt
index 71dc7c3..e522f5d 100644
--- a/tools/perf/command-list.txt
+++ b/tools/perf/command-list.txt
@@ -16,3 +16,4 @@ perf-top mainporcelain common
perf-trace mainporcelain common
perf-probe mainporcelain common
perf-kmem mainporcelain common
+perf-timer mainporcelain common
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 873e55f..688d292 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -301,6 +301,7 @@ static void handle_internal_command(int argc, const char **argv)
{ "sched", cmd_sched, 0 },
{ "probe", cmd_probe, 0 },
{ "kmem", cmd_kmem, 0 },
+ { "timer", cmd_timer, 0 },
};
unsigned int i;
static const char ext[] = STRIP_EXTENSION;
--
1.6.1.2

2009-12-22 13:22:37

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v2 2/5]: trace_event: export HZ in timer's tracepoint format

Ah, sorry for this title error. :-(

s/[PATCH v2 2/5]:/[PATCH v2 2/5]

Xiao Guangrong wrote:
> Export HZ in timer's tracepoint, we can use it to get
> TIMER/ITIMER_VIRTUAL/ITIMER_PROF exact latency and it's
> suggested by Ingo
>

2009-12-28 07:54:30

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v2 2/5]: trace_event: export HZ in timer's tracepoint format


* Xiao Guangrong <[email protected]> wrote:

> Export HZ in timer's tracepoint, we can use it to get
> TIMER/ITIMER_VIRTUAL/ITIMER_PROF exact latency and it's
> suggested by Ingo
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> include/trace/events/timer.h | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
> index 13ec15a..7749ae5 100644
> --- a/include/trace/events/timer.h
> +++ b/include/trace/events/timer.h
> @@ -74,14 +74,17 @@ TRACE_EVENT(timer_expire_entry,
> TP_STRUCT__entry(
> __field( void *, timer )
> __field( unsigned long, now )
> + __field( int, hz )
> ),
>
> TP_fast_assign(
> __entry->timer = timer;
> __entry->now = jiffies;
> + __entry->hz = HZ;
> ),
>
> - TP_printk("timer=%p now=%lu", __entry->timer, __entry->now)
> + TP_printk("timer=%p now=%lu HZ=%d", __entry->timer, __entry->now,
> + __entry->hz)
> );

I think we can do something slightly different and more efficient: just create
a new timer event to report the value of HZ.

That way we dont clutter the timer_expire_entry record format with a
repetitive HZ field. It's an extra 4 bytes overhead: that has to be written,
passed along, copied and thrown away in 99.9999% of the cases - such overhead
should be avoided.

If you created a special timer_params event, which would produce precisely one
event when triggered via say a new perf ioctl. I.e. add something like this to
perf_event.h:

#define PERF_EVENT_IOC_INJECT _IOW('$', 7, __u64)

and add code to kernel/perf_event.c's perf_ioctl() function that takes that
__u64 parameter as an event ID and injects an 'artificial' event.

Such a new feature would be useful for other things as well: backtesting rare
events, injecting other types of 'parameter/query events', etc.

There might be more details to this, but it would be a useful scheme IMO - and
it would still integrate nicely with the whole ftrace event enumeration scheme
so tooling support would be easier.

What do you think?

Ingo

2009-12-28 10:42:11

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v2 2/5]: trace_event: export HZ in timer's tracepoint format



Ingo Molnar wrote:

>
> I think we can do something slightly different and more efficient: just create
> a new timer event to report the value of HZ.
>
> That way we dont clutter the timer_expire_entry record format with a
> repetitive HZ field. It's an extra 4 bytes overhead: that has to be written,
> passed along, copied and thrown away in 99.9999% of the cases - such overhead
> should be avoided.
>
> If you created a special timer_params event, which would produce precisely one
> event when triggered via say a new perf ioctl. I.e. add something like this to
> perf_event.h:
>
> #define PERF_EVENT_IOC_INJECT _IOW('$', 7, __u64)
>
> and add code to kernel/perf_event.c's perf_ioctl() function that takes that
> __u64 parameter as an event ID and injects an 'artificial' event.
>
> Such a new feature would be useful for other things as well: backtesting rare
> events, injecting other types of 'parameter/query events', etc.
>
> There might be more details to this, but it would be a useful scheme IMO - and
> it would still integrate nicely with the whole ftrace event enumeration scheme
> so tooling support would be easier.
>
> What do you think?
>

Sure, this is the better way, i'll fix it address your suggestion in the next version.

Thanks,
Xiao

2009-12-29 05:22:46

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v3 0/5] perf tools: introduce 'perf timer' to analyze timer's behavior

Changelog v2->v3:

introduce 'inject' event to get kernel HZ address Ingo's suggestion
[ see http://lkml.org/lkml/2009/12/28/31 ]

2009-12-29 05:22:59

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ

'inject' event is a very useful feature and it's suggested by Ingo
[ See http://lkml.org/lkml/2009/12/28/31 ]

Signed-off-by: Xiao Guangrong <[email protected]>
---
include/linux/perf_event.h | 13 +++++++++++
kernel/perf_event.c | 47 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/builtin-record.c | 13 +++++++++++
tools/perf/util/session.c | 5 ++++
tools/perf/util/session.h | 3 +-
5 files changed, 80 insertions(+), 1 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 9a1d276..6c93f88 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -228,6 +228,7 @@ struct perf_event_attr {
#define PERF_EVENT_IOC_PERIOD _IOW('$', 4, __u64)
#define PERF_EVENT_IOC_SET_OUTPUT _IO ('$', 5)
#define PERF_EVENT_IOC_SET_FILTER _IOW('$', 6, char *)
+#define PERF_EVENT_IOC_INJECT _IO ('$', 7)

enum perf_event_ioc_flags {
PERF_IOC_FLAG_GROUP = 1U << 0,
@@ -413,10 +414,22 @@ enum perf_event_type {
* };
*/
PERF_RECORD_SAMPLE = 9,
+ /*
+ * struct {
+ * struct perf_event_header header;
+ * u32 inject_event_id;
+ * u64 value;
+ * };
+ */
+ PERF_RECORD_INJECT = 10,

PERF_RECORD_MAX, /* non-ABI */
};

+enum perf_inject_event {
+ PERF_INJECT_HZ = 0x01,
+};
+
enum perf_callchain_context {
PERF_CONTEXT_HV = (__u64)-32,
PERF_CONTEXT_KERNEL = (__u64)-128,
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 8984afd..9343c6c 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -2012,6 +2012,50 @@ unlock:
return ret;
}

+static int perf_inject_get_hz(u64 *hz)
+{
+ *hz = HZ;
+ return 0;
+}
+
+static int perf_inject_event(struct perf_event *event, u32 inject_event_id,
+ int (*get_value)(u64 *))
+{
+ struct perf_output_handle handle;
+ struct perf_inject_event {
+ struct perf_event_header header;
+ u32 inject_event_id;
+ u64 value;
+ } inject_event;
+ int ret = 0;
+
+ inject_event.header.type = PERF_RECORD_INJECT;
+ inject_event.header.misc = 0;
+ inject_event.header.size = sizeof(inject_event);
+ inject_event.inject_event_id = inject_event_id;
+
+ ret = get_value(&inject_event.value);
+ if (ret)
+ goto exit;
+
+ ret = perf_output_begin(&handle, event, inject_event.header.size, 0, 0);
+ if (ret)
+ goto exit;
+
+ perf_output_put(&handle, inject_event);
+ perf_output_end(&handle);
+exit:
+ return ret;
+}
+
+static int perf_inject_ioctl(struct perf_event *event, unsigned int arg)
+{
+ if (!arg || arg & ~PERF_INJECT_HZ)
+ return -EINVAL;
+
+ return perf_inject_event(event, PERF_INJECT_HZ, perf_inject_get_hz);
+}
+
static int perf_event_set_output(struct perf_event *event, int output_fd);
static int perf_event_set_filter(struct perf_event *event, void __user *arg);

@@ -2044,6 +2088,9 @@ static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case PERF_EVENT_IOC_SET_FILTER:
return perf_event_set_filter(event, (void __user *)arg);

+ case PERF_EVENT_IOC_INJECT:
+ return perf_inject_ioctl(event, arg);
+
default:
return -ENOTTY;
}
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 2654253..d13601d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -65,6 +65,8 @@ static int file_new = 1;

static struct perf_session *session;

+u32 inject_events;
+
struct mmap_data {
int counter;
void *base;
@@ -381,6 +383,17 @@ try_again:
}
}

+ if (inject_events) {
+ ret = ioctl(fd[nr_cpu][counter], PERF_EVENT_IOC_INJECT,
+ inject_events);
+ if (ret) {
+ error("failed to inject event(%u) with %d (%s)\n",
+ inject_events, errno, strerror(errno));
+ exit(-1);
+ }
+ inject_events = 0;
+ }
+
ioctl(fd[nr_cpu][counter], PERF_EVENT_IOC_ENABLE);
}

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 7f0537d..74f43af 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -179,6 +179,8 @@ static void perf_event_ops__fill_defaults(struct perf_event_ops *handler)
handler->throttle = process_event_stub;
if (handler->unthrottle == NULL)
handler->unthrottle = process_event_stub;
+ if (handler->inject == NULL)
+ handler->inject = process_event_stub;
}

static const char *event__name[] = {
@@ -192,6 +194,7 @@ static const char *event__name[] = {
[PERF_RECORD_FORK] = "FORK",
[PERF_RECORD_READ] = "READ",
[PERF_RECORD_SAMPLE] = "SAMPLE",
+ [PERF_RECORD_INJECT] = "INJECT",
};

unsigned long event__total[PERF_RECORD_MAX];
@@ -239,6 +242,8 @@ static int perf_session__process_event(struct perf_session *self,
return ops->throttle(event, self);
case PERF_RECORD_UNTHROTTLE:
return ops->unthrottle(event, self);
+ case PERF_RECORD_INJECT:
+ return ops->inject(event, self);
default:
self->unknown_events++;
return -1;
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 77c5ee2..8742354 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -40,7 +40,8 @@ struct perf_event_ops {
lost,
read,
throttle,
- unthrottle;
+ unthrottle,
+ inject;
};

struct perf_session *perf_session__new(const char *filename, int mode, bool force);
--
1.6.1.2

2009-12-29 05:23:16

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v3 2/5] trace_event: record task' real_timer in itimer_state tracepoint

When itimer is ITIMER_REAL, we should cooperate with hrtimer's
tracepoit to get it's latency, this patch can let us get itimer's
hrtimer easy

Signed-off-by: Xiao Guangrong <[email protected]>
---
include/trace/events/timer.h | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index 9496b96..13ec15a 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -285,6 +285,7 @@ TRACE_EVENT(itimer_state,

TP_STRUCT__entry(
__field( int, which )
+ __field( void *, timer )
__field( cputime_t, expires )
__field( long, value_sec )
__field( long, value_usec )
@@ -294,6 +295,8 @@ TRACE_EVENT(itimer_state,

TP_fast_assign(
__entry->which = which;
+ __entry->timer = which == ITIMER_REAL ?
+ &current->signal->real_timer : NULL;
__entry->expires = expires;
__entry->value_sec = value->it_value.tv_sec;
__entry->value_usec = value->it_value.tv_usec;
@@ -301,8 +304,9 @@ TRACE_EVENT(itimer_state,
__entry->interval_usec = value->it_interval.tv_usec;
),

- TP_printk("which=%d expires=%llu it_value=%ld.%ld it_interval=%ld.%ld",
- __entry->which, (unsigned long long)__entry->expires,
+ TP_printk("which=%d timer=%p expires=%llu it_value=%ld.%ld it_interval="
+ "%ld.%ld", __entry->which, __entry->timer,
+ (unsigned long long)__entry->expires,
__entry->value_sec, __entry->value_usec,
__entry->interval_sec, __entry->interval_usec)
);
--
1.6.1.2

2009-12-29 05:23:31

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v3 3/5] perf tools: fix getting point

The point that got by raw_field_ptr() is point to @data offset not
tracepoint saved.

Introduce raw_field_array() to get array data

Changlog v1->v2:
- fix the typo 'arry' -> 'array' that is pointed out by Frederic Weisbecker

- use '(void *)(unsigned long)value' instead of 'memcpy()' as
Frederic Weisbecker's suggestion

Signed-off-by: Xiao Guangrong <[email protected]>
---
tools/perf/builtin-sched.c | 3 +--
tools/perf/util/trace-event-parse.c | 23 ++++++++++++++++++++---
tools/perf/util/trace-event.h | 3 +++
3 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 702322f..9b91836 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -630,8 +630,7 @@ static void test_calibrations(void)

#define FILL_ARRAY(ptr, array, event, data) \
do { \
- void *__array = raw_field_ptr(event, #array, data); \
- memcpy(ptr.array, __array, sizeof(ptr.array)); \
+ raw_field_array(event, #array, data, ptr.array, sizeof(ptr.array));\
} while(0)

#define FILL_COMMON_FIELDS(ptr, event, data) \
diff --git a/tools/perf/util/trace-event-parse.c b/tools/perf/util/trace-event-parse.c
index c5c32be..037bbe3 100644
--- a/tools/perf/util/trace-event-parse.c
+++ b/tools/perf/util/trace-event-parse.c
@@ -840,7 +840,7 @@ static int event_read_fields(struct event *event, struct format_field **fields)
field->flags |= FIELD_IS_ARRAY;

type = read_token(&token);
- while (strcmp(token, "]") != 0) {
+ while (strcmp(token, "]") != 0) {
if (last_type == EVENT_ITEM &&
type == EVENT_ITEM)
len = 2;
@@ -1919,13 +1919,30 @@ raw_field_value(struct event *event, const char *name, void *data)

void *raw_field_ptr(struct event *event, const char *name, void *data)
{
+ unsigned long long value;
+
+ value = raw_field_value(event, name, data);
+
+ if (!value)
+ return NULL;
+
+ return (void *)(unsigned long)value;
+}
+
+unsigned long long
+raw_field_array(struct event *event, const char *name, void *data,
+ void *array, int array_size)
+{
+ int len;
struct format_field *field;

field = find_any_field(event, name);
if (!field)
- return NULL;
+ return 0ULL;

- return data + field->offset;
+ len = min(array_size, field->size);
+ memcpy(array, data + field->offset, len);
+ return len;
}

static int get_common_info(const char *type, int *offset, int *size)
diff --git a/tools/perf/util/trace-event.h b/tools/perf/util/trace-event.h
index 6ad4056..c3b5c39 100644
--- a/tools/perf/util/trace-event.h
+++ b/tools/perf/util/trace-event.h
@@ -255,6 +255,9 @@ unsigned long long read_size(void *ptr, int size);
unsigned long long
raw_field_value(struct event *event, const char *name, void *data);
void *raw_field_ptr(struct event *event, const char *name, void *data);
+unsigned long long
+raw_field_array(struct event *event, const char *name, void *data,
+ void *array, int array_size);
unsigned long long eval_flag(const char *flag);

int read_tracing_data(int fd, struct perf_event_attr *pattrs, int nb_events);
--
1.6.1.2

2009-12-29 05:23:45

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v3 4/5] perf timer: add document for 'perf timer'

Add document for 'perf timer', it let people know how to use it

Signed-off-by: Xiao Guangrong <[email protected]>
---
tools/perf/Documentation/perf-timer.txt | 50 +++++++++++++++++++++++++++++++
1 files changed, 50 insertions(+), 0 deletions(-)
create mode 100644 tools/perf/Documentation/perf-timer.txt

diff --git a/tools/perf/Documentation/perf-timer.txt b/tools/perf/Documentation/perf-timer.txt
new file mode 100644
index 0000000..5bd26d4
--- /dev/null
+++ b/tools/perf/Documentation/perf-timer.txt
@@ -0,0 +1,50 @@
+perf-timer(1)
+==============
+
+NAME
+----
+perf-timer - Tool to trace/measure timer properties
+
+SYNOPSIS
+--------
+[verse]
+'perf timer' {record|report} [<options>]
+
+DESCRIPTION
+-----------
+There are two variants of perf timer:
+
+ 'perf timer record <command>' to record the timer events
+ of an arbitrary workload.
+
+ 'perf timer report' to report number of activated/expired/canceled
+ timers, timer latency and timer function runtime.
+
+OPTIONS
+-------
+-D::
+--dump-raw-trace=::
+ Display verbose dump of the sched data.
+
+-i <file>::
+--input=<file>::
+ Select the input file (default: perf.data)
+
+-s <key[,key2...]>::
+--sort=<key[,key2...]>::
+ Sort the output (default: type, avg-lat, max-lat,
+ avg-runtime, max-runtime)
+
+--print-lat::
+ Print timer latency
+
+--print-runtime::
+ Print timer function runtime
+
+--type::
+ Timer type selector, you can specify one or more types
+ (timer, hrtimer, itimer-real, itimer-virtual, itimer-prof)
+
+SEE ALSO
+--------
+linkperf:perf-record[1]
--
1.6.1.2

2009-12-29 05:24:13

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v3 5/5] perf timer: add 'perf timer' core code

It's the core code of 'perf timer', it can report number of
activated/expired/canceled timers, timer latency and timer function
runtime

Sample output:
- Record timer event:

#perf timer record
^C

- Show timer latency:
#perf timer report --print-lat
Timer Latency List
[ Abbreviations: Avg = average; lat = latency; ts = timestamp ]
-------------------------------------------------------------------------
Timer | TYPE | Avg-lat ms| Max-lat ms| Max-lat-ts s
-------------------------------------------------------------------------
swapper |HRTIMER |0.180 |2.404 |42722.586
swapper |TIMER |2.000 |2.000 |42722.586
events/0 |TIMER |2.000 |2.000 |42723.002
events/0 |TIMER |0.000 |0.000 |42723.641
sshd |TIMER |0.000 |0.000 |42723.641

Statistics
==========
Activated timers number:
HRTIMER: 105 TIMER: 11 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0

Expired timers number:
HRTIMER: 94 TIMER: 12 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0

Canceled timers number:
HRTIMER: 105 TIMER: 12 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0

- Show timer function runtime:
#perf timer report --print-runtime
Timer Function Runtime List
[ Abbreviations: Avg = average; func = function; rt = runtime; ts = timestamp ]
-------------------------------------------------------------------------------
Timer | Type |Avg-rt ms|Max-rt ms| Max-rt-func | Max-rt-ts s
-------------------------------------------------------------------------------
swapper |HRTIMER|0.010 |0.026 |tick_sched_timer |42722.586
0xc15f2b70 |HRTIMER|0.009 |0.009 |Not - Catch |42722.000
swapper |TIMER |0.058 |0.111 |tcp_write_timer |42721.930
events/0 |TIMER |0.007 |0.010 |delayed_work_timer_fn |42723.002
swapper |TIMER |0.037 |0.037 |dev_watchdog |42723.002
events/0 |TIMER |0.028 |0.028 |delayed_work_timer_fn |42723.641
swapper |TIMER |0.023 |0.023 |pcnet32_watchdog |42722.001
events/1 |TIMER |0.014 |0.014 |Not - Catch |42723.003
events/0 |TIMER |0.010 |0.010 |Not - Catch |42722.001
sshd |TIMER |0.009 |0.009 |delayed_work_timer_fn |42723.641
swapper |TIMER |0.004 |0.004 |Not - Catch |42721.638
events/1 |TIMER |0.003 |0.003 |Not - Catch |42723.003

Statistics
==========
Activated timers number:
HRTIMER: 105 TIMER: 11 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0

Expired timers number:
HRTIMER: 94 TIMER: 12 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0

Canceled timers number:
HRTIMER: 105 TIMER: 12 ITIMER_PROF: 0 ITIMER_VIRTUAL: 0 ITIMER_REAL: 0

- Specify types which you are interesting and choose one or more of
"timer, hrtimer, itimer-real, itimer-virtual, itimer-prof"

#perf timer report --print-lat --type timer

Timer Latency List
[ Abbreviations: Avg = average; lat = latency; ts = timestamp ]
-------------------------------------------------------------------------
Timer | TYPE | Avg-lat ms| Max-lat ms| Max-lat-ts s
-------------------------------------------------------------------------
swapper |TIMER |2.000 |2.000 |42722.586
events/0 |TIMER |2.000 |2.000 |42723.002
events/0 |TIMER |0.000 |0.000 |42723.641
sshd |TIMER |0.000 |0.000 |42723.641

Changlog v1->v2:
- rename 'perf timer latency' to 'perf timer report',
because it not only show timer latency but also show timer
function runtime, timer activated/canceled/expired statistics
and canceled timer list(TODO)

Below changes are all from Thomas's suggestion:
- fix a lot of typos and bad ideas

- use hash table instead of RB tree to record timer

- let output information more readable

- report number of activated/canceled/expired timers

- support to filter timer types:
'--type' can do it

TODO:
- Show canceled timer list that is suggested by Thomas

- Any suggestion are welcome.

Signed-off-by: Xiao Guangrong <[email protected]>
---
tools/perf/Makefile | 1 +
tools/perf/builtin-timer.c | 996 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/builtin.h | 1 +
tools/perf/command-list.txt | 1 +
tools/perf/perf.c | 1 +
tools/perf/util/event.h | 7 +
6 files changed, 1007 insertions(+), 0 deletions(-)
create mode 100644 tools/perf/builtin-timer.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 7c84642..b5bbcea 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -449,6 +449,7 @@ BUILTIN_OBJS += builtin-top.o
BUILTIN_OBJS += builtin-trace.o
BUILTIN_OBJS += builtin-probe.o
BUILTIN_OBJS += builtin-kmem.o
+BUILTIN_OBJS += builtin-timer.o

PERFLIBS = $(LIB_FILE)

diff --git a/tools/perf/builtin-timer.c b/tools/perf/builtin-timer.c
new file mode 100644
index 0000000..17db588
--- /dev/null
+++ b/tools/perf/builtin-timer.c
@@ -0,0 +1,996 @@
+#include "builtin.h"
+#include "perf.h"
+
+#include "util/util.h"
+#include "util/cache.h"
+#include "util/symbol.h"
+#include "util/thread.h"
+#include "util/header.h"
+#include "util/parse-options.h"
+#include "util/trace-event.h"
+#include "util/debug.h"
+#include "util/session.h"
+
+#include <sys/prctl.h>
+#include <semaphore.h>
+#include <pthread.h>
+#include <math.h>
+
+#include "../../include/linux/hash.h"
+
+#define SORT_KEY "type, avg-lat, max-lat, avg-runtime, max-runtime"
+
+extern u32 inject_events;
+
+static char const *input_name = "perf.data";
+static const char *sort_order = SORT_KEY;
+static int print_lat, print_runtime, profile_cpu = -1;
+static struct perf_session *timer_session;
+static int bug_nr;
+
+enum timer_type {
+ REAL_ITIMER,
+ VIRTUAL_ITIMER,
+ PROF_ITIMER,
+ TIMER,
+ HRTIMER,
+
+ MAX_TIMER_TYPE,
+};
+
+static char const *timer_type_str[] = {
+ "ITIMER_REAL",
+ "ITIMER_VIRTUAL",
+ "ITIMER_PROF",
+ "TIMER",
+ "HRTIMER",
+};
+
+enum stat_type {
+ ACTIVATED,
+ EXPIRED,
+ CANCELED,
+
+ MAX_STATS,
+};
+
+static u32 stats[MAX_STATS][MAX_TIMER_TYPE];
+static inline void
+stats_update(enum timer_type timer_type, enum stat_type stat_type)
+{
+ stats[stat_type][timer_type]++;
+}
+
+static u32 hz;
+static inline void update_hz(u64 *nhz)
+{
+ hz = *nhz;
+}
+
+static const struct option timer_options[] = {
+ OPT_STRING('i', "input", &input_name, "file",
+ "input file name"),
+ OPT_BOOLEAN('v', "verbose", &verbose,
+ "be more verbose (show symbol address, etc)"),
+ OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
+ "dump raw trace in ASCII"),
+ OPT_END()
+};
+
+static const char *const timer_usage[] = {
+ "perf timer [<options>] {record|report}",
+ NULL
+};
+
+static int filter_type;
+static int parse_type_opt(const struct option *opt __used,
+ const char *arg, int unset __used)
+{
+ char *tok, *str = strdup(arg);
+ int ret = 0;
+
+ if (!str)
+ die("strdup");
+
+ while ((tok = strsep(&str, ",")) != NULL) {
+
+ if (!strcmp(tok, "timer"))
+ filter_type |= 1 << TIMER;
+ else if (!strcmp(tok, "hrtimer"))
+ filter_type |= 1 << HRTIMER;
+ else if (!strcmp(tok, "itimer-real"))
+ filter_type |= 1 << REAL_ITIMER;
+ else if (!strcmp(tok, "itimer-virtual"))
+ filter_type |= 1 << VIRTUAL_ITIMER;
+ else if (!strcmp(tok, "itimer-prof"))
+ filter_type |= 1 << PROF_ITIMER;
+ else {
+ printf("Unknow --type: %s\n", tok);
+ ret = -1;
+ break;
+ }
+ }
+
+ free(str);
+ return ret;
+}
+
+static const struct option report_options[] = {
+ OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
+ "sort by key(s): "SORT_KEY),
+ OPT_BOOLEAN('v', "verbose", &verbose,
+ "be more verbose (show symbol address, etc)"),
+ OPT_INTEGER('C', "CPU", &profile_cpu,
+ "CPU to profile on"),
+ OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
+ "dump raw trace in ASCII"),
+ OPT_BOOLEAN(0, "print-lat", &print_lat, "show timer latency"),
+ OPT_BOOLEAN(0, "print-runtime", &print_runtime,
+ "show timer function runtime"),
+ OPT_CALLBACK(0, "type", NULL, "timer types",
+ "timer type selector, specify one or more types(timer, "
+ "hrtimer, itimer-real, itimer-virtual, itimer-prof)",
+ parse_type_opt),
+ OPT_END()
+};
+
+static const char *const report_usage[] = {
+ "perf timer report [<options>]",
+ NULL
+};
+
+struct timer_entry {
+ struct timer_entry *next;
+ struct rb_node node;
+
+ void *timer;
+ enum timer_type type;
+ u64 function;
+ struct thread *timer_thread;
+ int expire_nr;
+ int run_func_nr;
+ int bug;
+
+ struct timer_entry *itimer_hrtimer;
+
+ u64 timer_last_lat;
+ u64 timer_expire_ts;
+ u64 timer_total_lat;
+ double timer_avg_lat;
+ s64 timer_max_lat;
+ u64 timer_max_lat_at_ts;
+
+ u64 func_entry_ts;
+ u64 func_total_runtime;
+ double func_avg_runtime;
+ s64 func_max_runtime;
+ u64 func_max_runtime_at_ts;
+ u64 max_runtime_at_func;
+};
+
+typedef int (*sort_func)(struct timer_entry *, struct timer_entry *);
+struct sort_dimension {
+ const char *name;
+ sort_func cmp;
+ struct list_head list;
+};
+
+#define DEFINE_CMP_FUNCTION(field) \
+static int cmp_##field(struct timer_entry *v1, struct timer_entry *v2) \
+{ \
+ if (v1->field < v2->field) \
+ return -1; \
+ if (v1->field > v2->field) \
+ return 1; \
+ return 0; \
+}
+
+DEFINE_CMP_FUNCTION(type);
+DEFINE_CMP_FUNCTION(timer_avg_lat);
+DEFINE_CMP_FUNCTION(timer_max_lat);
+DEFINE_CMP_FUNCTION(func_avg_runtime);
+DEFINE_CMP_FUNCTION(func_max_runtime);
+
+#define SORT(field, _name) \
+{ .cmp = cmp_##field, \
+ .name = _name, \
+}
+
+static struct sort_dimension sorts[] = {
+ SORT(type, "type"),
+ SORT(timer_avg_lat, "avg-lat"),
+ SORT(timer_max_lat, "max-lat"),
+ SORT(func_avg_runtime, "avg-runtime"),
+ SORT(func_max_runtime, "max-runtime"),
+};
+
+static int sort_dimension_add(const char *tok, struct list_head *list)
+{
+ unsigned int i;
+
+ for (i = 0; i < ARRAY_SIZE(sorts); i++) {
+ if (!strcmp(sorts[i].name, tok)) {
+ list_add_tail(&sorts[i].list, list);
+ return 0;
+ }
+ }
+
+ return -1;
+}
+
+static LIST_HEAD(sort);
+static void setup_sorting(char const *sort_string)
+{
+ char *tmp, *tok, *str = strdup(sort_string);
+
+ for (tok = strtok_r(str, ",", &tmp);
+ tok; tok = strtok_r(NULL, ", ", &tmp)) {
+ if (sort_dimension_add(tok, &sort) < 0) {
+ error("Unknown --sort key: `%s'", tok);
+ usage_with_options(report_usage, report_options);
+ }
+ }
+
+ free(str);
+}
+
+static int timer_key_cmp(struct list_head *list, struct timer_entry *l,
+ struct timer_entry *r)
+{
+ struct sort_dimension *sortp;
+ int ret = 0;
+
+ BUG_ON(list_empty(list));
+
+ list_for_each_entry(sortp, list, list) {
+ ret = sortp->cmp(l, r);
+ if (ret)
+ return ret;
+ }
+
+ return ret;
+}
+
+static void timer_rb_insert(struct rb_root *root, struct timer_entry *timer,
+ struct list_head *sort_list)
+{
+ struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+ while (*new) {
+ struct timer_entry *this;
+ int cmp;
+
+ this = container_of(*new, struct timer_entry, node);
+ parent = *new;
+
+ cmp = timer_key_cmp(sort_list, timer, this);
+
+ if (cmp > 0)
+ new = &((*new)->rb_left);
+ else
+ new = &((*new)->rb_right);
+ }
+
+ rb_link_node(&timer->node, parent, new);
+ rb_insert_color(&timer->node, root);
+}
+
+static void timer_rb_traversal(struct rb_root *root,
+ void (*func)(struct timer_entry *))
+{
+ struct rb_node *next;
+
+ next = rb_first(root);
+
+ while (next) {
+ struct timer_entry *timer;
+
+ timer = rb_entry(next, struct timer_entry, node);
+ func(timer);
+ next = rb_next(next);
+ }
+}
+
+#define TIMER_HASH_BITS 10
+#define TIMER_HASH_SIZE (1UL << TIMER_HASH_BITS)
+
+static struct timer_entry *timer_hash_table[TIMER_HASH_SIZE];
+
+static inline int timer_hash_func(void *timer)
+{
+ return hash_ptr(timer, TIMER_HASH_BITS);
+}
+
+static struct timer_entry *
+timer_hash_findnew(void *timer, enum timer_type type)
+{
+ int hash = timer_hash_func(timer);
+ struct timer_entry **head, *curr, *prev = NULL;
+
+ head = timer_hash_table + hash;
+ curr = *head;
+
+ while (curr) {
+ if (curr->timer == timer && curr->type == type)
+ return curr;
+
+ prev = curr;
+ curr = curr->next;
+ }
+
+ curr = zalloc(sizeof(*curr));
+ if (!curr)
+ die("No memory");
+
+ curr->timer = timer;
+ curr->type = type;
+
+ if (prev)
+ prev->next = curr;
+ else
+ *head = curr;
+
+ return curr;
+}
+
+static inline void *
+get_timer_hrtimer(enum timer_type type, struct event *event, void *data)
+{
+ if (type == HRTIMER)
+ return raw_field_ptr(event, "hrtimer", data);
+
+ return raw_field_ptr(event, "timer", data);
+}
+
+/*
+ * if expiry time is not recorded, it means we not caught the start event,
+ * we can't calculate timer latency
+ */
+static inline bool check_timer_expire_ts(struct timer_entry *timer_entry)
+{
+ return !!timer_entry->timer_expire_ts;
+}
+
+/*
+ * if function entry time is not recorded, it means we not caught the
+ * expire entry event, we can't calculate function runtime
+ */
+static inline bool check_timer_func_entry_ts(struct timer_entry *timer_entry)
+{
+ return !!timer_entry->func_entry_ts;
+}
+
+static void timer_hrtimer_start(enum timer_type type, void *data,
+ struct event *event, struct thread *thread)
+{
+ void *timer;
+ struct timer_entry *timer_entry;
+ u64 expires, function;
+
+ timer = get_timer_hrtimer(type, event, data);
+ function = raw_field_value(event, "function", data);
+ expires = raw_field_value(event, "expires", data);
+
+ stats_update(type, ACTIVATED);
+ timer_entry = timer_hash_findnew(timer, type);
+
+ if (!timer_entry->timer_thread)
+ timer_entry->timer_thread = thread;
+
+ timer_entry->function = function;
+ timer_entry->timer_expire_ts = expires;
+}
+
+static void timer_hrtimer_expires_entry(enum timer_type type, void *data,
+ struct event *event, u64 timestamp)
+{
+ void *timer;
+ struct timer_entry *timer_entry;
+ u64 now;
+ s64 delta;
+
+ timer = get_timer_hrtimer(type, event, data);
+ now = raw_field_value(event, "now", data);
+
+ stats_update(type, EXPIRED);
+ timer_entry = timer_hash_findnew(timer, type);
+ timer_entry->func_entry_ts = timestamp;
+
+ if (!check_timer_expire_ts(timer_entry))
+ return;
+
+ delta = now - timer_entry->timer_expire_ts;
+ if (delta < 0) {
+ timer_entry->bug++;
+ printf("WARNING: timer[%p] type[%s]: latency < 0, expires"
+ "[%llu] now[%llu]\n", timer_entry->timer,
+ timer_type_str[timer_entry->type],
+ timer_entry->timer_expire_ts, now);
+ return ;
+ }
+
+ timer_entry->expire_nr++;
+ timer_entry->timer_last_lat = delta;
+ timer_entry->timer_total_lat += delta;
+ if (timer_entry->timer_max_lat <= delta) {
+ timer_entry->timer_max_lat = delta;
+ timer_entry->timer_max_lat_at_ts = timestamp;
+ }
+ return;
+}
+
+static void timer_hrtimer_expires_exit(enum timer_type type, void *data,
+ struct event *event, u64 timestamp)
+{
+ void *timer;
+ struct timer_entry *timer_entry;
+ s64 delta;
+
+ timer = get_timer_hrtimer(type, event, data);
+ timer_entry = timer_hash_findnew(timer, type);
+
+ if (!check_timer_func_entry_ts(timer_entry))
+ return;
+
+ delta = timestamp - timer_entry->func_entry_ts;
+ if (delta < 0) {
+ timer_entry->bug++;
+ printf("WARNING: timer[%p] type[%s]: runtime < 0, "
+ "func_entry_ts[%llu] now[%llu]\n", timer_entry->timer,
+ timer_type_str[timer_entry->type],
+ timer_entry->func_entry_ts, timestamp);
+ return;
+ }
+
+ timer_entry->run_func_nr++;
+ timer_entry->func_total_runtime += delta;
+ if (timer_entry->func_max_runtime <= delta) {
+ timer_entry->func_max_runtime = delta;
+ timer_entry->func_max_runtime_at_ts = timestamp;
+ timer_entry->max_runtime_at_func = timer_entry->function;
+ }
+
+ return;
+}
+
+static void timer_hrtimer_cancel(enum timer_type type)
+{
+ stats_update(type, CANCELED);
+}
+
+static void timer_start_handler(void *data, struct event *event,
+ u64 timestamp __used, struct thread *thread)
+{
+ return timer_hrtimer_start(TIMER, data, event, thread);
+}
+
+static void
+timer_expire_entry_handler(void *data, struct event *event, u64 timestamp,
+ struct thread *thread __used)
+{
+ return timer_hrtimer_expires_entry(TIMER, data, event, timestamp);
+}
+
+static void
+timer_expire_exit_handler(void *data, struct event *event, u64 timestamp,
+ struct thread *thread __used)
+{
+ return timer_hrtimer_expires_exit(TIMER, data, event, timestamp);
+}
+
+static void timer_cancel(void *data __used, struct event *event __used,
+ u64 timestamp __used, struct thread *thread __used)
+{
+ return timer_hrtimer_cancel(TIMER);
+}
+
+static void hrtimer_start_handler(void *data, struct event *event,
+ u64 timestamp __used, struct thread *thread)
+{
+ return timer_hrtimer_start(HRTIMER, data, event, thread);
+}
+
+static void
+hrtimer_expire_entry_handler(void *data, struct event *event, u64 timestamp,
+ struct thread *thread __used)
+{
+ return timer_hrtimer_expires_entry(HRTIMER, data, event, timestamp);
+}
+
+static void
+hrtimer_expire_exit_handler(void *data, struct event *event, u64 timestamp,
+ struct thread *thread __used)
+{
+ return timer_hrtimer_expires_exit(HRTIMER, data, event, timestamp);
+}
+
+static void
+hrtimer_cancel(void *data __used, struct event *event __used,
+ u64 timestamp __used, struct thread *thread __used)
+{
+ return timer_hrtimer_cancel(HRTIMER);
+}
+
+static void itimer_state_handler(void *data, struct event *event,
+ u64 timestamp __used, struct thread *thread)
+{
+ struct timer_entry *timer_entry;
+ u64 value_sec, value_usec, expires;
+ int which;
+
+ value_sec = raw_field_value(event, "value_sec", data);
+ value_usec = raw_field_value(event, "value_usec", data);
+ expires = raw_field_value(event, "expires", data);
+ which = raw_field_value(event, "which", data);
+
+ timer_entry = timer_hash_findnew(thread, which);
+
+ /* itimer canceled */
+ if (!value_sec && !value_usec) {
+ stats_update(which, CANCELED);
+ return ;
+ }
+
+ /* itimer started */
+ stats_update(which, ACTIVATED);
+
+ if (which == ITIMER_REAL) {
+ void *hrtimer;
+
+ hrtimer = raw_field_ptr(event, "timer", data);
+ timer_entry->itimer_hrtimer = timer_hash_findnew(hrtimer,
+ HRTIMER);
+ } else
+ timer_entry->timer_expire_ts = expires;
+}
+
+static void
+itimer_expire_handler(void *data, struct event *event, u64 timestamp,
+ struct thread *thread __used)
+{
+ struct thread *itimer_thread;
+ struct timer_entry *timer_entry;
+ u64 now;
+ s64 delta;
+ pid_t pid;
+ int which;
+
+ which = raw_field_value(event, "which", data);
+ now = raw_field_value(event, "now", data);
+ pid = raw_field_value(event, "pid", data);
+
+ itimer_thread = perf_session__findnew(timer_session, pid);
+ timer_entry = timer_hash_findnew(itimer_thread, which);
+
+ stats_update(which, EXPIRED);
+
+ if (which == ITIMER_REAL) {
+ if (!timer_entry->itimer_hrtimer ||
+ !check_timer_expire_ts(timer_entry->itimer_hrtimer))
+ return;
+ delta = timer_entry->itimer_hrtimer->timer_last_lat;
+ } else {
+ if (!check_timer_expire_ts(timer_entry))
+ return;
+ delta = now - timer_entry->timer_expire_ts;
+ }
+
+ if (delta < 0) {
+ timer_entry->bug++;
+ printf("WARNING: timer[%p] type[%s]: latency < 0, expires:"
+ "[%llu] now[%llu]\n", timer_entry->timer,
+ timer_type_str[timer_entry->type],
+ timer_entry->timer_expire_ts, now);
+ return;
+ }
+
+ timer_entry->expire_nr++;
+ timer_entry->timer_total_lat += delta;
+ if (timer_entry->timer_max_lat <= delta) {
+ timer_entry->timer_max_lat = delta;
+ timer_entry->timer_max_lat_at_ts = timestamp;
+ }
+
+ return;
+}
+
+struct event_handler {
+ const char *event_name;
+ void (*handler) (void *data, struct event *event, u64 timestamp,
+ struct thread *thread);
+} timer_handler[] = {
+ /* timer */
+ { "timer_start", timer_start_handler },
+ { "timer_expire_entry", timer_expire_entry_handler },
+ { "timer_expire_exit", timer_expire_exit_handler },
+ { "timer_cancel", timer_cancel },
+
+ /* hrtimer */
+ { "hrtimer_start", hrtimer_start_handler },
+ { "hrtimer_expire_entry", hrtimer_expire_entry_handler },
+ { "hrtimer_expire_exit", hrtimer_expire_exit_handler },
+ { "hrtimer_cancel", hrtimer_cancel },
+
+ /* itimer */
+ { "itimer_state", itimer_state_handler },
+ { "itimer_expire", itimer_expire_handler },
+};
+
+static void process_raw_event(event_t *raw_event __used,
+ struct perf_session *session __used, void *data,
+ int cpu __used, u64 timestamp,
+ struct thread *thread)
+{
+ struct event *event;
+ unsigned int i;
+ int type;
+
+ type = trace_parse_common_type(data);
+ event = trace_find_event(type);
+
+ for (i = 0; i < ARRAY_SIZE(timer_handler); i++)
+ if (!strcmp(timer_handler[i].event_name, event->name))
+ timer_handler[i].handler(data, event,
+ timestamp, thread);
+}
+
+static int process_sample_event(event_t *event, struct perf_session *session)
+{
+ struct sample_data data;
+ struct thread *thread;
+
+ if (!(session->sample_type & PERF_SAMPLE_RAW))
+ return 0;
+
+ memset(&data, 0, sizeof(data));
+ data.time = -1;
+ data.cpu = -1;
+ data.period = -1;
+
+ event__parse_sample(event, session->sample_type, &data);
+
+ dump_printf("(IP, %d): %d/%d: %p period: %lld\n",
+ event->header.misc,
+ data.pid, data.tid,
+ (void *)(long)data.ip,
+ (long long)data.period);
+
+ thread = perf_session__findnew(session, data.pid);
+ if (thread == NULL) {
+ pr_debug("problem processing %d event, skipping it.\n",
+ event->header.type);
+ return -1;
+ }
+
+ dump_printf(" ... thread: %s:%d\n", thread->comm, thread->pid);
+
+ if (profile_cpu != -1 && profile_cpu != (int)data.cpu)
+ return 0;
+
+ process_raw_event(event, session, data.raw_data, data.cpu,
+ data.time, thread);
+ return 0;
+}
+
+static int
+process_inject_event(event_t *event, struct perf_session *session __used)
+{
+ if (event->inject.inject_event_id == PERF_INJECT_HZ)
+ update_hz(&event->inject.value);
+
+ return 0;
+}
+
+static struct perf_event_ops event_ops = {
+ .sample = process_sample_event,
+ .comm = event__process_comm,
+ .inject = process_inject_event,
+};
+
+static int read_events(void)
+{
+ int err;
+
+ timer_session = perf_session__new(input_name, O_RDONLY, 0);
+ if (!timer_session)
+ return -ENOMEM;
+
+ err = perf_session__process_events(timer_session, &event_ops);
+ return err;
+}
+
+static inline bool timer_hz_unit(struct timer_entry *entry)
+{
+ return entry->type == TIMER || entry->type == VIRTUAL_ITIMER
+ || entry->type == PROF_ITIMER;
+}
+
+static void timer_adjust(struct timer_entry *entry)
+{
+ if (entry->expire_nr) {
+ entry->timer_avg_lat = (double)entry->timer_total_lat /
+ (double)entry->expire_nr;
+ if (timer_hz_unit(entry) && hz) {
+ /* change unit to nanosecond */
+ entry->timer_avg_lat = entry->timer_avg_lat * 1e9 / hz;
+ entry->timer_max_lat = entry->timer_max_lat * 1e9 / hz;
+ }
+ }
+
+ if (entry->run_func_nr)
+ entry->func_avg_runtime = (double)entry->func_total_runtime /
+ (double)entry->run_func_nr;
+}
+
+static struct rb_root timer_result;
+static void sort_result(void)
+{
+ unsigned int i;
+
+ if (!filter_type)
+ filter_type = 0xFF;
+
+ for (i = 0; i < TIMER_HASH_SIZE; i++) {
+ struct timer_entry *entry;
+
+ for (entry = *(timer_hash_table + i); entry;
+ entry = entry->next) {
+ if (!(filter_type & 1 << entry->type))
+ continue;
+
+ bug_nr += entry->bug;
+ timer_adjust(entry);
+ timer_rb_insert(&timer_result, entry, &sort);
+ }
+ }
+}
+
+static inline void
+print_timer_name(struct timer_entry *timer_entry, char *buf, int buf_len)
+{
+ enum timer_type type;
+
+ type = timer_entry->type;
+ if (type != TIMER && type != HRTIMER) {
+ snprintf(buf, buf_len, "%s",
+ ((struct thread *)timer_entry->timer)->comm);
+ return;
+ }
+
+ if (timer_entry->timer_thread)
+ snprintf(buf, buf_len, "%s", timer_entry->timer_thread->comm);
+ else
+ snprintf(buf, buf_len, "%p", timer_entry->timer);
+}
+
+static void __print_timer_lat(struct timer_entry *timer_entry)
+{
+ char buf[20];
+
+ if (!timer_entry->expire_nr)
+ return;
+
+ print_timer_name(timer_entry, buf, sizeof(buf));
+ printf("%-15s", buf);
+
+ printf("|%-15s", timer_type_str[timer_entry->type]);
+ if (timer_hz_unit(timer_entry) && !hz) {
+ printf("|%-10.3f %2s", timer_entry->timer_avg_lat, "HZ");
+ printf("|%-10llu %2s", timer_entry->timer_max_lat, "HZ");
+ } else {
+ printf("|%-13.3f", timer_entry->timer_avg_lat / 1e6);
+ printf("|%-13.3f", timer_entry->timer_max_lat / 1e6);
+ }
+
+ printf("|%-13.3f\n", timer_entry->timer_max_lat_at_ts / 1e9);
+}
+
+static void print_timer_lat(void)
+{
+ printf("\n");
+ printf(" Timer Latency List\n");
+ printf(
+"[ Abbreviations: Avg = average; lat = latency; ts = timestamp ]\n");
+ printf(
+"-------------------------------------------------------------------------\n");
+ printf(
+" Timer | TYPE | Avg-lat ms| Max-lat ms| Max-lat-ts s\n");
+ printf(
+"-------------------------------------------------------------------------\n");
+ timer_rb_traversal(&timer_result, __print_timer_lat);
+}
+
+static void __print_func_runtime(struct timer_entry *timer_entry)
+{
+ u64 function;
+ char buf[20];
+
+ if (!timer_entry->run_func_nr)
+ return;
+
+ print_timer_name(timer_entry, buf, sizeof(buf));
+ printf("%-15s", buf);
+ printf("|%-7s", timer_type_str[timer_entry->type]);
+ printf("|%-9.3f", timer_entry->func_avg_runtime / 1e6);
+ printf("|%-9.3f", timer_entry->func_max_runtime / 1e6);
+ function = timer_entry->max_runtime_at_func;
+ if (function) {
+ struct symbol *sym;
+
+ sym = map_groups__find_function(&timer_session->kmaps,
+ timer_session, function, NULL);
+ if (sym)
+ printf("|%-23.23s", sym->name);
+ else
+ printf("|%-23llx", function);
+ } else
+ printf("|%-23s", "Not - Catch");
+
+ printf("|%-8.3f\n", timer_entry->func_max_runtime_at_ts / 1e9);
+}
+
+static void print_func_runtime(void)
+{
+ printf("\n");
+ printf(" Timer Function Runtime List\n");
+ printf(
+"[ Abbreviations: Avg = average; func = function; rt = runtime; ts = timestamp ]\n");
+
+ printf(
+"-------------------------------------------------------------------------------\n");
+ printf(
+" Timer | Type |Avg-rt ms|Max-rt ms| Max-rt-func | Max-rt-ts s\n");
+ printf(
+"-------------------------------------------------------------------------------\n");
+ timer_rb_traversal(&timer_result, __print_func_runtime);
+}
+
+static void __print_bug_timer(struct timer_entry *timer_entry)
+{
+ char buf[20];
+
+ if (!timer_entry->bug)
+ return;
+
+ print_timer_name(timer_entry, buf, sizeof(buf));
+ printf("%-15s", buf);
+
+ printf("|%-15s", timer_type_str[timer_entry->type]);
+ printf("|%-10d\n", timer_entry->bug);
+}
+
+static void print_bug_timer(void)
+{
+ if (bug_nr) {
+ printf("\nWarning: detect %d bug(s):\n", bug_nr);
+ printf("--------------------------------------------\n");
+ printf(" Timer | Type | Bug \n");
+ printf("--------------------------------------------\n");
+ timer_rb_traversal(&timer_result, __print_bug_timer);
+ }
+}
+
+static void
+__print_stats(enum stat_type stat_type, char *buf, int buf_len)
+{
+ int i;
+
+ for (i = MAX_TIMER_TYPE - 1; i >= 0; i--) {
+ int plen;
+
+ if (!(filter_type & 1 << i))
+ continue;
+ plen = snprintf(buf, buf_len, "%s: %d ", timer_type_str[i],
+ stats[stat_type][i]);
+ buf += plen;
+ buf_len -= plen;
+ }
+}
+
+static void print_stats(void)
+{
+ char buf[128];
+
+ printf("\nStatistics\n==========\n");
+ __print_stats(ACTIVATED, buf, sizeof(buf));
+ printf("Activated timers number:\n %s\n\n", buf);
+ __print_stats(EXPIRED, buf, sizeof(buf));
+ printf("Expired timers number:\n %s\n\n", buf);
+ __print_stats(CANCELED, buf, sizeof(buf));
+ printf("Canceled timers number:\n %s\n\n", buf);
+}
+
+static void print_result(void)
+{
+ if (print_lat)
+ print_timer_lat();
+
+ if (print_runtime)
+ print_func_runtime();
+
+ print_bug_timer();
+ print_stats();
+}
+
+static void __cmd_report(void)
+{
+ setup_pager();
+ read_events();
+ setup_sorting(sort_order);
+ sort_result();
+ print_result();
+ perf_session__delete(timer_session);
+}
+
+static const char *record_args[] = {
+ "record",
+ "-a",
+ "-R",
+ "-M",
+ "-f",
+ "-m", "1024",
+ "-c", "1",
+
+ /* timer */
+ "-e", "timer:timer_start",
+ "-e", "timer:timer_expire_entry",
+ "-e", "timer:timer_expire_exit",
+ "-e", "timer:timer_cancel",
+
+ /* hrtimer */
+ "-e", "timer:hrtimer_start",
+ "-e", "timer:hrtimer_expire_entry",
+ "-e", "timer:hrtimer_expire_exit",
+ "-e", "timer:hrtimer_cancel",
+
+ /* itimer **/
+ "-e", "timer:itimer_state",
+ "-e", "timer:itimer_expire",
+};
+
+static int __cmd_record(int argc, const char **argv)
+{
+ unsigned int rec_argc, i, j;
+ const char **rec_argv;
+
+ inject_events = PERF_INJECT_HZ;
+ rec_argc = ARRAY_SIZE(record_args) + argc - 1;
+ rec_argv = calloc(rec_argc + 1, sizeof(char *));
+
+ for (i = 0; i < ARRAY_SIZE(record_args); i++)
+ rec_argv[i] = strdup(record_args[i]);
+
+ for (j = 1; j < (unsigned int)argc; j++, i++)
+ rec_argv[i] = argv[j];
+
+ BUG_ON(i != rec_argc);
+
+ return cmd_record(i, rec_argv, NULL);
+}
+
+int cmd_timer(int argc, const char **argv, const char *prefix __used)
+{
+ argc = parse_options(argc, argv, timer_options, timer_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ if (!argc)
+ usage_with_options(timer_usage, timer_options);
+
+ symbol__init();
+ if (!strncmp(argv[0], "rec", 3))
+ return __cmd_record(argc, argv);
+
+ if (!strcmp(argv[0], "report")) {
+ if (argc > 1) {
+ argc = parse_options(argc, argv, report_options,
+ report_usage, 0);
+ if (argc)
+ usage_with_options(report_usage,
+ report_options);
+ }
+ __cmd_report();
+
+ } else
+ usage_with_options(timer_usage, timer_options);
+
+ return 0;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index 18035b1..e294128 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -30,5 +30,6 @@ extern int cmd_trace(int argc, const char **argv, const char *prefix);
extern int cmd_version(int argc, const char **argv, const char *prefix);
extern int cmd_probe(int argc, const char **argv, const char *prefix);
extern int cmd_kmem(int argc, const char **argv, const char *prefix);
+extern int cmd_timer(int argc, const char **argv, const char *prefix);

#endif
diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt
index 71dc7c3..e522f5d 100644
--- a/tools/perf/command-list.txt
+++ b/tools/perf/command-list.txt
@@ -16,3 +16,4 @@ perf-top mainporcelain common
perf-trace mainporcelain common
perf-probe mainporcelain common
perf-kmem mainporcelain common
+perf-timer mainporcelain common
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index fc89005..dfb2732 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -301,6 +301,7 @@ static void handle_internal_command(int argc, const char **argv)
{ "sched", cmd_sched, 0 },
{ "probe", cmd_probe, 0 },
{ "kmem", cmd_kmem, 0 },
+ { "timer", cmd_timer, 0 },
};
unsigned int i;
static const char ext[] = STRIP_EXTENSION;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 80fb365..6432c75 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -75,6 +75,12 @@ struct sample_data {
void *raw_data;
};

+struct inject_event {
+ struct perf_event_header header;
+ u32 inject_event_id;
+ u64 value;
+};
+
#define BUILD_ID_SIZE 20

struct build_id_event {
@@ -92,6 +98,7 @@ typedef union event_union {
struct lost_event lost;
struct read_event read;
struct sample_event sample;
+ struct inject_event inject;
} event_t;

struct events_stats {
--
1.6.1.2

2009-12-29 10:49:16

by Stijn Devriendt

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ

> 'inject' event is a very useful feature and it's suggested by Ingo
> [ See http://lkml.org/lkml/2009/12/28/31 ]

Is it possible to, instead of injecting a specific inject_event enum,
pass a type/config pair (+task/cpu?) to inject the value of that
specific event? Then the HZ could have it's own perf_event type.

This integrates better with what I'm struggling with right now:
counters that are not ever incrementing but rather indicate
a discrete value inside the kernel that might change both
up and down. My dynticks knowledge is rather limited but
if HZ might vary then this approach is definately worth it.

Regards,
Stijn

2009-12-29 11:36:19

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ



Stijn Devriendt wrote:
>> 'inject' event is a very useful feature and it's suggested by Ingo
>> [ See http://lkml.org/lkml/2009/12/28/31 ]
>
> Is it possible to, instead of injecting a specific inject_event enum,
> pass a type/config pair (+task/cpu?) to inject the value of that
> specific event? Then the HZ could have it's own perf_event type.
>

Until now, 'inject' event just trigger parameter/query events, if
you need sample cpu/task, you can use other event.

> This integrates better with what I'm struggling with right now:
> counters that are not ever incrementing but rather indicate
> a discrete value inside the kernel that might change both
> up and down. My dynticks knowledge is rather limited but
> if HZ might vary then this approach is definately worth it.
>

dynticks not touch HZ at all as i know.

Thanks,
Xiao

2009-12-30 09:20:25

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ

On Tue, 2009-12-29 at 13:21 +0800, Xiao Guangrong wrote:
> 'inject' event is a very useful feature and it's suggested by Ingo
> [ See http://lkml.org/lkml/2009/12/28/31 ]

I really dislike the name, event injection to me would be like a write()
interface where you provide the actual event data to be stuffed in the
output stream.

This just seems like a very weird way of getting data out. A kind of
like sysconf() but done very strange.

2009-12-30 09:28:48

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ


* Peter Zijlstra <[email protected]> wrote:

> On Tue, 2009-12-29 at 13:21 +0800, Xiao Guangrong wrote:
> > 'inject' event is a very useful feature and it's suggested by Ingo
> > [ See http://lkml.org/lkml/2009/12/28/31 ]
>
> I really dislike the name, event injection to me would be like a write()
> interface where you provide the actual event data to be stuffed in the
> output stream.
>
> This just seems like a very weird way of getting data out. A kind of like
> sysconf() but done very strange.

What kind of API would you suggest?

Ingo

2009-12-30 09:37:20

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ

On Wed, 2009-12-30 at 10:28 +0100, Ingo Molnar wrote:
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Tue, 2009-12-29 at 13:21 +0800, Xiao Guangrong wrote:
> > > 'inject' event is a very useful feature and it's suggested by Ingo
> > > [ See http://lkml.org/lkml/2009/12/28/31 ]
> >
> > I really dislike the name, event injection to me would be like a write()
> > interface where you provide the actual event data to be stuffed in the
> > output stream.
> >
> > This just seems like a very weird way of getting data out. A kind of like
> > sysconf() but done very strange.
>
> What kind of API would you suggest?

sysconf() seems ideal for getting single, mostly constant variables out
of the kernel, we already have non POSIX (read Linux specific) names in
there.

_SC_KERNEL_RELOCATION_OFFSET
_SC_KERNEL_HZ

or somethings like that comes to mind.

2009-12-30 09:38:45

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ



Peter Zijlstra wrote:
> On Tue, 2009-12-29 at 13:21 +0800, Xiao Guangrong wrote:
>> 'inject' event is a very useful feature and it's suggested by Ingo
>> [ See http://lkml.org/lkml/2009/12/28/31 ]
>
> I really dislike the name, event injection to me would be like a write()
> interface where you provide the actual event data to be stuffed in the
> output stream.
>

Yes, it like write and it writes something form kernel to perf.
And, i think this name is suitable for it's doing that injects an
'artificial' event, it's well if you have other name for it :-)

> This just seems like a very weird way of getting data out. A kind of
> like sysconf() but done very strange.
>

It's since some parameter is only used by perf, suce as HZ in this patch and
'relocation offset' in my other patchset, it also well if has better way to
do it :-)

Thanks,
Xiao

2009-12-30 09:44:35

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ


* Peter Zijlstra <[email protected]> wrote:

> On Wed, 2009-12-30 at 10:28 +0100, Ingo Molnar wrote:
> > * Peter Zijlstra <[email protected]> wrote:
> >
> > > On Tue, 2009-12-29 at 13:21 +0800, Xiao Guangrong wrote:
> > > > 'inject' event is a very useful feature and it's suggested by Ingo
> > > > [ See http://lkml.org/lkml/2009/12/28/31 ]
> > >
> > > I really dislike the name, event injection to me would be like a write()
> > > interface where you provide the actual event data to be stuffed in the
> > > output stream.
> > >
> > > This just seems like a very weird way of getting data out. A kind of like
> > > sysconf() but done very strange.
> >
> > What kind of API would you suggest?
>
> sysconf() seems ideal for getting single, mostly constant variables out
> of the kernel, we already have non POSIX (read Linux specific) names in
> there.
>
> _SC_KERNEL_RELOCATION_OFFSET
> _SC_KERNEL_HZ
>
> or somethings like that comes to mind.

Nah. HZ might be static here but we'll have more dynamic parameters so the
sysconf API is rather inflexible there.

I was thinking of something that fits into existing perf API schemes, to keep
it simple and self-sufficient, and to allow for extensions/etc.

Ingo

2009-12-30 09:46:41

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ

On Wed, 2009-12-30 at 17:37 +0800, Xiao Guangrong wrote:
> > I really dislike the name, event injection to me would be like a write()
> > interface where you provide the actual event data to be stuffed in the
> > output stream.
> >
>
> Yes, it like write and it writes something form kernel to perf.
> And, i think this name is suitable for it's doing that injects an
> 'artificial' event, it's well if you have other name for it :-)

No, what I means is that you cannot actually inject an arbitrary event.
Sure it injects something, but that's pretty limited.

> > This just seems like a very weird way of getting data out. A kind of
> > like sysconf() but done very strange.
> >
>
> It's since some parameter is only used by perf, suce as HZ in this patch and
> 'relocation offset' in my other patchset, it also well if has better way to
> do it :-)

The thing is, this proposed interface is very much tied to perf and
makes it basically impossible for !perf to use. Whereas I can imagine
esp. the relocation offset to be interesting for other things. Basically
everything needing to resolve a kernel symbol.


2009-12-30 10:06:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ

On Wed, 2009-12-30 at 10:36 +0100, Peter Zijlstra wrote:
> On Wed, 2009-12-30 at 10:28 +0100, Ingo Molnar wrote:
> > * Peter Zijlstra <[email protected]> wrote:
> >
> > > On Tue, 2009-12-29 at 13:21 +0800, Xiao Guangrong wrote:
> > > > 'inject' event is a very useful feature and it's suggested by Ingo
> > > > [ See http://lkml.org/lkml/2009/12/28/31 ]
> > >
> > > I really dislike the name, event injection to me would be like a write()
> > > interface where you provide the actual event data to be stuffed in the
> > > output stream.
> > >
> > > This just seems like a very weird way of getting data out. A kind of like
> > > sysconf() but done very strange.
> >
> > What kind of API would you suggest?
>
> sysconf() seems ideal for getting single, mostly constant variables out
> of the kernel, we already have non POSIX (read Linux specific) names in
> there.
>
> _SC_KERNEL_RELOCATION_OFFSET
> _SC_KERNEL_HZ
>
> or somethings like that comes to mind.

OK so there is no sysconf() syscall and its all implemented in glibc,
which is utter suckage.. will have to come up with something saner then.

2009-12-30 11:30:49

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v3 1/5] perf_event: introduce 'inject' event and get HZ


* Peter Zijlstra <[email protected]> wrote:

> OK so there is no sysconf() syscall and its all implemented in glibc, which
> is utter suckage.. will have to come up with something saner then.

I think integrating it into the perf stream of events would be the most
desirable approach - tooling could make use of it easily.

For data record (and hierarchy) description /debug/tracing/events/ would be
useful.

We dont want to actually 'activate' them in an 'event' fashion - but we could
use event IDs and the record format and the transport to describe and recover
the values.

For transport we could use the mmap ring-buffer and it would also be nice to
add (or resurrect) some 'quick & easy' read()/write() based transport. It a
bit like a value read-out from a counter - just here it's not a regular
counter but some sort of kernel-internal value that we want to provide.

Hm?

Ingo