2011-02-21 21:02:37

by David Ahern

[permalink] [raw]
Subject: [PATCH 0/4] perf events: Add realtime clock event and timehist option -v2

v1 -> v2:
- removed change to WARN_ON_ONCE in perf_event.c
- fixed init of realtime_clock event (needed call to perf_swevent_init_hrtimer)
- changed userspace event name to clock-realtime with the alias clkr as the
raw events code sucks up all options that start with 'r' followed by a hex
character
- remove RAW sample hack for pushing realtime clock event data to userspace
- plumbed PERF_SAMPLE_READ to enable retrieving event data during sample
generation
- add patch to perf-stat to recognize realtime-clock as a nsec counter
- implementation of tiemhist in perf-record was revamped significantly
-- to use event data,
-- to better handle fallback to synthesized event,
-- added feature to force a sample to be generated at startup (to output
wall-clock times we need a sample immediately; after that realtime-clock
samples are generated periodically)
-- change counter creation to be done only on first cpu; we do not need a
counter for each cpu

A request was made to modify perf-trace and perf-script. From what I can see
perf-trace does not exist yet and perf-script requires raw data which means
it does not work for the primary use case that motivated this feature
(perf record -e cs -1 -ag).


CC: Arnaldo Carvalho de Melo <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Paul Mackerras <[email protected]>
CC: Thomas Gleixner <[email protected]>

David Ahern (4):
perf events: Introduce realtime clock event
perf events: plumbing for PERF_SAMPLE_READ and read_format
perf events: add timehist option to record and report
perf stat: treat realtime-clock as nsec counter

include/linux/perf_event.h | 2 +
kernel/perf_event.c | 85 ++++++++++++
tools/perf/Documentation/perf-record.txt | 4 +
tools/perf/Documentation/perf-report.txt | 6 +
tools/perf/builtin-record.c | 113 ++++++++++++++++-
tools/perf/builtin-report.c | 208 +++++++++++++++++++++++++++++-
tools/perf/builtin-stat.c | 5 +-
tools/perf/builtin-test.c | 3 +-
tools/perf/util/event.c | 1 +
tools/perf/util/event.h | 18 +++-
tools/perf/util/evlist.c | 2 +-
tools/perf/util/evlist.h | 2 +
tools/perf/util/evsel.c | 24 +++-
tools/perf/util/header.c | 18 +++
tools/perf/util/header.h | 1 +
tools/perf/util/parse-events.c | 2 +
tools/perf/util/python.c | 3 +-
tools/perf/util/session.c | 16 +++
tools/perf/util/session.h | 9 +-
19 files changed, 507 insertions(+), 15 deletions(-)

--
1.7.3.4


2011-02-21 21:02:43

by David Ahern

[permalink] [raw]
Subject: [PATCH 3/4] perf events: add timehist option to record and report

The intent of the timehist option is to 'pretty-print' the samples
recorded rather than generating a histogram. This is done by
sampling the realtime clock event and correlating perf_clock time
stamps to wall-clock.

If the realtime-clock event is not available (e.g, older kernels)
fallback to a synthesized event. (I realize there is resistance
to new synthesized events, but it is a simple way to gain this
feature on older kernels without the need to modify the kernel
code).

Signed-off-by: David Ahern <[email protected]>
---
include/linux/perf_event.h | 1 +
kernel/perf_event.c | 19 +++
tools/perf/Documentation/perf-record.txt | 4 +
tools/perf/Documentation/perf-report.txt | 6 +
tools/perf/builtin-record.c | 108 +++++++++++++++-
tools/perf/builtin-report.c | 208 +++++++++++++++++++++++++++++-
tools/perf/util/event.c | 1 +
tools/perf/util/event.h | 8 +
tools/perf/util/evlist.c | 2 +-
tools/perf/util/evlist.h | 2 +
tools/perf/util/session.c | 4 +
tools/perf/util/session.h | 3 +-
12 files changed, 360 insertions(+), 6 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 51a2f34..404b1ee 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -240,6 +240,7 @@ struct perf_event_attr {
#define PERF_EVENT_IOC_PERIOD _IOW('$', 4, __u64)
#define PERF_EVENT_IOC_SET_OUTPUT _IO ('$', 5)
#define PERF_EVENT_IOC_SET_FILTER _IOW('$', 6, char *)
+#define PERF_EVENT_IOC_RECORD_SAMPLE _IO('$', 7)

enum perf_event_ioc_flags {
PERF_IOC_FLAG_GROUP = 1U << 0,
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 85f60c0..0e1053e 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -3231,6 +3231,7 @@ static struct perf_event *perf_fget_light(int fd, int *fput_needed)
static int perf_event_set_output(struct perf_event *event,
struct perf_event *output_event);
static int perf_event_set_filter(struct perf_event *event, void __user *arg);
+static int perf_event_generate_sample(struct perf_event *event);

static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
@@ -3277,6 +3278,9 @@ static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case PERF_EVENT_IOC_SET_FILTER:
return perf_event_set_filter(event, (void __user *)arg);

+ case PERF_EVENT_IOC_RECORD_SAMPLE:
+ return perf_event_generate_sample(event);
+
default:
return -ENOTTY;
}
@@ -4379,6 +4383,21 @@ exit:
rcu_read_unlock();
}

+/* add a sample to the event stream based on user request */
+static int perf_event_generate_sample(struct perf_event *event)
+{
+ struct perf_sample_data data;
+ struct pt_regs regs;
+
+ perf_fetch_caller_regs(&regs);
+ event->pmu->read(event);
+ perf_sample_data_init(&data, 0);
+ data.period = event->hw.last_period;
+ perf_event_output(event, 0, &data, &regs);
+
+ return 0;
+}
+
/*
* read event_id
*/
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 5a520f8..437d041 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -148,6 +148,10 @@ an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must ha
corresponding events, i.e., they always refer to events defined earlier on the command
line.

+--timehist::
+Collect data for time history report. This option adds reference time samples
+to the event stream for converting perf timestamps to time-of-day.
+
SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 8ba03d6..27b98e0 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -119,6 +119,12 @@ OPTIONS
--symfs=<directory>::
Look for files with symbols relative to this directory.

+--timehist::
+ Generate time history output. This shows each sample with a wall-clock
+ timestamp and address to symbol conversions. The --timehist option
+ must be used with the record to get wall-clock timestamps.
+
+
SEE ALSO
--------
linkperf:perf-stat[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index e39883e..dd06949 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -56,6 +56,8 @@ static bool nodelay = false;
static bool raw_samples = false;
static bool sample_id_all_avail = true;
static bool system_wide = false;
+static bool time_history = false;
+static bool synth_reftime = false;
static pid_t target_pid = -1;
static pid_t target_tid = -1;
static pid_t child_pid = -1;
@@ -235,7 +237,7 @@ static void config_attr(struct perf_evsel *evsel, struct perf_evlist *evlist)

attr->sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID;

- if (evlist->nr_entries > 1)
+ if ((evlist->nr_entries > 1) || time_history)
attr->sample_type |= PERF_SAMPLE_ID;

/*
@@ -280,6 +282,12 @@ static void config_attr(struct perf_evsel *evsel, struct perf_evlist *evlist)
attr->sample_type |= PERF_SAMPLE_CPU;
}

+ if (time_history) {
+ attr->sample_type |= PERF_SAMPLE_TIME;
+ attr->sample_type |= PERF_SAMPLE_CPU;
+ attr->sample_type |= PERF_SAMPLE_READ;
+ }
+
if (nodelay) {
attr->watermark = 0;
attr->wakeup_events = 1;
@@ -294,6 +302,86 @@ static void config_attr(struct perf_evsel *evsel, struct perf_evlist *evlist)
}
}

+static int perf_event__synthesize_reftime(perf_event__handler_t process,
+ struct perf_session *psession)
+{
+ union perf_event ev;
+ struct timespec tp;
+
+ memset(&ev, 0, sizeof(ev));
+
+ /* race here between successive calls, but should be close enough */
+ if (gettimeofday(&ev.reftime.tv, NULL) != 0) {
+ error("gettimeofday failed. Cannot generate reference time.\n");
+ return -1;
+ }
+ if (clock_gettime(CLOCK_MONOTONIC, &tp) != 0) {
+ error("clock_gettime failed. Cannot generate reference time.\n");
+ return -1;
+ }
+ ev.reftime.nsec = (u64) tp.tv_sec * NSEC_PER_SEC + (u64) tp.tv_nsec;
+
+ ev.header.type = PERF_RECORD_REFTIME;
+ ev.header.size = sizeof(ev.reftime);
+
+ return process(&ev, NULL, psession);
+}
+
+static void create_timehist_counter(void)
+{
+ int fd;
+ struct perf_event_attr attr;
+ struct perf_evsel *evsel;
+ /* only on 1 cpu */
+ struct cpu_map *cpus = cpu_map__new("0");
+ /* not associated with a process */
+ struct thread_map *threads = thread_map__new(-1, -1);
+
+ struct perf_evsel *first_evsel = list_entry(evsel_list->entries.next,
+ struct perf_evsel, node);
+
+ /* start with the attributes used for other events */
+ attr = first_evsel->attr;
+ attr.type = PERF_TYPE_SOFTWARE;
+ attr.config = PERF_COUNT_SW_REALTIME_CLOCK;
+ attr.sample_period = 3600 * NSEC_PER_SEC;
+
+ evsel = perf_evsel__new(&attr, evsel_list->nr_entries);
+ if (!evsel)
+ die("Error: Failed to allocate memory for time counter\n");
+
+ config_attr(evsel, evsel_list);
+
+ if (perf_evsel__open(evsel, cpus, threads, 0, 1) < 0) {
+ if (errno == EINVAL) {
+ synth_reftime = true;
+ return;
+ }
+ die("Failed to open realtime clock event\n");
+ }
+
+ if (perf_evsel__alloc_id(evsel, cpus->nr, threads->nr) < 0)
+ die("Failed to allocate an id for timehist event\n");
+
+ fd = FD(evsel, 0, 0);
+ if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, FD(first_evsel, 0, 0)) != 0)
+ die("ioctl failed for timehist event\n");
+
+ if (perf_evlist__id_hash(evsel_list, evsel, 0, 0, fd) < 0)
+ die("id_hash failed for timehist event\n");
+
+ create_counter(evsel, 0);
+
+ /* generate first sample - want a sample immediately so
+ * that time conversions are avialable from the get-go.
+ * Let user-specified rate take care of samples after that.
+ */
+ if (ioctl(fd, PERF_EVENT_IOC_RECORD_SAMPLE) != 0)
+ error("failed to generate sample for realtime clock\n");
+
+ return;
+}
+
static void open_counters(struct perf_evlist *evlist)
{
struct perf_evsel *pos;
@@ -335,7 +423,8 @@ try_again:
* Old kernel, no attr->sample_id_type_all field
*/
sample_id_all_avail = false;
- if (!sample_time && !raw_samples && !time_needed)
+ if (!sample_time && !raw_samples
+ && !time_needed && !time_history)
attr->sample_type &= ~PERF_SAMPLE_TIME;

goto retry_sample_id;
@@ -378,6 +467,9 @@ try_again:
list_for_each_entry(pos, &evlist->entries, node)
create_counter(pos, cpu);
}
+
+ if (time_history)
+ create_timehist_counter();
}

static int process_buildids(void)
@@ -657,6 +749,16 @@ static int __cmd_record(int argc, const char **argv)
}
}

+ if (synth_reftime) {
+ if (verbose)
+ warning(" ... fall back to synthesized reftime\n");
+
+ if (perf_event__synthesize_reftime(process_synthesized_event,
+ session) != 0)
+ error("Failed to create reftime event. "
+ "Cannot generate wall-clock timestamps\n");
+ }
+
machine = perf_session__find_host_machine(session);
if (!machine) {
pr_err("Couldn't find native kernel information.\n");
@@ -815,6 +917,8 @@ const struct option record_options[] = {
OPT_CALLBACK('G', "cgroup", &evsel_list, "name",
"monitor event in cgroup name only",
parse_cgroups),
+ OPT_BOOLEAN(0, "timehist", &time_history,
+ "collect data for time history report"),
OPT_END()
};

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index dddcc7e..83b5f24 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -36,6 +36,7 @@ static char const *input_name = "perf.data";
static bool force, use_tui, use_stdio;
static bool hide_unresolved;
static bool dont_use_callchains;
+static bool time_history;

static bool show_threads;
static struct perf_read_values show_threads_values;
@@ -46,6 +47,191 @@ static const char *pretty_printing_style = default_pretty_printing_style;
static char callchain_default_opt[] = "fractal,0.5";
static symbol_filter_t annotate_init;

+
+struct timeval tv_ref;
+u64 timestamp_ref;
+
+static const char *timestr(u64 timestamp)
+{
+ struct tm ltime;
+ u64 dt;
+ struct timeval tv_dt, tv_res;
+ static char tstr[64];
+
+ if (timestamp_ref == 0) {
+ static bool show_msg = true;
+ if (show_msg) {
+ if (verbose)
+ warning("Reference clock event not seen yet; "
+ "Cannot generate wall-clock time strings until then.\n");
+ show_msg = false;
+ }
+ snprintf(tstr, sizeof(tstr), "%16" PRId64, timestamp);
+ return tstr;
+ } else if (timestamp == 0) {
+ snprintf(tstr, sizeof(tstr), "%16" PRId64, timestamp);
+ return tstr;
+ }
+ if (timestamp > timestamp_ref) {
+ dt = timestamp - timestamp_ref;
+ tv_dt.tv_sec = (time_t) (dt / NSEC_PER_SEC);
+ tv_dt.tv_usec = (dt - tv_dt.tv_sec * NSEC_PER_SEC) / 1000;
+ timeradd(&tv_ref, &tv_dt, &tv_res);
+ } else {
+ dt = timestamp_ref - timestamp;
+ tv_dt.tv_sec = (time_t) (dt / NSEC_PER_SEC);
+ tv_dt.tv_usec = (dt - tv_dt.tv_sec * NSEC_PER_SEC) / 1000;
+ timersub(&tv_ref, &tv_dt, &tv_res);
+ }
+
+ if (localtime_r(&tv_res.tv_sec, &ltime) == NULL) {
+ snprintf(tstr, sizeof(tstr), "00:00:00.000000 %16" PRId64, timestamp);
+ } else {
+ char date[64];
+ /* TO-DO: make timestr format configurable */
+ strftime(date, sizeof(date), "%H:%M:%S", &ltime);
+ snprintf(tstr, sizeof(tstr), "%s.%06ld %16" PRId64,
+ date, tv_res.tv_usec, timestamp);
+ }
+
+ return tstr;
+}
+
+#define TIMEHIST_FMT "%32s %2d %s %s %d %16" PRIx64 " %s (%s)\n"
+
+static void timehist_header(void)
+{
+ printf("%32s %s %7s %3s %5s %16s %s (%s)\n\n",
+ " Time-of-Day Kernel Timestamp",
+ "cpu", "Event", "Command", "PID",
+ " IP ", "Symbol Name", "DSO Name");
+}
+
+static int perf_event__process_reftime(union perf_event *event,
+ struct perf_session *session __used)
+{
+ tv_ref = event->reftime.tv;
+ timestamp_ref = event->reftime.nsec;
+
+ return 0;
+}
+
+static void perf_event__process_rclk(struct perf_sample *sample,
+ struct perf_session *session)
+{
+ u64 s;
+
+ if (!(session->sample_type & PERF_SAMPLE_READ)) {
+ static bool show_msg = true;
+ if (show_msg) {
+ printf("rclk sample does not have event data.\n"
+ "Was record done with --timehist option?\n");
+ show_msg = false;
+ }
+ return;
+ }
+
+ if (!(session->sample_type & PERF_SAMPLE_TIME)) {
+ printf("rclk sample does not have kernel sample time\n");
+ return;
+ }
+
+ /* convert raw sample to wall-clock reference time */
+ s = sample->values.value;
+ tv_ref.tv_sec = s / NSEC_PER_SEC;
+ tv_ref.tv_usec = (s - tv_ref.tv_sec * NSEC_PER_SEC) / 1000;
+
+ /* match raw sample with kernel timestamp */
+ timestamp_ref = sample->time;
+}
+
+static void perf_session__print_sample(struct perf_session *session,
+ struct addr_location *al,
+ struct perf_sample *sample)
+{
+ static int show_timehist_error = 1;
+ struct callchain_cursor_node *node, *prev;
+ u64 timestamp = 0;
+ const char *tstr;
+ bool need_spacer = false;
+ struct perf_event_attr *attr;
+ const char *evname = NULL;
+ const char *symname = "", *dsoname = "";
+
+ if (show_timehist_error && ((sample->cpu == (u32) -1) ||
+ !(session->sample_type & PERF_SAMPLE_TIME))) {
+ pr_err("Data for time history missing in perf event samples.\n"
+ "Did you record with --timehist option?\n");
+ show_timehist_error = 0;
+ }
+
+ attr = perf_header__find_attr(sample->id, &session->header);
+ if (attr)
+ evname = __event_name(attr->type, attr->config);
+
+ if (session->sample_type & PERF_SAMPLE_TIME)
+ timestamp = sample->time;
+
+ tstr = timestr(timestamp);
+
+ if ((symbol_conf.use_callchain) && sample->callchain) {
+
+ if (perf_session__resolve_callchain(session, al->thread,
+ sample->callchain, NULL) != 0) {
+ fprintf(stderr, "failed to resolve callchain. skipping\n");
+ return;
+ }
+
+ node = session->callchain_cursor.first;
+ while (node) {
+
+ if (node->sym && node->sym->name)
+ symname = node->sym->name;
+ else if (hide_unresolved)
+ continue;
+
+ if (node->map && node->map->dso && node->map->dso->name)
+ dsoname = node->map->dso->name;
+ else if (hide_unresolved)
+ continue;
+
+ printf(TIMEHIST_FMT,
+ tstr, sample->cpu, evname ? evname : "-",
+ al->thread->comm_set ? al->thread->comm : "-",
+ al->thread->pid,
+ node->ip,
+ symname, dsoname);
+
+ need_spacer = true;
+
+ prev = node;
+ node = node->next;
+ }
+
+ } else {
+ if (al->sym && al->sym->name)
+ symname = al->sym->name;
+
+ if (al->map && al->map->dso && al->map->dso->name)
+ dsoname = al->map->dso->name;
+
+ printf(TIMEHIST_FMT,
+ tstr, sample->cpu, evname ? evname : "-",
+ al->thread->comm_set ? al->thread->comm : "-",
+ al->thread->pid, al->addr,
+ symname, dsoname);
+
+ need_spacer = true;
+ }
+
+ /* put a gap between records */
+ if (need_spacer)
+ printf("\n");
+
+ return;
+}
+
+
static struct hists *perf_session__hists_findnew(struct perf_session *self,
u64 event_stream, u32 type,
u64 config)
@@ -167,6 +353,13 @@ static int process_sample_event(union perf_event *event,
struct addr_location al;
struct perf_event_attr *attr;

+ attr = perf_header__find_attr(sample->id, &session->header);
+ if (time_history && attr &&
+ (attr->type == PERF_TYPE_SOFTWARE) &&
+ (attr->config == PERF_COUNT_SW_REALTIME_CLOCK)) {
+ perf_event__process_rclk(sample, session);
+ }
+
if (perf_event__preprocess_sample(event, session, &al, sample,
annotate_init) < 0) {
fprintf(stderr, "problem processing %d event, skipping it.\n",
@@ -177,7 +370,9 @@ static int process_sample_event(union perf_event *event,
if (al.filtered || (hide_unresolved && al.sym == NULL))
return 0;

- if (perf_session__add_hist_entry(session, &al, sample)) {
+ if (time_history) {
+ perf_session__print_sample(session, &al, sample);
+ } else if (perf_session__add_hist_entry(session, &al, sample)) {
pr_debug("problem incrementing symbol period, skipping event\n");
return -1;
}
@@ -257,6 +452,7 @@ static struct perf_event_ops event_ops = {
.event_type = perf_event__process_event_type,
.tracing_data = perf_event__process_tracing_data,
.build_id = perf_event__process_build_id,
+ .reftime = perf_event__process_reftime,
.ordered_samples = true,
.ordering_requires_timestamps = true,
};
@@ -334,6 +530,9 @@ static int __cmd_report(void)
if (ret)
goto out_delete;

+ if (time_history)
+ timehist_header();
+
ret = perf_session__process_events(session, &event_ops);
if (ret)
goto out_delete;
@@ -349,6 +548,9 @@ static int __cmd_report(void)
if (verbose > 2)
perf_session__fprintf_dsos(session, stdout);

+ if (time_history)
+ goto out_delete;
+
next = rb_first(&session->hists_tree);

if (next == NULL) {
@@ -504,6 +706,8 @@ static const struct option options[] = {
"Only display entries resolved to a symbol"),
OPT_STRING(0, "symfs", &symbol_conf.symfs, "directory",
"Look for files with symbols relative to this directory"),
+ OPT_BOOLEAN(0, "timehist", &time_history,
+ "Dump time history of event samples"),
OPT_END()
};

@@ -511,7 +715,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __used)
{
argc = parse_options(argc, argv, options, report_usage, 0);

- if (use_stdio)
+ if (use_stdio || time_history)
use_browser = 0;
else if (use_tui)
use_browser = 1;
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index fbf5754..6bbd551 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -24,6 +24,7 @@ static const char *perf_event__names[] = {
[PERF_RECORD_HEADER_TRACING_DATA] = "TRACING_DATA",
[PERF_RECORD_HEADER_BUILD_ID] = "BUILD_ID",
[PERF_RECORD_FINISHED_ROUND] = "FINISHED_ROUND",
+ [PERF_RECORD_REFTIME] = "REF_TIME",
};

const char *perf_event__name(unsigned int id)
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 512a1ca..d4810e0 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -99,6 +99,7 @@ enum perf_user_event_type { /* above any possible kernel type */
PERF_RECORD_HEADER_TRACING_DATA = 66,
PERF_RECORD_HEADER_BUILD_ID = 67,
PERF_RECORD_FINISHED_ROUND = 68,
+ PERF_RECORD_REFTIME = 69,
PERF_RECORD_HEADER_MAX
};

@@ -125,6 +126,12 @@ struct tracing_data_event {
u32 size;
};

+struct reftime_event {
+ struct perf_event_header header;
+ struct timeval tv;
+ u64 nsec;
+};
+
union perf_event {
struct perf_event_header header;
struct ip_event ip;
@@ -138,6 +145,7 @@ union perf_event {
struct event_type_event event_type;
struct tracing_data_event tracing_data;
struct build_id_event build_id;
+ struct reftime_event reftime;
};

void perf_event__print_totals(void);
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 95b21fe..bb49243 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -106,7 +106,7 @@ void perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd)
evlist->nr_fds++;
}

-static int perf_evlist__id_hash(struct perf_evlist *evlist, struct perf_evsel *evsel,
+int perf_evlist__id_hash(struct perf_evlist *evlist, struct perf_evsel *evsel,
int cpu, int thread, int fd)
{
struct perf_sample_id *sid;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index c988405..bd73572 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -48,6 +48,8 @@ union perf_event *perf_evlist__read_on_cpu(struct perf_evlist *self, int cpu);
int perf_evlist__alloc_mmap(struct perf_evlist *evlist);
int perf_evlist__mmap(struct perf_evlist *evlist, int pages, bool overwrite);
void perf_evlist__munmap(struct perf_evlist *evlist);
+int perf_evlist__id_hash(struct perf_evlist *evlist, struct perf_evsel *evsel,
+ int cpu, int thread, int fd);

static inline void perf_evlist__set_maps(struct perf_evlist *evlist,
struct cpu_map *cpus,
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index dc0235b..1ef8e8a 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -791,6 +791,10 @@ static int perf_session__process_user_event(struct perf_session *session, union
return ops->build_id(event, session);
case PERF_RECORD_FINISHED_ROUND:
return ops->finished_round(event, session, ops);
+ case PERF_RECORD_REFTIME:
+ if (ops->reftime)
+ return ops->reftime(event, session);
+ return -EINVAL;
default:
return -EINVAL;
}
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 212f810..b46672a 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -78,7 +78,8 @@ struct perf_event_ops {
event_synth_op attr,
event_type,
tracing_data,
- build_id;
+ build_id,
+ reftime;
event_op2 finished_round;
bool ordered_samples;
bool ordering_requires_timestamps;
--
1.7.3.4

2011-02-21 21:02:39

by David Ahern

[permalink] [raw]
Subject: [PATCH 2/4] perf events: plumbing for PERF_SAMPLE_READ and read_format

Signed-off-by: David Ahern <[email protected]>
---
tools/perf/builtin-record.c | 5 +++++
tools/perf/builtin-test.c | 3 ++-
tools/perf/util/event.h | 10 +++++++++-
tools/perf/util/evsel.c | 24 +++++++++++++++++++++---
tools/perf/util/header.c | 18 ++++++++++++++++++
tools/perf/util/header.h | 1 +
tools/perf/util/python.c | 3 ++-
tools/perf/util/session.c | 12 ++++++++++++
tools/perf/util/session.h | 6 +++++-
9 files changed, 75 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index db4cd1e..e39883e 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -41,6 +41,7 @@ enum write_mode_t {
static u64 user_interval = ULLONG_MAX;
static u64 default_interval = 0;
static u64 sample_type;
+static u64 read_format;

static unsigned int page_size;
static unsigned int mmap_pages = 128;
@@ -218,6 +219,9 @@ static void create_counter(struct perf_evsel *evsel, int cpu)

if (!sample_type)
sample_type = attr->sample_type;
+
+ if (!read_format)
+ read_format = attr->read_format;
}

static void config_attr(struct perf_evsel *evsel, struct perf_evlist *evlist)
@@ -595,6 +599,7 @@ static int __cmd_record(int argc, const char **argv)
open_counters(evsel_list);

perf_session__set_sample_type(session, sample_type);
+ perf_session__set_read_format(session, read_format);

/*
* perf_session__delete(session) will be called at atexit_header()
diff --git a/tools/perf/builtin-test.c b/tools/perf/builtin-test.c
index 1b2106c..a243d51 100644
--- a/tools/perf/builtin-test.c
+++ b/tools/perf/builtin-test.c
@@ -558,7 +558,8 @@ static int test__basic_mmap(void)
goto out_munmap;
}

- perf_event__parse_sample(event, attr.sample_type, false, &sample);
+ perf_event__parse_sample(event, attr.sample_type,
+ attr.read_format, false, &sample);
evsel = perf_evlist__id2evsel(evlist, sample.id);
if (evsel == NULL) {
pr_debug("event with id %" PRIu64
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 9c35170..512a1ca 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -61,6 +61,13 @@ struct sample_event {
u64 array[];
};

+struct read_format {
+ u64 value;
+ u64 time_enabled;
+ u64 time_running;
+ u64 id;
+};
+
struct perf_sample {
u64 ip;
u32 pid, tid;
@@ -69,6 +76,7 @@ struct perf_sample {
u64 id;
u64 stream_id;
u64 period;
+ struct read_format values;
u32 cpu;
u32 raw_size;
void *raw_data;
@@ -178,6 +186,6 @@ int perf_event__preprocess_sample(const union perf_event *self,
const char *perf_event__name(unsigned int id);

int perf_event__parse_sample(const union perf_event *event, u64 type,
- bool sample_id_all, struct perf_sample *sample);
+ u64 read_format, bool sample_id_all, struct perf_sample *sample);

#endif /* __PERF_RECORD_H */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 63cadaf..d93ce70 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -293,7 +293,7 @@ static int perf_event__parse_id_sample(const union perf_event *event, u64 type,
}

int perf_event__parse_sample(const union perf_event *event, u64 type,
- bool sample_id_all, struct perf_sample *data)
+ u64 read_format, bool sample_id_all, struct perf_sample *data)
{
const u64 *array;

@@ -353,8 +353,26 @@ int perf_event__parse_sample(const union perf_event *event, u64 type,
}

if (type & PERF_SAMPLE_READ) {
- fprintf(stderr, "PERF_SAMPLE_READ is unsuported for now\n");
- return -1;
+ if (read_format & PERF_FORMAT_GROUP) {
+ printf("READ_FORMAT_GROUP is unsuported for now\n");
+ return -1;
+ } else {
+ data->values.value = *array;
+ array++;
+
+ if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED) {
+ data->values.time_enabled = *array;
+ array++;
+ }
+ if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING) {
+ data->values.time_running = *array;
+ array++;
+ }
+ if (read_format & PERF_FORMAT_ID) {
+ data->values.id = *array;
+ array++;
+ }
+ }
}

if (type & PERF_SAMPLE_CALLCHAIN) {
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 72c124d..f48b0b1 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -951,6 +951,23 @@ u64 perf_header__sample_type(struct perf_header *header)
return type;
}

+u64 perf_header__read_format(struct perf_header *header)
+{
+ u64 read_format = 0;
+ int i;
+
+ for (i = 0; i < header->attrs; i++) {
+ struct perf_header_attr *attr = header->attr[i];
+
+ if (!read_format)
+ read_format = attr->attr.read_format;
+ else if (read_format != attr->attr.read_format)
+ die("non matching read_format");
+ }
+
+ return read_format;
+}
+
bool perf_header__sample_id_all(const struct perf_header *header)
{
bool value = false, first = true;
@@ -1079,6 +1096,7 @@ int perf_event__process_attr(union perf_event *event,
}

perf_session__update_sample_type(session);
+ perf_session__update_read_format(session);

return 0;
}
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index f042ceb..5d89e7f 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -84,6 +84,7 @@ void perf_header_attr__delete(struct perf_header_attr *self);
int perf_header_attr__add_id(struct perf_header_attr *self, u64 id);

u64 perf_header__sample_type(struct perf_header *header);
+u64 perf_header__read_format(struct perf_header *header);
bool perf_header__sample_id_all(const struct perf_header *header);
struct perf_event_attr *
perf_header__find_attr(u64 id, struct perf_header *header);
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index 5317ef2..b67c25f 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -684,7 +684,8 @@ static PyObject *pyrf_evlist__read_on_cpu(struct pyrf_evlist *pevlist,
return PyErr_NoMemory();

first = list_entry(evlist->entries.next, struct perf_evsel, node);
- perf_event__parse_sample(event, first->attr.sample_type, sample_id_all,
+ perf_event__parse_sample(event, first->attr.sample_type,
+ first->attr.read_format, sample_id_all,
&pevent->sample);
return pyevent;
}
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index a3a871f..dc0235b 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -110,6 +110,17 @@ void perf_session__update_sample_type(struct perf_session *self)
perf_session__id_header_size(self);
}

+void perf_session__set_read_format(struct perf_session *session,
+ u64 read_format)
+{
+ session->read_format = read_format;
+}
+
+void perf_session__update_read_format(struct perf_session *self)
+{
+ self->read_format = perf_header__read_format(&self->header);
+}
+
int perf_session__create_kernel_maps(struct perf_session *self)
{
int ret = machine__create_kernel_maps(&self->host_machine);
@@ -172,6 +183,7 @@ struct perf_session *perf_session__new(const char *filename, int mode,
}

perf_session__update_sample_type(self);
+ perf_session__update_read_format(self);

if (ops && ops->ordering_requires_timestamps &&
ops->ordered_samples && !self->sample_id_all) {
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 977b3a1..212f810 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -43,6 +43,7 @@ struct perf_session {
*/
struct hists hists;
u64 sample_type;
+ u64 read_format;
int fd;
bool fd_pipe;
bool repipe;
@@ -115,6 +116,8 @@ void perf_session__update_sample_type(struct perf_session *self);
void perf_session__set_sample_id_all(struct perf_session *session, bool value);
void perf_session__set_sample_type(struct perf_session *session, u64 type);
void perf_session__remove_thread(struct perf_session *self, struct thread *th);
+void perf_session__set_read_format(struct perf_session *session, u64 read_format);
+void perf_session__update_read_format(struct perf_session *self);

static inline
struct machine *perf_session__find_host_machine(struct perf_session *self)
@@ -162,7 +165,8 @@ static inline int perf_session__parse_sample(struct perf_session *session,
struct perf_sample *sample)
{
return perf_event__parse_sample(event, session->sample_type,
- session->sample_id_all, sample);
+ session->read_format, session->sample_id_all,
+ sample);
}

#endif /* __PERF_SESSION_H */
--
1.7.3.4

2011-02-21 21:02:45

by David Ahern

[permalink] [raw]
Subject: [PATCH 4/4] perf stat: treat realtime-clock as nsec counter

Based on patch from Arnaldo Carvalho de Melo <[email protected]>

Signed-off-by: David Ahern <[email protected]>
---
tools/perf/builtin-stat.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 21c0252..35d45b4 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -184,8 +184,9 @@ static int create_perf_stat_counter(struct perf_evsel *evsel)
*/
static inline int nsec_counter(struct perf_evsel *evsel)
{
- if (perf_evsel__match(evsel, SOFTWARE, SW_CPU_CLOCK) ||
- perf_evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK))
+ if (perf_evsel__match(evsel, SOFTWARE, SW_CPU_CLOCK) ||
+ perf_evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK) ||
+ perf_evsel__match(evsel, SOFTWARE, SW_REALTIME_CLOCK))
return 1;

return 0;
--
1.7.3.4

2011-02-21 21:03:22

by David Ahern

[permalink] [raw]
Subject: [PATCH 1/4] perf events: Introduce realtime clock event

The motivation for this event is to convert perf_clock() time stamps
to wall-clock (gettimeofday()) equivalents for comparing perf events
to other log files.

This patch is based on the monotonic patch by Arnaldo Carvalho de Melo
<[email protected]>.

Signed-off-by: David Ahern <[email protected]>
---
include/linux/perf_event.h | 1 +
kernel/perf_event.c | 66 ++++++++++++++++++++++++++++++++++++++++
tools/perf/util/parse-events.c | 2 +
3 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8ceb5a6..51a2f34 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -105,6 +105,7 @@ enum perf_sw_ids {
PERF_COUNT_SW_PAGE_FAULTS_MAJ = 6,
PERF_COUNT_SW_ALIGNMENT_FAULTS = 7,
PERF_COUNT_SW_EMULATION_FAULTS = 8,
+ PERF_COUNT_SW_REALTIME_CLOCK = 9,

PERF_COUNT_SW_MAX, /* non-ABI */
};
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index a0a6987..85f60c0 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -5413,6 +5413,7 @@ static int perf_swevent_init(struct perf_event *event)
switch (event_id) {
case PERF_COUNT_SW_CPU_CLOCK:
case PERF_COUNT_SW_TASK_CLOCK:
+ case PERF_COUNT_SW_REALTIME_CLOCK:
return -ENOENT;

default:
@@ -5751,6 +5752,70 @@ static struct pmu perf_cpu_clock = {
};

/*
+ * Software event: realtime wall time clock
+ */
+
+static void realtime_clock_event_update(struct perf_event *event)
+{
+ u64 now = ktime_to_ns(ktime_get_real());
+ local64_set(&event->count, now);
+}
+
+static void realtime_clock_event_start(struct perf_event *event, int flags)
+{
+ realtime_clock_event_update(event);
+ perf_swevent_start_hrtimer(event);
+}
+
+static void realtime_clock_event_stop(struct perf_event *event, int flags)
+{
+ perf_swevent_cancel_hrtimer(event);
+ realtime_clock_event_update(event);
+}
+
+static int realtime_clock_event_add(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_START)
+ realtime_clock_event_start(event, flags);
+
+ return 0;
+}
+
+static void realtime_clock_event_del(struct perf_event *event, int flags)
+{
+ realtime_clock_event_stop(event, flags);
+}
+
+static void realtime_clock_event_read(struct perf_event *event)
+{
+ realtime_clock_event_update(event);
+}
+
+static int realtime_clock_event_init(struct perf_event *event)
+{
+ if (event->attr.type != PERF_TYPE_SOFTWARE)
+ return -ENOENT;
+
+ if (event->attr.config != PERF_COUNT_SW_REALTIME_CLOCK)
+ return -ENOENT;
+
+ perf_swevent_init_hrtimer(event);
+
+ return 0;
+}
+
+static struct pmu perf_realtime_clock = {
+ .task_ctx_nr = perf_sw_context,
+
+ .event_init = realtime_clock_event_init,
+ .add = realtime_clock_event_add,
+ .del = realtime_clock_event_del,
+ .start = realtime_clock_event_start,
+ .stop = realtime_clock_event_stop,
+ .read = realtime_clock_event_read,
+};
+
+/*
* Software event: task time clock
*/

@@ -7285,6 +7350,7 @@ void __init perf_event_init(void)
init_srcu_struct(&pmus_srcu);
perf_pmu_register(&perf_swevent, "software", PERF_TYPE_SOFTWARE);
perf_pmu_register(&perf_cpu_clock, NULL, -1);
+ perf_pmu_register(&perf_realtime_clock, NULL, -1);
perf_pmu_register(&perf_task_clock, NULL, -1);
perf_tp_register();
perf_cpu_notifier(perf_cpu_notify);
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 54a7e26..c40bfef 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -41,6 +41,7 @@ static struct event_symbol event_symbols[] = {

{ CSW(CPU_CLOCK), "cpu-clock", "" },
{ CSW(TASK_CLOCK), "task-clock", "" },
+ { CSW(REALTIME_CLOCK), "clock-realtime", "clkr" },
{ CSW(PAGE_FAULTS), "page-faults", "faults" },
{ CSW(PAGE_FAULTS_MIN), "minor-faults", "" },
{ CSW(PAGE_FAULTS_MAJ), "major-faults", "" },
@@ -78,6 +79,7 @@ static const char *sw_event_names[] = {
"major-faults",
"alignment-faults",
"emulation-faults",
+ "realtime-clock-msecs",
};

#define MAX_ALIASES 8
--
1.7.3.4

2011-02-21 21:37:47

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 0/4] perf events: Add realtime clock event and timehist option -v2

On Mon, Feb 21, 2011 at 02:02:26PM -0700, David Ahern wrote:
> v1 -> v2:
> - removed change to WARN_ON_ONCE in perf_event.c
> - fixed init of realtime_clock event (needed call to perf_swevent_init_hrtimer)
> - changed userspace event name to clock-realtime with the alias clkr as the
> raw events code sucks up all options that start with 'r' followed by a hex
> character
> - remove RAW sample hack for pushing realtime clock event data to userspace
> - plumbed PERF_SAMPLE_READ to enable retrieving event data during sample
> generation
> - add patch to perf-stat to recognize realtime-clock as a nsec counter
> - implementation of tiemhist in perf-record was revamped significantly
> -- to use event data,
> -- to better handle fallback to synthesized event,
> -- added feature to force a sample to be generated at startup (to output
> wall-clock times we need a sample immediately; after that realtime-clock
> samples are generated periodically)
> -- change counter creation to be done only on first cpu; we do not need a
> counter for each cpu
>
> A request was made to modify perf-trace and perf-script. From what I can see
> perf-trace does not exist yet and perf-script requires raw data which means
> it does not work for the primary use case that motivated this feature
> (perf record -e cs -1 -ag).

The goal is actually to extend perf script to handle more than just raw data.
So that it can handle the rest of what we can find in an event: time, ip, stacktraces...

You've added 200 lines in perf report to add the dump support. It wouldn't
require more to extend perf script to do that. And the result is going to be
much more powerful.

Look at struct scripting_ops::process_event().

What you need is too pass in the whole event instead of only the raw data,
cpu and so on...

You don't even need to add the support in the scripts themselves, they can
just cope with the new parameter layout. You only need to handle the rest of
the event in the print_event() function (the default scripting_ops::process_event())
to print ip, time and stacktraces.

This probably needs some option in perf script to avoid showing ip and other
new things. Although we probably want to print stacktraces by default
if there are present.

2011-02-21 21:41:57

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 0/4] perf events: Add realtime clock event and timehist option -v2



On 02/21/11 14:37, Frederic Weisbecker wrote:

> The goal is actually to extend perf script to handle more than just raw data.
> So that it can handle the rest of what we can find in an event: time, ip, stacktraces...
>
> You've added 200 lines in perf report to add the dump support. It wouldn't
> require more to extend perf script to do that. And the result is going to be
> much more powerful.
>
> Look at struct scripting_ops::process_event().

I actually have a draft of perf-script - essentially duplicating sample
processing done in perf-report. When it got to the point of having to
add a lot of code -- other features essentially -- just to get it to the
point of being ready for this feature I stopped.

David


>
> What you need is too pass in the whole event instead of only the raw data,
> cpu and so on...
>
> You don't even need to add the support in the scripts themselves, they can
> just cope with the new parameter layout. You only need to handle the rest of
> the event in the print_event() function (the default scripting_ops::process_event())
> to print ip, time and stacktraces.
>
> This probably needs some option in perf script to avoid showing ip and other
> new things. Although we probably want to print stacktraces by default
> if there are present.

2011-02-21 21:55:40

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 0/4] perf events: Add realtime clock event and timehist option -v2

On Mon, Feb 21, 2011 at 02:41:53PM -0700, David Ahern wrote:
>
>
> On 02/21/11 14:37, Frederic Weisbecker wrote:
>
> > The goal is actually to extend perf script to handle more than just raw data.
> > So that it can handle the rest of what we can find in an event: time, ip, stacktraces...
> >
> > You've added 200 lines in perf report to add the dump support. It wouldn't
> > require more to extend perf script to do that. And the result is going to be
> > much more powerful.
> >
> > Look at struct scripting_ops::process_event().
>
> I actually have a draft of perf-script - essentially duplicating sample
> processing done in perf-report. When it got to the point of having to
> add a lot of code -- other features essentially -- just to get it to the
> point of being ready for this feature I stopped.


I don't understand why it's harder to extend print_event() rather than
perf report.

2011-02-21 22:09:56

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 0/4] perf events: Add realtime clock event and timehist option -v2



On 02/21/11 14:55, Frederic Weisbecker wrote:
> On Mon, Feb 21, 2011 at 02:41:53PM -0700, David Ahern wrote:
>>
>>
>> On 02/21/11 14:37, Frederic Weisbecker wrote:
>>
>>> The goal is actually to extend perf script to handle more than just raw data.
>>> So that it can handle the rest of what we can find in an event: time, ip, stacktraces...
>>>
>>> You've added 200 lines in perf report to add the dump support. It wouldn't
>>> require more to extend perf script to do that. And the result is going to be
>>> much more powerful.
>>>
>>> Look at struct scripting_ops::process_event().
>>
>> I actually have a draft of perf-script - essentially duplicating sample
>> processing done in perf-report. When it got to the point of having to
>> add a lot of code -- other features essentially -- just to get it to the
>> point of being ready for this feature I stopped.
>
>
> I don't understand why it's harder to extend print_event() rather than
> perf report.

All of the changes to perf-report are related strictly to this feature -
generating the timestamp and printing the sample including walking the
callchain.

perf-script needs to have features added to it:
1. working with all samples,
2. support for callchains,
3. more?
Then it is ready for adding a timestring to the output.

Sure it can be done, but a series of patches unrelated to timehist are
needed first.

As part of looking at perf-script (the draft I mentioned) I actually
moved most of the perf-report changes into a util/timehist.c as it is
directly usable by both commands.

David

2011-02-21 22:29:08

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 0/4] perf events: Add realtime clock event and timehist option -v2

On Mon, Feb 21, 2011 at 03:09:52PM -0700, David Ahern wrote:
>
>
> On 02/21/11 14:55, Frederic Weisbecker wrote:
> > On Mon, Feb 21, 2011 at 02:41:53PM -0700, David Ahern wrote:
> >>
> >>
> >> On 02/21/11 14:37, Frederic Weisbecker wrote:
> >>
> >>> The goal is actually to extend perf script to handle more than just raw data.
> >>> So that it can handle the rest of what we can find in an event: time, ip, stacktraces...
> >>>
> >>> You've added 200 lines in perf report to add the dump support. It wouldn't
> >>> require more to extend perf script to do that. And the result is going to be
> >>> much more powerful.
> >>>
> >>> Look at struct scripting_ops::process_event().
> >>
> >> I actually have a draft of perf-script - essentially duplicating sample
> >> processing done in perf-report. When it got to the point of having to
> >> add a lot of code -- other features essentially -- just to get it to the
> >> point of being ready for this feature I stopped.
> >
> >
> > I don't understand why it's harder to extend print_event() rather than
> > perf report.
>
> All of the changes to perf-report are related strictly to this feature -
> generating the timestamp and printing the sample including walking the
> callchain.

This is the actual drawback: it's only useful for your feature. My wish
is to have something more broadly useful. And support for callchains or
other things like this in perf script is desired and has been requested
by the past.

> perf-script needs to have features added to it:
> 1. working with all samples,

Why do you need that? You seem to be only interested in tracepoint
events.

Sure I would appreciate that perf script can support any event as a bonus
but that doesn't seem mandatory here.

> 2. support for callchains,

What does it take more than what you did in perf report, namely
calling perf_session_resolve_callchain and walking the cursor?

> 3. more?

?

2011-02-22 01:16:52

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 0/4] perf events: Add realtime clock event and timehist option -v2



On 02/21/11 15:21, Frederic Weisbecker wrote:
>> All of the changes to perf-report are related strictly to this feature -
>> generating the timestamp and printing the sample including walking the
>> callchain.
>
> This is the actual drawback: it's only useful for your feature. My wish
> is to have something more broadly useful. And support for callchains or
> other things like this in perf script is desired and has been requested
> by the past.

I don't agree with the conclusion. The ability to take a realtime-clock
sample + a perf_clock time stamps and prepend it to a dump of perf
samples is not limited to perf-report. To date, it's the only command
setup to use it - or maybe a better phrasing is that it is the command I
most use today.

perf-script is new. I created this timehist patch back in August and
submitted it to linux-perf-users at the end of November. The version I
sent today is the 3rd incarnation.

If it is a question of code location then I can move the function
additions from builtin-report.c to util/timehist.c. Then as other
commands are ready they can pick it up.

>
>> perf-script needs to have features added to it:
>> 1. working with all samples,
>
> Why do you need that? You seem to be only interested in tracepoint
> events.

I am *not* interested in tracepoints at all -- at least so far. I have
rarely used them, mainly just trying out the options to see what is
available and how much data comes pouring in. (tracepoints are not
available on the kernel version we use.)

Rather, I have been focused on the S/W events such as the context-switch
event and H/W counters. In fact, one request I have received on my end
is for a time history of H/W counters - record periodically during
specific tests and dump the samples with timestamps.

>
> Sure I would appreciate that perf script can support any event as a bonus
> but that doesn't seem mandatory here.
>
>> 2. support for callchains,
>
> What does it take more than what you did in perf report, namely
> calling perf_session_resolve_callchain and walking the cursor?

It's not rocket science, nonetheless its code that needs to be brought
into perf-script.


>
>> 3. more?
>
> ?

There always seems to be unexpected gotchas that have to be dealt with.


David

2011-02-22 02:50:11

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 0/4] perf events: Add realtime clock event and timehist option -v2

On Mon, Feb 21, 2011 at 06:16:41PM -0700, David Ahern wrote:
>
>
> On 02/21/11 15:21, Frederic Weisbecker wrote:
> >> All of the changes to perf-report are related strictly to this feature -
> >> generating the timestamp and printing the sample including walking the
> >> callchain.
> >
> > This is the actual drawback: it's only useful for your feature. My wish
> > is to have something more broadly useful. And support for callchains or
> > other things like this in perf script is desired and has been requested
> > by the past.
>
> I don't agree with the conclusion. The ability to take a realtime-clock
> sample + a perf_clock time stamps and prepend it to a dump of perf
> samples is not limited to perf-report. To date, it's the only command
> setup to use it - or maybe a better phrasing is that it is the command I
> most use today.

Sure it's useful. But it doesn't integrate well.

With your proposition we have two seperate tools to look at events
sequentially: perf report to look at wall clock, normal clock, ip and sym.
And perf script to look at normal clock, pid, cpu and trace.

This split is simply a mistake. How tracing users are going to join
both outputs?

This is even more sad if you consider that the events that most need to
be displayed sequentially are the tracepoints. Sure we want to treat every
events the same, but if tracepoints traces can't be added up with
these new features, your thing is just half useful.

I don't see any good reason for that.


> perf-script is new. I created this timehist patch back in August and
> submitted it to linux-perf-users at the end of November. The version I
> sent today is the 3rd incarnation.

perf-script is not new, it was there in november already, although named
perf trace at that time.

>
> If it is a question of code location then I can move the function
> additions from builtin-report.c to util/timehist.c. Then as other
> commands are ready they can pick it up.

It's not a question of file, it's a question on where it integrates
logically and if it does coherently.

> >
> >> perf-script needs to have features added to it:
> >> 1. working with all samples,
> >
> > Why do you need that? You seem to be only interested in tracepoint
> > events.
>
> I am *not* interested in tracepoints at all -- at least so far. I have
> rarely used them, mainly just trying out the options to see what is
> available and how much data comes pouring in. (tracepoints are not
> available on the kernel version we use.)

Fine, that doesn't change much anyway. If you follow my idea on passing
the whole event to the process_event() callback, there is no big things
to change compared to integrate that to perf report.

For event that don't have raw trace, just don't print raw trace, that's
the only difference.

I'm not even asking you to handle that for scripts, the python and perf
part can ignore that. It just needs to be handled for the print_event()
case.

It's barely a bit more in your diff compared to the perf-report
case.


> Rather, I have been focused on the S/W events such as the context-switch
> event and H/W counters. In fact, one request I have received on my end
> is for a time history of H/W counters - record periodically during
> specific tests and dump the samples with timestamps.

Fine.

> >
> > Sure I would appreciate that perf script can support any event as a bonus
> > but that doesn't seem mandatory here.
> >
> >> 2. support for callchains,
> >
> > What does it take more than what you did in perf report, namely
> > calling perf_session_resolve_callchain and walking the cursor?
>
> It's not rocket science, nonetheless its code that needs to be brought
> into perf-script.

No difference compared to perf report integration then.

> >> 3. more?
> >
> > ?
>
> There always seems to be unexpected gotchas that have to be dealt with.

Sure, I suspect we are not talking about hundred of lines of code though.

2011-02-22 07:42:14

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/4] perf events: Add realtime clock event and timehist option -v2


* Frederic Weisbecker <[email protected]> wrote:

> perf-script is not new, it was there in november already, although named perf
> trace at that time.

Yes.

The new 'perf trace' utility - which might be the straightforward 'report a trace'
utility that David is looking for is still WIP and can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git tmp.perf/trace

Warning: the code is stale and needs much more work to become mergable. But
displaying GTOD timestamps is very much something that looks sensible in the 'perf
trace' context as well.

Thanks,

Ingo

2011-02-22 08:07:20

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 0/4] perf events: Add realtime clock event and timehist option -v2

On 02/22/11 00:41, Ingo Molnar wrote:
>
> * Frederic Weisbecker <[email protected]> wrote:
>
>> perf-script is not new, it was there in november already, although named perf
>> trace at that time.
>
> Yes.
>
> The new 'perf trace' utility - which might be the straightforward 'report a trace'
> utility that David is looking for is still WIP and can be found at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git tmp.perf/trace
>
> Warning: the code is stale and needs much more work to become mergable. But
> displaying GTOD timestamps is very much something that looks sensible in the 'perf
> trace' context as well.

Ok. I'll look at moving the timehist option from perf-report to perf-script.

David

>
> Thanks,
>
> Ingo