2011-02-28 03:52:25

by David Ahern

[permalink] [raw]
Subject: [PATCH 0/6 v3] perf events: Add realtime clock event, time-of-day strings to script output

v2 -> v3:
- changed timehist arg in perf-record to tod to better reflect usage
- removed timehist changes from perf-report
- added dumping software events to perf-script
- added time-of-day output to perf-script

v1 -> v2:
- removed change to WARN_ON_ONCE in perf_event.c
- fixed init of realtime_clock event (needed call to perf_swevent_init_hrtimer)
- changed userspace event name to clock-realtime with the alias clkr as the
raw events code sucks up all options that start with 'r' followed by a hex
character
- remove RAW sample hack for pushing realtime clock event data to userspace
- plumbed PERF_SAMPLE_READ to enable retrieving event data during sample
generation
- add patch to perf-stat to recognize realtime-clock as a nsec counter
- implementation of tiemhist in perf-record was revamped significantly
-- to use event data,
-- to better handle fallback to synthesized event,
-- added feature to force a sample to be generated at startup (to output
wall-clock times we need a sample immediately; after that realtime-clock
samples are generated periodically)
-- change counter creation to be done only on first cpu; we do not need a
counter for each cpu

A request was made to modify perf-trace and perf-script. From what I can see
perf-trace does not exist yet and perf-script requires raw data which means
it does not work for the primary use case that motivated this feature
(perf record -e cs -1 -ag).


CC: Arnaldo Carvalho de Melo <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Frederic Weisbecker <[email protected]>
CC: Paul Mackerras <[email protected]>
CC: Thomas Gleixner <[email protected]>

David Ahern (6):
perf events: Introduce realtime clock event
perf events: plumbing for PERF_SAMPLE_READ and read_format
perf record: add time-of-day option
perf script: dump software events too
perf script: Prepend lines with time-of-day string
perf stat: treat realtime-clock as nsec counter

include/linux/perf_event.h | 2 +
kernel/perf_event.c | 85 ++++++++++++++++++++++
tools/perf/Documentation/perf-record.txt | 5 ++
tools/perf/builtin-record.c | 115 +++++++++++++++++++++++++++++-
tools/perf/builtin-script.c | 89 ++++++++++++++++++++++--
tools/perf/builtin-stat.c | 5 +-
tools/perf/builtin-test.c | 3 +-
tools/perf/util/event.c | 96 +++++++++++++++++++++++++
tools/perf/util/event.h | 28 +++++++-
tools/perf/util/evlist.c | 2 +-
tools/perf/util/evlist.h | 2 +
tools/perf/util/evsel.c | 24 ++++++-
tools/perf/util/header.c | 18 +++++
tools/perf/util/header.h | 1 +
tools/perf/util/parse-events.c | 2 +
tools/perf/util/python.c | 3 +-
tools/perf/util/session.c | 114 +++++++++++++++++++++++++++++
tools/perf/util/session.h | 12 +++-
18 files changed, 588 insertions(+), 18 deletions(-)

--
1.7.4


2011-02-28 03:52:27

by David Ahern

[permalink] [raw]
Subject: [PATCH 1/6] perf events: Introduce realtime clock event

The motivation for this event is to correlate perf_clock() time stamps
to wall-clock (gettimeofday()) equivalents for comparing perf events
to other log files.

This patch is based on the monotonic patch by Arnaldo Carvalho de Melo
<[email protected]>.

Signed-off-by: David Ahern <[email protected]>
---
include/linux/perf_event.h | 1 +
kernel/perf_event.c | 66 ++++++++++++++++++++++++++++++++++++++++
tools/perf/util/parse-events.c | 2 +
3 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8ceb5a6..51a2f34 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -105,6 +105,7 @@ enum perf_sw_ids {
PERF_COUNT_SW_PAGE_FAULTS_MAJ = 6,
PERF_COUNT_SW_ALIGNMENT_FAULTS = 7,
PERF_COUNT_SW_EMULATION_FAULTS = 8,
+ PERF_COUNT_SW_REALTIME_CLOCK = 9,

PERF_COUNT_SW_MAX, /* non-ABI */
};
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 64a018e..a25a63d 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -5432,6 +5432,7 @@ static int perf_swevent_init(struct perf_event *event)
switch (event_id) {
case PERF_COUNT_SW_CPU_CLOCK:
case PERF_COUNT_SW_TASK_CLOCK:
+ case PERF_COUNT_SW_REALTIME_CLOCK:
return -ENOENT;

default:
@@ -5770,6 +5771,70 @@ static struct pmu perf_cpu_clock = {
};

/*
+ * Software event: realtime wall time clock
+ */
+
+static void realtime_clock_event_update(struct perf_event *event)
+{
+ u64 now = ktime_to_ns(ktime_get_real());
+ local64_set(&event->count, now);
+}
+
+static void realtime_clock_event_start(struct perf_event *event, int flags)
+{
+ realtime_clock_event_update(event);
+ perf_swevent_start_hrtimer(event);
+}
+
+static void realtime_clock_event_stop(struct perf_event *event, int flags)
+{
+ perf_swevent_cancel_hrtimer(event);
+ realtime_clock_event_update(event);
+}
+
+static int realtime_clock_event_add(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_START)
+ realtime_clock_event_start(event, flags);
+
+ return 0;
+}
+
+static void realtime_clock_event_del(struct perf_event *event, int flags)
+{
+ realtime_clock_event_stop(event, flags);
+}
+
+static void realtime_clock_event_read(struct perf_event *event)
+{
+ realtime_clock_event_update(event);
+}
+
+static int realtime_clock_event_init(struct perf_event *event)
+{
+ if (event->attr.type != PERF_TYPE_SOFTWARE)
+ return -ENOENT;
+
+ if (event->attr.config != PERF_COUNT_SW_REALTIME_CLOCK)
+ return -ENOENT;
+
+ perf_swevent_init_hrtimer(event);
+
+ return 0;
+}
+
+static struct pmu perf_realtime_clock = {
+ .task_ctx_nr = perf_sw_context,
+
+ .event_init = realtime_clock_event_init,
+ .add = realtime_clock_event_add,
+ .del = realtime_clock_event_del,
+ .start = realtime_clock_event_start,
+ .stop = realtime_clock_event_stop,
+ .read = realtime_clock_event_read,
+};
+
+/*
* Software event: task time clock
*/

@@ -7296,6 +7361,7 @@ void __init perf_event_init(void)
init_srcu_struct(&pmus_srcu);
perf_pmu_register(&perf_swevent, "software", PERF_TYPE_SOFTWARE);
perf_pmu_register(&perf_cpu_clock, NULL, -1);
+ perf_pmu_register(&perf_realtime_clock, NULL, -1);
perf_pmu_register(&perf_task_clock, NULL, -1);
perf_tp_register();
perf_cpu_notifier(perf_cpu_notify);
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 54a7e26..c40bfef 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -41,6 +41,7 @@ static struct event_symbol event_symbols[] = {

{ CSW(CPU_CLOCK), "cpu-clock", "" },
{ CSW(TASK_CLOCK), "task-clock", "" },
+ { CSW(REALTIME_CLOCK), "clock-realtime", "clkr" },
{ CSW(PAGE_FAULTS), "page-faults", "faults" },
{ CSW(PAGE_FAULTS_MIN), "minor-faults", "" },
{ CSW(PAGE_FAULTS_MAJ), "major-faults", "" },
@@ -78,6 +79,7 @@ static const char *sw_event_names[] = {
"major-faults",
"alignment-faults",
"emulation-faults",
+ "realtime-clock-msecs",
};

#define MAX_ALIASES 8
--
1.7.4

2011-02-28 03:52:31

by David Ahern

[permalink] [raw]
Subject: [PATCH 3/6] perf record: add time-of-day option

Enable data collection for generating time-of-day strings when
printing individual perf samples. This is done by sampling the
realtime clock event with the perf_clock time stamps.

If the realtime-clock event is not available (e.g, older kernels)
fallback to a synthesized event. (I realize there is resistance
to new synthesized events, but it is a simple way to gain this
feature on older kernels without the need to modify the kernel
code).

Signed-off-by: David Ahern <[email protected]>
---
include/linux/perf_event.h | 1 +
kernel/perf_event.c | 19 +++++
tools/perf/Documentation/perf-record.txt | 5 ++
tools/perf/builtin-record.c | 110 +++++++++++++++++++++++++++++-
tools/perf/util/event.c | 1 +
tools/perf/util/event.h | 8 ++
tools/perf/util/evlist.c | 2 +-
tools/perf/util/evlist.h | 2 +
tools/perf/util/session.c | 4 +
tools/perf/util/session.h | 3 +-
10 files changed, 151 insertions(+), 4 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 51a2f34..404b1ee 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -240,6 +240,7 @@ struct perf_event_attr {
#define PERF_EVENT_IOC_PERIOD _IOW('$', 4, __u64)
#define PERF_EVENT_IOC_SET_OUTPUT _IO ('$', 5)
#define PERF_EVENT_IOC_SET_FILTER _IOW('$', 6, char *)
+#define PERF_EVENT_IOC_RECORD_SAMPLE _IO('$', 7)

enum perf_event_ioc_flags {
PERF_IOC_FLAG_GROUP = 1U << 0,
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index a25a63d..7999f55 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -3250,6 +3250,7 @@ static struct perf_event *perf_fget_light(int fd, int *fput_needed)
static int perf_event_set_output(struct perf_event *event,
struct perf_event *output_event);
static int perf_event_set_filter(struct perf_event *event, void __user *arg);
+static int perf_event_generate_sample(struct perf_event *event);

static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
@@ -3296,6 +3297,9 @@ static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case PERF_EVENT_IOC_SET_FILTER:
return perf_event_set_filter(event, (void __user *)arg);

+ case PERF_EVENT_IOC_RECORD_SAMPLE:
+ return perf_event_generate_sample(event);
+
default:
return -ENOTTY;
}
@@ -4398,6 +4402,21 @@ exit:
rcu_read_unlock();
}

+/* add a sample to the event stream based on user request */
+static int perf_event_generate_sample(struct perf_event *event)
+{
+ struct perf_sample_data data;
+ struct pt_regs regs;
+
+ perf_fetch_caller_regs(&regs);
+ event->pmu->read(event);
+ perf_sample_data_init(&data, 0);
+ data.period = event->hw.last_period;
+ perf_event_output(event, 0, &data, &regs);
+
+ return 0;
+}
+
/*
* read event_id
*/
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 5a520f8..8eb5b0a 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -148,6 +148,11 @@ an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must ha
corresponding events, i.e., they always refer to events defined earlier on the command
line.

+--tod::
+Collect data for time-of-day strings when printing events. This option adds
+reference time samples to the event stream for converting perf timestamps to
+time-of-day.
+
SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index e39883e..ac0717c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -56,6 +56,8 @@ static bool nodelay = false;
static bool raw_samples = false;
static bool sample_id_all_avail = true;
static bool system_wide = false;
+static bool want_tod_data = false;
+static bool synth_reftime = false;
static pid_t target_pid = -1;
static pid_t target_tid = -1;
static pid_t child_pid = -1;
@@ -235,7 +237,7 @@ static void config_attr(struct perf_evsel *evsel, struct perf_evlist *evlist)

attr->sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID;

- if (evlist->nr_entries > 1)
+ if ((evlist->nr_entries > 1) || want_tod_data)
attr->sample_type |= PERF_SAMPLE_ID;

/*
@@ -280,6 +282,12 @@ static void config_attr(struct perf_evsel *evsel, struct perf_evlist *evlist)
attr->sample_type |= PERF_SAMPLE_CPU;
}

+ if (want_tod_data) {
+ attr->sample_type |= PERF_SAMPLE_TIME;
+ attr->sample_type |= PERF_SAMPLE_CPU;
+ attr->sample_type |= PERF_SAMPLE_READ;
+ }
+
if (nodelay) {
attr->watermark = 0;
attr->wakeup_events = 1;
@@ -294,6 +302,88 @@ static void config_attr(struct perf_evsel *evsel, struct perf_evlist *evlist)
}
}

+static int perf_event__synthesize_reftime(perf_event__handler_t process,
+ struct perf_session *psession)
+{
+ union perf_event ev;
+ struct timespec tp;
+
+ memset(&ev, 0, sizeof(ev));
+
+ /* race here between successive calls, but should be close enough */
+ if (gettimeofday(&ev.reftime.tv, NULL) != 0) {
+ error("gettimeofday failed. Cannot generate reference time.\n");
+ return -1;
+ }
+ if (clock_gettime(CLOCK_MONOTONIC, &tp) != 0) {
+ error("clock_gettime failed. Cannot generate reference time.\n");
+ return -1;
+ }
+ ev.reftime.nsec = (u64) tp.tv_sec * NSEC_PER_SEC + (u64) tp.tv_nsec;
+
+ ev.header.type = PERF_RECORD_REFTIME;
+ ev.header.size = sizeof(ev.reftime);
+
+ return process(&ev, NULL, psession);
+}
+
+static void create_tod_counter(void)
+{
+ int fd;
+ struct perf_event_attr attr;
+ struct perf_evsel *evsel;
+ /* only on 1 cpu */
+ struct cpu_map *cpus = cpu_map__new("0");
+ /* not associated with a process */
+ struct thread_map *threads = thread_map__new(-1, -1);
+
+ struct perf_evsel *first_evsel = list_entry(evsel_list->entries.next,
+ struct perf_evsel, node);
+
+ attr = first_evsel->attr;
+ attr.type = PERF_TYPE_SOFTWARE;
+ attr.config = PERF_COUNT_SW_REALTIME_CLOCK;
+ attr.sample_period = 3600 * NSEC_PER_SEC;
+ attr.freq = 0;
+
+ evsel = perf_evsel__new(&attr, evsel_list->nr_entries);
+ if (!evsel)
+ die("Error: Failed to allocate memory for time counter\n");
+
+ config_attr(evsel, evsel_list);
+
+ if (perf_evsel__open(evsel, cpus, threads, 0, 1) < 0) {
+ if (errno == EINVAL) {
+ if (verbose)
+ warning("Failed to open realtime clock event\n");
+ synth_reftime = true;
+ return;
+ }
+ die("Failed to open realtime clock event\n");
+ }
+
+ if (perf_evsel__alloc_id(evsel, cpus->nr, threads->nr) < 0)
+ die("Failed to allocate an id for realtime-clock event\n");
+
+ fd = FD(evsel, 0, 0);
+ if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, FD(first_evsel, 0, 0)) != 0)
+ die("Failed to add realtime-clock event to output stream\n");
+
+ if (perf_evlist__id_hash(evsel_list, evsel, 0, 0, fd) < 0)
+ die("id_hash failed for realtime-clock event\n");
+
+ create_counter(evsel, 0);
+
+ /* generate first sample - want a sample immediately so
+ * that time conversions are avialable from the get-go.
+ * Let user-specified rate take care of samples after that.
+ */
+ if (ioctl(fd, PERF_EVENT_IOC_RECORD_SAMPLE) != 0)
+ error("failed to generate sample for realtime clock\n");
+
+ return;
+}
+
static void open_counters(struct perf_evlist *evlist)
{
struct perf_evsel *pos;
@@ -335,7 +425,8 @@ try_again:
* Old kernel, no attr->sample_id_type_all field
*/
sample_id_all_avail = false;
- if (!sample_time && !raw_samples && !time_needed)
+ if (!sample_time && !raw_samples
+ && !time_needed && !want_tod_data)
attr->sample_type &= ~PERF_SAMPLE_TIME;

goto retry_sample_id;
@@ -378,6 +469,9 @@ try_again:
list_for_each_entry(pos, &evlist->entries, node)
create_counter(pos, cpu);
}
+
+ if (want_tod_data)
+ create_tod_counter();
}

static int process_buildids(void)
@@ -657,6 +751,16 @@ static int __cmd_record(int argc, const char **argv)
}
}

+ if (synth_reftime) {
+ if (verbose)
+ warning(" ... fall back to synthesized reftime\n");
+
+ if (perf_event__synthesize_reftime(process_synthesized_event,
+ session) != 0)
+ error("Failed to create reftime event. "
+ "Cannot generate wall-clock timestamps\n");
+ }
+
machine = perf_session__find_host_machine(session);
if (!machine) {
pr_err("Couldn't find native kernel information.\n");
@@ -815,6 +919,8 @@ const struct option record_options[] = {
OPT_CALLBACK('G', "cgroup", &evsel_list, "name",
"monitor event in cgroup name only",
parse_cgroups),
+ OPT_BOOLEAN(0, "tod", &want_tod_data,
+ "collect data for time-of-day strings"),
OPT_END()
};

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index fbf5754..6bbd551 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -24,6 +24,7 @@ static const char *perf_event__names[] = {
[PERF_RECORD_HEADER_TRACING_DATA] = "TRACING_DATA",
[PERF_RECORD_HEADER_BUILD_ID] = "BUILD_ID",
[PERF_RECORD_FINISHED_ROUND] = "FINISHED_ROUND",
+ [PERF_RECORD_REFTIME] = "REF_TIME",
};

const char *perf_event__name(unsigned int id)
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 512a1ca..d4810e0 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -99,6 +99,7 @@ enum perf_user_event_type { /* above any possible kernel type */
PERF_RECORD_HEADER_TRACING_DATA = 66,
PERF_RECORD_HEADER_BUILD_ID = 67,
PERF_RECORD_FINISHED_ROUND = 68,
+ PERF_RECORD_REFTIME = 69,
PERF_RECORD_HEADER_MAX
};

@@ -125,6 +126,12 @@ struct tracing_data_event {
u32 size;
};

+struct reftime_event {
+ struct perf_event_header header;
+ struct timeval tv;
+ u64 nsec;
+};
+
union perf_event {
struct perf_event_header header;
struct ip_event ip;
@@ -138,6 +145,7 @@ union perf_event {
struct event_type_event event_type;
struct tracing_data_event tracing_data;
struct build_id_event build_id;
+ struct reftime_event reftime;
};

void perf_event__print_totals(void);
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 95b21fe..bb49243 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -106,7 +106,7 @@ void perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd)
evlist->nr_fds++;
}

-static int perf_evlist__id_hash(struct perf_evlist *evlist, struct perf_evsel *evsel,
+int perf_evlist__id_hash(struct perf_evlist *evlist, struct perf_evsel *evsel,
int cpu, int thread, int fd)
{
struct perf_sample_id *sid;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index c988405..bd73572 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -48,6 +48,8 @@ union perf_event *perf_evlist__read_on_cpu(struct perf_evlist *self, int cpu);
int perf_evlist__alloc_mmap(struct perf_evlist *evlist);
int perf_evlist__mmap(struct perf_evlist *evlist, int pages, bool overwrite);
void perf_evlist__munmap(struct perf_evlist *evlist);
+int perf_evlist__id_hash(struct perf_evlist *evlist, struct perf_evsel *evsel,
+ int cpu, int thread, int fd);

static inline void perf_evlist__set_maps(struct perf_evlist *evlist,
struct cpu_map *cpus,
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index dc0235b..1ef8e8a 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -791,6 +791,10 @@ static int perf_session__process_user_event(struct perf_session *session, union
return ops->build_id(event, session);
case PERF_RECORD_FINISHED_ROUND:
return ops->finished_round(event, session, ops);
+ case PERF_RECORD_REFTIME:
+ if (ops->reftime)
+ return ops->reftime(event, session);
+ return -EINVAL;
default:
return -EINVAL;
}
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 212f810..b46672a 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -78,7 +78,8 @@ struct perf_event_ops {
event_synth_op attr,
event_type,
tracing_data,
- build_id;
+ build_id,
+ reftime;
event_op2 finished_round;
bool ordered_samples;
bool ordering_requires_timestamps;
--
1.7.4

2011-02-28 03:52:39

by David Ahern

[permalink] [raw]
Subject: [PATCH 6/6] perf stat: treat realtime-clock as nsec counter

Based on patch from Arnaldo Carvalho de Melo <[email protected]>

Signed-off-by: David Ahern <[email protected]>
---
tools/perf/builtin-stat.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 21c0252..35d45b4 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -184,8 +184,9 @@ static int create_perf_stat_counter(struct perf_evsel *evsel)
*/
static inline int nsec_counter(struct perf_evsel *evsel)
{
- if (perf_evsel__match(evsel, SOFTWARE, SW_CPU_CLOCK) ||
- perf_evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK))
+ if (perf_evsel__match(evsel, SOFTWARE, SW_CPU_CLOCK) ||
+ perf_evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK) ||
+ perf_evsel__match(evsel, SOFTWARE, SW_REALTIME_CLOCK))
return 1;

return 0;
--
1.7.4

2011-02-28 03:52:54

by David Ahern

[permalink] [raw]
Subject: [PATCH 5/6] perf script: Prepend lines with time-of-day string

Handle realtime-clock events and setting reference times that are then used
to correlate perf_clock times with time-of-day. Output lines are then
prepended with a time-of-day string if the --tod option was used with
perf-record.

Was:

swapper-0 [000] 5329.151669: context-switches: ...
perf-5701 [001] 5329.151797: context-switches: ...
sleep-5702 [000] 5329.155550: context-switches: ...
swapper-0 [001] 5329.158074: context-switches: ...

Now:

20:07:14.903936 swapper-0 [000] 5329.151669: context-switches: ...
20:07:14.904065 perf-5701 [001] 5329.151797: context-switches: ...
20:07:14.907817 sleep-5702 [000] 5329.155550: context-switches: ...
20:07:14.910342 swapper-0 [001] 5329.158074: context-switches: ...

Similarly for tracepoints:

20:11:11.659801 perf-5714 [001] 5565.907534: sched_switch: ...
20:11:11.659812 swapper-0 [001] 5565.907544: sched_switch: ...
20:11:11.659828 ksoftirqd/1-9 [001] 5565.907560: sched_switch: ...
20:11:12.857334 swapper-0 [001] 5567.105067: sched_switch: ...

Signed-off-by: David Ahern <[email protected]>
---
tools/perf/builtin-script.c | 26 +++++++++++-
tools/perf/util/event.c | 95 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/event.h | 10 +++++
tools/perf/util/session.c | 12 ++++--
tools/perf/util/session.h | 2 +-
5 files changed, 139 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 4f4c63b..427c257 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -21,6 +21,8 @@ static u64 nr_unordered;
extern const struct option record_options[];
static bool hide_unresolved;
static bool no_callchain;
+static struct timeval tv_ref;
+static u64 timestamp_ref;

static int default_start_script(const char *script __unused,
int argc __unused,
@@ -73,6 +75,7 @@ static int process_sample_event(union perf_event *event,
struct addr_location al;
struct thread *thread = perf_session__findnew(session, event->ip.pid);
const char *evname = NULL;
+ const char *tstr = NULL;
static bool check_raw = true;

if (thread == NULL) {
@@ -88,6 +91,16 @@ static int process_sample_event(union perf_event *event,
return -1;
}

+ if ((attr->type == PERF_TYPE_SOFTWARE) &&
+ (attr->config == PERF_COUNT_SW_REALTIME_CLOCK) &&
+ (perf_event__process_rclk(sample, session,
+ &tv_ref, &timestamp_ref) == 0))
+ return 0;
+
+
+ if (session->sample_type & PERF_SAMPLE_TIME)
+ tstr = perf_event__timestr(sample, session, timestamp_ref, &tv_ref);
+
switch (attr->type) {
case PERF_TYPE_SOFTWARE:
if (perf_event__preprocess_sample(event, session, &al, sample,
@@ -96,7 +109,7 @@ static int process_sample_event(union perf_event *event,
event->header.type);
return -1;
}
- perf_session__print_sample(session, &al, sample, hide_unresolved);
+ perf_session__print_sample(session, &al, sample, hide_unresolved, tstr);
break;

case PERF_TYPE_TRACEPOINT:
@@ -116,6 +129,9 @@ static int process_sample_event(union perf_event *event,
last_timestamp = sample->time;
return 0;
}
+ if (tstr)
+ printf("%15s ", tstr);
+
/*
* FIXME: better resolve from pid from the struct trace_entry
* field, although it should be the same than this perf
@@ -137,6 +153,13 @@ static int process_sample_event(union perf_event *event,
return 0;
}

+static int process_reftime_event(union perf_event *event,
+ struct perf_session *session)
+{
+ return perf_event__process_reftime(event, session,
+ &tv_ref, &timestamp_ref);
+}
+
static struct perf_event_ops event_ops = {
.sample = process_sample_event,
.comm = perf_event__process_comm,
@@ -144,6 +167,7 @@ static struct perf_event_ops event_ops = {
.event_type = perf_event__process_event_type,
.tracing_data = perf_event__process_tracing_data,
.build_id = perf_event__process_build_id,
+ .reftime = process_reftime_event,
.ordered_samples = true,
.ordering_requires_timestamps = true,
};
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 6bbd551..ca1f56d 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -1,4 +1,6 @@
#include <linux/types.h>
+#include <sys/time.h>
+#include <time.h>
#include "event.h"
#include "debug.h"
#include "session.h"
@@ -847,3 +849,96 @@ out_filtered:
al->filtered = true;
return 0;
}
+
+int perf_event__process_rclk(struct perf_sample *sample,
+ struct perf_session *session, struct timeval *tv,
+ u64 *ts)
+{
+ u64 s;
+
+ if (!(session->sample_type & PERF_SAMPLE_READ)) {
+ static bool show_msg = true;
+ if (show_msg) {
+ printf("rclk sample does not have event data.\n"
+ "Was record done with --timehist option?\n");
+ show_msg = false;
+ }
+ return -1;
+ }
+
+ if (!(session->sample_type & PERF_SAMPLE_TIME)) {
+ printf("rclk sample does not have kernel sample time\n");
+ return -1;
+ }
+
+ /* convert raw sample to wall-clock reference time */
+ s = sample->values.value;
+ tv->tv_sec = s / NSEC_PER_SEC;
+ tv->tv_usec = (s - tv->tv_sec * NSEC_PER_SEC) / 1000;
+
+ /* match event sample with kernel timestamp */
+ *ts= sample->time;
+
+ return 0;
+}
+
+/*
+ * process synthesized reference time event
+ */
+int perf_event__process_reftime(union perf_event *event,
+ struct perf_session *session __used,
+ struct timeval *tv, u64 *ts)
+{
+ *tv = event->reftime.tv;
+ *ts = event->reftime.nsec;
+
+ return 0;
+}
+
+const char *perf_event__timestr(struct perf_sample *sample,
+ struct perf_session *session,
+ u64 tsref, struct timeval *tvref)
+{
+ static char buf[64];
+ u64 dt;
+ struct tm ltime;
+ struct timeval tv_dt, tv_res;
+ const char *fmt = "%H:%M:%S";
+
+ buf[0] = '\0';
+
+ /* no sample time in sample data */
+ if (!(session->sample_type & PERF_SAMPLE_TIME))
+ return NULL;
+
+ if (tsref == 0) {
+ static bool show_msg = true;
+ if (show_msg) {
+ if (verbose)
+ warning("Reference clock event not seen yet; "
+ "Cannot generate wall-clock time strings until then.\n");
+ show_msg = false;
+ }
+ return NULL;
+ }
+
+ if (sample->time > tsref) {
+ dt = sample->time - tsref;
+ tv_dt.tv_sec = (time_t) (dt / NSEC_PER_SEC);
+ tv_dt.tv_usec = (dt - tv_dt.tv_sec * NSEC_PER_SEC) / 1000;
+ timeradd(tvref, &tv_dt, &tv_res);
+ } else {
+ dt = tsref - sample->time;
+ tv_dt.tv_sec = (time_t) (dt / NSEC_PER_SEC);
+ tv_dt.tv_usec = (dt - tv_dt.tv_sec * NSEC_PER_SEC) / 1000;
+ timersub(tvref, &tv_dt, &tv_res);
+ }
+
+ if (localtime_r(&tv_res.tv_sec, &ltime) != NULL) {
+ char date[64];
+ strftime(date, sizeof(date), fmt, &ltime);
+ snprintf(buf, sizeof(buf), "%s.%06ld", date, tv_res.tv_usec);
+ }
+
+ return buf;
+}
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index d4810e0..22e2af8 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -196,4 +196,14 @@ const char *perf_event__name(unsigned int id);
int perf_event__parse_sample(const union perf_event *event, u64 type,
u64 read_format, bool sample_id_all, struct perf_sample *sample);

+int perf_event__process_rclk(struct perf_sample *sample,
+ struct perf_session *session, struct timeval *tv, u64 *ts);
+
+int perf_event__process_reftime(union perf_event *event,
+ struct perf_session *session, struct timeval *tv, u64 *ts);
+
+const char *perf_event__timestr(struct perf_sample *sample,
+ struct perf_session *session,
+ u64 tsref, struct timeval *tvref);
+
#endif /* __PERF_RECORD_H */
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 7d46351..b68fd4f 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1157,8 +1157,12 @@ size_t perf_session__fprintf_dsos_buildid(struct perf_session *self, FILE *fp,

static inline void print_sample(const char *comm, pid_t pid,
u32 cpu, u64 secs, u64 usecs, const char *evname,
- u64 addr, const char *symname, const char *dsoname)
+ u64 addr, const char *symname, const char *dsoname,
+ const char *timestr)
{
+ if (timestr)
+ printf("%15s ", timestr);
+
printf("%16s-%-5d ", comm, pid);

if (cpu != (u32) -1)
@@ -1175,7 +1179,7 @@ static inline void print_sample(const char *comm, pid_t pid,
void perf_session__print_sample(struct perf_session *session,
struct addr_location *al,
struct perf_sample *sample,
- bool hide_unresolved)
+ bool hide_unresolved, const char *timestr)
{
struct callchain_cursor_node *node, *prev;
struct perf_event_attr *attr;
@@ -1228,7 +1232,7 @@ void perf_session__print_sample(struct perf_session *session,
continue;

print_sample(comm, al->thread->pid, cpu, secs, usecs,
- evname, node->ip, symname, dsoname);
+ evname, node->ip, symname, dsoname, timestr);

prev = node;
node = node->next;
@@ -1244,6 +1248,6 @@ void perf_session__print_sample(struct perf_session *session,
dsoname = al->map->dso->name;

print_sample(comm, al->thread->pid, cpu, secs, usecs,
- evname, al->addr, symname, dsoname);
+ evname, al->addr, symname, dsoname, timestr);
}
}
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index ff8791d..a4f1c4f 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -172,5 +172,5 @@ static inline int perf_session__parse_sample(struct perf_session *session,

void perf_session__print_sample(struct perf_session *session,
struct addr_location *al, struct perf_sample *sample,
- bool hide_unresolved);
+ bool hide_unresolved, const char *timestr);
#endif /* __PERF_SESSION_H */
--
1.7.4

2011-02-28 03:52:55

by David Ahern

[permalink] [raw]
Subject: [PATCH 4/6] perf script: dump software events too

Call chain options mirror perf-report. Line format follows
tracepoints precedent:

comm-pid [cpu] secs.usecs: event: IP symbol dso

Example:
perf record -v -ga -e cs -c 1 -- sleep 5
perf script

(Line lengths wrapped):

sshd-794 [000] 2572.863440: context-switches: ffffffff810355de \
perf_event_task_sched_out ([kernel.kallsyms])
sshd-794 [000] 2572.863440: context-switches: ffffffff81382b01 \
schedule ([kernel.kallsyms])
sshd-794 [000] 2572.863440: context-switches: ffffffff8138380a \
schedule_hrtimeout_range_clock ([kernel.kallsyms])
sshd-794 [000] 2572.863440: context-switches: ffffffff813838e6 \
schedule_hrtimeout_range ([kernel.kallsyms])
sshd-794 [000] 2572.863440: context-switches: ffffffff8110c55d \
poll_schedule_timeout ([kernel.kallsyms])
sshd-794 [000] 2572.863440: context-switches: ffffffff8110cd84 \
do_select ([kernel.kallsyms])

or 'perf script -G'

swapper-0 [001] 2572.863188: context-switches: ffffffff810355de ...
sshd-794 [000] 2572.863440: context-switches: ffffffff810355de ...
kworker/0:1-10 [000] 2572.863451: context-switches: ffffffff810355de ...

Signed-off-by: David Ahern <[email protected]>
---
tools/perf/builtin-script.c | 65 +++++++++++++++++++++++++++--
tools/perf/util/session.c | 94 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/session.h | 3 +
3 files changed, 157 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 5f40df6..4f4c63b 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -19,6 +19,8 @@ static bool debug_mode;
static u64 last_timestamp;
static u64 nr_unordered;
extern const struct option record_options[];
+static bool hide_unresolved;
+static bool no_callchain;

static int default_start_script(const char *script __unused,
int argc __unused,
@@ -67,7 +69,11 @@ static int process_sample_event(union perf_event *event,
struct perf_sample *sample,
struct perf_session *session)
{
+ struct perf_event_attr *attr;
+ struct addr_location al;
struct thread *thread = perf_session__findnew(session, event->ip.pid);
+ const char *evname = NULL;
+ static bool check_raw = true;

if (thread == NULL) {
pr_debug("problem processing %d event, skipping it.\n",
@@ -75,7 +81,31 @@ static int process_sample_event(union perf_event *event,
return -1;
}

- if (session->sample_type & PERF_SAMPLE_RAW) {
+ attr = perf_header__find_attr(sample->id, &session->header);
+ if (!attr) {
+ pr_debug("No attributes found for sample id %" PRId64 ". "
+ "skipping it.\n", sample->id);
+ return -1;
+ }
+
+ switch (attr->type) {
+ case PERF_TYPE_SOFTWARE:
+ if (perf_event__preprocess_sample(event, session, &al, sample,
+ NULL) < 0) {
+ fprintf(stderr, "problem processing %d event, skipping it.\n",
+ event->header.type);
+ return -1;
+ }
+ perf_session__print_sample(session, &al, sample, hide_unresolved);
+ break;
+
+ case PERF_TYPE_TRACEPOINT:
+ if (check_raw) {
+ if (!perf_session__has_traces(session, "record -R"))
+ return -EINVAL;
+ check_raw = false;
+ }
+
if (debug_mode) {
if (sample->time < last_timestamp) {
pr_err("Samples misordered, previous: %" PRIu64
@@ -94,6 +124,13 @@ static int process_sample_event(union perf_event *event,
scripting_ops->process_event(sample->cpu, sample->raw_data,
sample->raw_size,
sample->time, thread->comm);
+ break;
+
+ default:
+ evname = __event_name(attr->type, attr->config);
+ if (verbose)
+ warning("support for event %s not implemented. skipping\n",
+ evname ? evname : "(unknown)");
}

session->hists.stats.total_period += sample->period;
@@ -592,6 +629,16 @@ static const struct option options[] = {
"input file name"),
OPT_BOOLEAN('d', "debug-mode", &debug_mode,
"do various checks like samples ordering and lost events"),
+ OPT_STRING('k', "vmlinux", &symbol_conf.vmlinux_name,
+ "file", "vmlinux pathname"),
+ OPT_STRING(0, "kallsyms", &symbol_conf.kallsyms_name,
+ "file", "kallsyms pathname"),
+ OPT_BOOLEAN('G', "hide-call-graph", &no_callchain,
+ "Do not display call chain"),
+ OPT_BOOLEAN('U', "hide-unresolved", &hide_unresolved,
+ "Only display entries resolved to a symbol"),
+ OPT_STRING(0, "symfs", &symbol_conf.symfs, "directory",
+ "Look for files with symbols relative to this directory"),

OPT_END()
};
@@ -735,6 +782,18 @@ int cmd_script(int argc, const char **argv, const char *prefix __used)
exit(-1);
}

+ if (no_callchain)
+ symbol_conf.use_callchain = false;
+
+ else {
+ symbol_conf.use_callchain = true;
+ if (callchain_register_param(&callchain_param) < 0) {
+ error("Can't register callchain params\n");
+ err = -EINVAL;
+ goto out;
+ }
+ }
+
if (rec_script_path)
script_path = rec_script_path;
if (rep_script_path)
@@ -772,10 +831,6 @@ int cmd_script(int argc, const char **argv, const char *prefix __used)
if (session == NULL)
return -ENOMEM;

- if (strcmp(input_name, "-") &&
- !perf_session__has_traces(session, "record -R"))
- return -EINVAL;
-
if (generate_script_lang) {
struct stat perf_stat;

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 1ef8e8a..7d46351 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -10,6 +10,7 @@
#include "session.h"
#include "sort.h"
#include "util.h"
+#include "trace-event.h"

static int perf_session__open(struct perf_session *self, bool force)
{
@@ -1153,3 +1154,96 @@ size_t perf_session__fprintf_dsos_buildid(struct perf_session *self, FILE *fp,
size_t ret = machine__fprintf_dsos_buildid(&self->host_machine, fp, with_hits);
return ret + machines__fprintf_dsos_buildid(&self->machines, fp, with_hits);
}
+
+static inline void print_sample(const char *comm, pid_t pid,
+ u32 cpu, u64 secs, u64 usecs, const char *evname,
+ u64 addr, const char *symname, const char *dsoname)
+{
+ printf("%16s-%-5d ", comm, pid);
+
+ if (cpu != (u32) -1)
+ printf("[%03d]", cpu);
+
+ printf(" %5lu.%06lu: %s: ", secs, usecs, evname);
+
+ printf("%16" PRIx64 " %s (%s)\n",
+ addr, symname, dsoname);
+
+ return;
+}
+
+void perf_session__print_sample(struct perf_session *session,
+ struct addr_location *al,
+ struct perf_sample *sample,
+ bool hide_unresolved)
+{
+ struct callchain_cursor_node *node, *prev;
+ struct perf_event_attr *attr;
+ const char *evname = NULL;
+ const char *symname = "", *dsoname = "";
+ const char *comm = al->thread->comm_set ? al->thread->comm : "-";
+ u32 cpu = -1;
+ u64 secs = 0, usecs = 0;
+
+ if (session->sample_type & PERF_SAMPLE_TIME) {
+ u64 nsecs = sample->time;
+ secs = nsecs / NSECS_PER_SEC;
+ nsecs -= secs * NSECS_PER_SEC;
+ usecs = nsecs / NSECS_PER_USEC;
+ }
+
+ attr = perf_header__find_attr(sample->id, &session->header);
+ if (attr)
+ evname = __event_name(attr->type, attr->config);
+
+ if (!evname)
+ evname = "(unknown)";
+
+ if (attr && (attr->sample_type & PERF_SAMPLE_CPU))
+ cpu = sample->cpu;
+
+ if (symbol_conf.use_callchain && sample->callchain) {
+
+ if (perf_session__resolve_callchain(session, al->thread,
+ sample->callchain, NULL) != 0) {
+ if (verbose)
+ error("Failed to resolve callchain. Skipping\n");
+ return;
+ }
+
+ node = session->callchain_cursor.first;
+ if (!node)
+ return;
+
+ while (node) {
+
+ if (node->sym && node->sym->name)
+ symname = node->sym->name;
+ else if (hide_unresolved)
+ continue;
+
+ if (node->map && node->map->dso && node->map->dso->name)
+ dsoname = node->map->dso->name;
+ else if (hide_unresolved)
+ continue;
+
+ print_sample(comm, al->thread->pid, cpu, secs, usecs,
+ evname, node->ip, symname, dsoname);
+
+ prev = node;
+ node = node->next;
+ }
+ /* put a spacer between samples when callchains are dumped */
+ printf("\n");
+
+ } else {
+ if (al->sym && al->sym->name)
+ symname = al->sym->name;
+
+ if (al->map && al->map->dso && al->map->dso->name)
+ dsoname = al->map->dso->name;
+
+ print_sample(comm, al->thread->pid, cpu, secs, usecs,
+ evname, al->addr, symname, dsoname);
+ }
+}
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index b46672a..ff8791d 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -170,4 +170,7 @@ static inline int perf_session__parse_sample(struct perf_session *session,
sample);
}

+void perf_session__print_sample(struct perf_session *session,
+ struct addr_location *al, struct perf_sample *sample,
+ bool hide_unresolved);
#endif /* __PERF_SESSION_H */
--
1.7.4

2011-02-28 03:53:32

by David Ahern

[permalink] [raw]
Subject: [PATCH 2/6] perf events: plumbing for PERF_SAMPLE_READ and read_format

Signed-off-by: David Ahern <[email protected]>
---
tools/perf/builtin-record.c | 5 +++++
tools/perf/builtin-test.c | 3 ++-
tools/perf/util/event.h | 10 +++++++++-
tools/perf/util/evsel.c | 24 +++++++++++++++++++++---
tools/perf/util/header.c | 18 ++++++++++++++++++
tools/perf/util/header.h | 1 +
tools/perf/util/python.c | 3 ++-
tools/perf/util/session.c | 12 ++++++++++++
tools/perf/util/session.h | 6 +++++-
9 files changed, 75 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index db4cd1e..e39883e 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -41,6 +41,7 @@ enum write_mode_t {
static u64 user_interval = ULLONG_MAX;
static u64 default_interval = 0;
static u64 sample_type;
+static u64 read_format;

static unsigned int page_size;
static unsigned int mmap_pages = 128;
@@ -218,6 +219,9 @@ static void create_counter(struct perf_evsel *evsel, int cpu)

if (!sample_type)
sample_type = attr->sample_type;
+
+ if (!read_format)
+ read_format = attr->read_format;
}

static void config_attr(struct perf_evsel *evsel, struct perf_evlist *evlist)
@@ -595,6 +599,7 @@ static int __cmd_record(int argc, const char **argv)
open_counters(evsel_list);

perf_session__set_sample_type(session, sample_type);
+ perf_session__set_read_format(session, read_format);

/*
* perf_session__delete(session) will be called at atexit_header()
diff --git a/tools/perf/builtin-test.c b/tools/perf/builtin-test.c
index 1b2106c..a243d51 100644
--- a/tools/perf/builtin-test.c
+++ b/tools/perf/builtin-test.c
@@ -558,7 +558,8 @@ static int test__basic_mmap(void)
goto out_munmap;
}

- perf_event__parse_sample(event, attr.sample_type, false, &sample);
+ perf_event__parse_sample(event, attr.sample_type,
+ attr.read_format, false, &sample);
evsel = perf_evlist__id2evsel(evlist, sample.id);
if (evsel == NULL) {
pr_debug("event with id %" PRIu64
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 9c35170..512a1ca 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -61,6 +61,13 @@ struct sample_event {
u64 array[];
};

+struct read_format {
+ u64 value;
+ u64 time_enabled;
+ u64 time_running;
+ u64 id;
+};
+
struct perf_sample {
u64 ip;
u32 pid, tid;
@@ -69,6 +76,7 @@ struct perf_sample {
u64 id;
u64 stream_id;
u64 period;
+ struct read_format values;
u32 cpu;
u32 raw_size;
void *raw_data;
@@ -178,6 +186,6 @@ int perf_event__preprocess_sample(const union perf_event *self,
const char *perf_event__name(unsigned int id);

int perf_event__parse_sample(const union perf_event *event, u64 type,
- bool sample_id_all, struct perf_sample *sample);
+ u64 read_format, bool sample_id_all, struct perf_sample *sample);

#endif /* __PERF_RECORD_H */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 8083d51..98c8471 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -304,7 +304,7 @@ static int perf_event__parse_id_sample(const union perf_event *event, u64 type,
}

int perf_event__parse_sample(const union perf_event *event, u64 type,
- bool sample_id_all, struct perf_sample *data)
+ u64 read_format, bool sample_id_all, struct perf_sample *data)
{
const u64 *array;

@@ -364,8 +364,26 @@ int perf_event__parse_sample(const union perf_event *event, u64 type,
}

if (type & PERF_SAMPLE_READ) {
- fprintf(stderr, "PERF_SAMPLE_READ is unsuported for now\n");
- return -1;
+ if (read_format & PERF_FORMAT_GROUP) {
+ printf("READ_FORMAT_GROUP is unsuported for now\n");
+ return -1;
+ } else {
+ data->values.value = *array;
+ array++;
+
+ if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED) {
+ data->values.time_enabled = *array;
+ array++;
+ }
+ if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING) {
+ data->values.time_running = *array;
+ array++;
+ }
+ if (read_format & PERF_FORMAT_ID) {
+ data->values.id = *array;
+ array++;
+ }
+ }
}

if (type & PERF_SAMPLE_CALLCHAIN) {
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 72c124d..f48b0b1 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -951,6 +951,23 @@ u64 perf_header__sample_type(struct perf_header *header)
return type;
}

+u64 perf_header__read_format(struct perf_header *header)
+{
+ u64 read_format = 0;
+ int i;
+
+ for (i = 0; i < header->attrs; i++) {
+ struct perf_header_attr *attr = header->attr[i];
+
+ if (!read_format)
+ read_format = attr->attr.read_format;
+ else if (read_format != attr->attr.read_format)
+ die("non matching read_format");
+ }
+
+ return read_format;
+}
+
bool perf_header__sample_id_all(const struct perf_header *header)
{
bool value = false, first = true;
@@ -1079,6 +1096,7 @@ int perf_event__process_attr(union perf_event *event,
}

perf_session__update_sample_type(session);
+ perf_session__update_read_format(session);

return 0;
}
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index f042ceb..5d89e7f 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -84,6 +84,7 @@ void perf_header_attr__delete(struct perf_header_attr *self);
int perf_header_attr__add_id(struct perf_header_attr *self, u64 id);

u64 perf_header__sample_type(struct perf_header *header);
+u64 perf_header__read_format(struct perf_header *header);
bool perf_header__sample_id_all(const struct perf_header *header);
struct perf_event_attr *
perf_header__find_attr(u64 id, struct perf_header *header);
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index 5317ef2..b67c25f 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -684,7 +684,8 @@ static PyObject *pyrf_evlist__read_on_cpu(struct pyrf_evlist *pevlist,
return PyErr_NoMemory();

first = list_entry(evlist->entries.next, struct perf_evsel, node);
- perf_event__parse_sample(event, first->attr.sample_type, sample_id_all,
+ perf_event__parse_sample(event, first->attr.sample_type,
+ first->attr.read_format, sample_id_all,
&pevent->sample);
return pyevent;
}
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index a3a871f..dc0235b 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -110,6 +110,17 @@ void perf_session__update_sample_type(struct perf_session *self)
perf_session__id_header_size(self);
}

+void perf_session__set_read_format(struct perf_session *session,
+ u64 read_format)
+{
+ session->read_format = read_format;
+}
+
+void perf_session__update_read_format(struct perf_session *self)
+{
+ self->read_format = perf_header__read_format(&self->header);
+}
+
int perf_session__create_kernel_maps(struct perf_session *self)
{
int ret = machine__create_kernel_maps(&self->host_machine);
@@ -172,6 +183,7 @@ struct perf_session *perf_session__new(const char *filename, int mode,
}

perf_session__update_sample_type(self);
+ perf_session__update_read_format(self);

if (ops && ops->ordering_requires_timestamps &&
ops->ordered_samples && !self->sample_id_all) {
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 977b3a1..212f810 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -43,6 +43,7 @@ struct perf_session {
*/
struct hists hists;
u64 sample_type;
+ u64 read_format;
int fd;
bool fd_pipe;
bool repipe;
@@ -115,6 +116,8 @@ void perf_session__update_sample_type(struct perf_session *self);
void perf_session__set_sample_id_all(struct perf_session *session, bool value);
void perf_session__set_sample_type(struct perf_session *session, u64 type);
void perf_session__remove_thread(struct perf_session *self, struct thread *th);
+void perf_session__set_read_format(struct perf_session *session, u64 read_format);
+void perf_session__update_read_format(struct perf_session *self);

static inline
struct machine *perf_session__find_host_machine(struct perf_session *self)
@@ -162,7 +165,8 @@ static inline int perf_session__parse_sample(struct perf_session *session,
struct perf_sample *sample)
{
return perf_event__parse_sample(event, session->sample_type,
- session->sample_id_all, sample);
+ session->read_format, session->sample_id_all,
+ sample);
}

#endif /* __PERF_SESSION_H */
--
1.7.4

2011-02-28 03:55:19

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 0/6 v3] perf events: Add realtime clock event, time-of-day strings to script output



On 02/27/2011 08:52 PM, David Ahern wrote:
> A request was made to modify perf-trace and perf-script. From what I can see
> perf-trace does not exist yet and perf-script requires raw data which means
> it does not work for the primary use case that motivated this feature
> (perf record -e cs -1 -ag).

Please disregard the above; it is not relevant to this version.

David

2011-03-01 14:10:01

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 4/6] perf script: dump software events too

On Sun, Feb 27, 2011 at 08:52:29PM -0700, David Ahern wrote:
> Call chain options mirror perf-report. Line format follows
> tracepoints precedent:
>
> comm-pid [cpu] secs.usecs: event: IP symbol dso
>
> Example:
> perf record -v -ga -e cs -c 1 -- sleep 5
> perf script
>
> (Line lengths wrapped):
>
> sshd-794 [000] 2572.863440: context-switches: ffffffff810355de \
> perf_event_task_sched_out ([kernel.kallsyms])
> sshd-794 [000] 2572.863440: context-switches: ffffffff81382b01 \
> schedule ([kernel.kallsyms])
> sshd-794 [000] 2572.863440: context-switches: ffffffff8138380a \
> schedule_hrtimeout_range_clock ([kernel.kallsyms])
> sshd-794 [000] 2572.863440: context-switches: ffffffff813838e6 \
> schedule_hrtimeout_range ([kernel.kallsyms])
> sshd-794 [000] 2572.863440: context-switches: ffffffff8110c55d \
> poll_schedule_timeout ([kernel.kallsyms])
> sshd-794 [000] 2572.863440: context-switches: ffffffff8110cd84 \
> do_select ([kernel.kallsyms])
>
> or 'perf script -G'
>
> swapper-0 [001] 2572.863188: context-switches: ffffffff810355de ...
> sshd-794 [000] 2572.863440: context-switches: ffffffff810355de ...
> kworker/0:1-10 [000] 2572.863451: context-switches: ffffffff810355de ...
>
> Signed-off-by: David Ahern <[email protected]>
> ---
> tools/perf/builtin-script.c | 65 +++++++++++++++++++++++++++--
> tools/perf/util/session.c | 94 +++++++++++++++++++++++++++++++++++++++++++
> tools/perf/util/session.h | 3 +
> 3 files changed, 157 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index 5f40df6..4f4c63b 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -19,6 +19,8 @@ static bool debug_mode;
> static u64 last_timestamp;
> static u64 nr_unordered;
> extern const struct option record_options[];
> +static bool hide_unresolved;
> +static bool no_callchain;
>
> static int default_start_script(const char *script __unused,
> int argc __unused,
> @@ -67,7 +69,11 @@ static int process_sample_event(union perf_event *event,
> struct perf_sample *sample,
> struct perf_session *session)
> {
> + struct perf_event_attr *attr;
> + struct addr_location al;
> struct thread *thread = perf_session__findnew(session, event->ip.pid);
> + const char *evname = NULL;
> + static bool check_raw = true;
>
> if (thread == NULL) {
> pr_debug("problem processing %d event, skipping it.\n",
> @@ -75,7 +81,31 @@ static int process_sample_event(union perf_event *event,
> return -1;
> }
>
> - if (session->sample_type & PERF_SAMPLE_RAW) {
> + attr = perf_header__find_attr(sample->id, &session->header);
> + if (!attr) {
> + pr_debug("No attributes found for sample id %" PRId64 ". "
> + "skipping it.\n", sample->id);
> + return -1;
> + }
> +
> + switch (attr->type) {
> + case PERF_TYPE_SOFTWARE:
> + if (perf_event__preprocess_sample(event, session, &al, sample,
> + NULL) < 0) {
> + fprintf(stderr, "problem processing %d event, skipping it.\n",
> + event->header.type);
> + return -1;
> + }
> + perf_session__print_sample(session, &al, sample, hide_unresolved);
> + break;

That's not on the right place. This is going to print the events even
when we launch scripts. We only want to print them when we launch
perf script without a script, which falls back to printing events
(look at default_scripting_ops).

Like I suggested you, it's better to extend struct
scripting_ops::process_event()
to take the whole event and let the handler decide what to do.
Python and Perl scripting engines can just extract the raw sample and
do like they did before.
But print_event(), which is the process_event() handler for the
default_scripting_ops,
can handle the new printing features.

That's also more flexible, this lets Python and Perl scripting engines
be extendable
to handle the rest of the event later.

2011-03-01 14:18:42

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 4/6] perf script: dump software events too



On 03/01/2011 07:09 AM, Frederic Weisbecker wrote:
>
> That's not on the right place. This is going to print the events even
> when we launch scripts. We only want to print them when we launch
> perf script without a script, which falls back to printing events
> (look at default_scripting_ops).
>
> Like I suggested you, it's better to extend struct
> scripting_ops::process_event()
> to take the whole event and let the handler decide what to do.
> Python and Perl scripting engines can just extract the raw sample and
> do like they did before.
> But print_event(), which is the process_event() handler for the
> default_scripting_ops,
> can handle the new printing features.
>
> That's also more flexible, this lets Python and Perl scripting engines
> be extendable
> to handle the rest of the event later.

I don't follow what you mean.

This is the same place that tracepoints are dumped. In the
PERF_TYPE_TRACEPOINT case process_event == print_event which dumps the
lines to stdout.

I came in added a switch that invokes a different handler for software
events. Software events will be processed differently than tracepoint.
The print_event function and its pretty_print function are not designed
to dump software events (and eventually hardware events).

David

2011-03-01 14:27:44

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/6] perf record: add time-of-day option

On Sun, 2011-02-27 at 20:52 -0700, David Ahern wrote:
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 51a2f34..404b1ee 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -240,6 +240,7 @@ struct perf_event_attr {
> #define PERF_EVENT_IOC_PERIOD _IOW('$', 4, __u64)
> #define PERF_EVENT_IOC_SET_OUTPUT _IO ('$', 5)
> #define PERF_EVENT_IOC_SET_FILTER _IOW('$', 6, char *)
> +#define PERF_EVENT_IOC_RECORD_SAMPLE _IO('$', 7)
>
> enum perf_event_ioc_flags {
> PERF_IOC_FLAG_GROUP = 1U << 0,
> diff --git a/kernel/perf_event.c b/kernel/perf_event.c
> index a25a63d..7999f55 100644
> --- a/kernel/perf_event.c
> +++ b/kernel/perf_event.c
> @@ -3250,6 +3250,7 @@ static struct perf_event *perf_fget_light(int fd, int *fput_needed)
> static int perf_event_set_output(struct perf_event *event,
> struct perf_event *output_event);
> static int perf_event_set_filter(struct perf_event *event, void __user *arg);
> +static int perf_event_generate_sample(struct perf_event *event);
>
> static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> {
> @@ -3296,6 +3297,9 @@ static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> case PERF_EVENT_IOC_SET_FILTER:
> return perf_event_set_filter(event, (void __user *)arg);
>
> + case PERF_EVENT_IOC_RECORD_SAMPLE:
> + return perf_event_generate_sample(event);
> +
> default:
> return -ENOTTY;
> }
> @@ -4398,6 +4402,21 @@ exit:
> rcu_read_unlock();
> }
>
> +/* add a sample to the event stream based on user request */
> +static int perf_event_generate_sample(struct perf_event *event)
> +{
> + struct perf_sample_data data;
> + struct pt_regs regs;
> +
> + perf_fetch_caller_regs(&regs);
> + event->pmu->read(event);
> + perf_sample_data_init(&data, 0);
> + data.period = event->hw.last_period;
> + perf_event_output(event, 0, &data, &regs);
> +
> + return 0;
> +}

I'm not quite sure why you need this..

2011-03-01 14:35:22

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 3/6] perf record: add time-of-day option



On 03/01/2011 07:29 AM, Peter Zijlstra wrote:

>> +/* add a sample to the event stream based on user request */
>> +static int perf_event_generate_sample(struct perf_event *event)
>> +{
>> + struct perf_sample_data data;
>> + struct pt_regs regs;
>> +
>> + perf_fetch_caller_regs(&regs);
>> + event->pmu->read(event);
>> + perf_sample_data_init(&data, 0);
>> + data.period = event->hw.last_period;
>> + perf_event_output(event, 0, &data, &regs);
>> +
>> + return 0;
>> +}
>
> I'm not quite sure why you need this..
>

periodic samples are generated after the specified period. I want the
first sample immediately and the remainder based on the set period.

So generically, how do you get the first sample into the event stream
and let the others happen based on the timer? I need not only the
counter value but also the perf_clock timestamp that comes from the
SAMPLE_TIME attribute.

David

2011-03-01 15:11:44

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 4/6] perf script: dump software events too

2011/3/1 David Ahern <[email protected]>:
> I don't follow what you mean.
>
> This is the same place that tracepoints are dumped. In the
> PERF_TYPE_TRACEPOINT case process_event == print_event which dumps the
> lines to stdout.

No it's not. scripting_ops can be either default_scripting_ops,
perl_scripting_ops
or python_scripting_ops.

Hence process_sample_event() doesn't necessarily ends up pretty print events.
It can also invoke a script that may decide to do what it wants with events.

pretty printing must be an endpoint decision.

> I came in added a switch that invokes a different handler for software
> events. Software events will be processed differently than tracepoint.
> The print_event function and its pretty_print function are not designed
> to dump software events (and eventually hardware events).

Why shouldn't it be designed to dump software events? It's called print_event().
Its current version is rather something I would call "limited". But it
was not designed
to be limited.

Ideally, we should have print_tracepoint_event() in
trace-event-parse.c, print_software_event()
where you want, and have print_event() in builtin-script.c that wraps on those.

2011-03-01 15:33:49

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/6] perf record: add time-of-day option

On Tue, 2011-03-01 at 07:35 -0700, David Ahern wrote:
>
> On 03/01/2011 07:29 AM, Peter Zijlstra wrote:
>
> >> +/* add a sample to the event stream based on user request */
> >> +static int perf_event_generate_sample(struct perf_event *event)
> >> +{
> >> + struct perf_sample_data data;
> >> + struct pt_regs regs;
> >> +
> >> + perf_fetch_caller_regs(&regs);
> >> + event->pmu->read(event);
> >> + perf_sample_data_init(&data, 0);
> >> + data.period = event->hw.last_period;
> >> + perf_event_output(event, 0, &data, &regs);
> >> +
> >> + return 0;
> >> +}
> >
> > I'm not quite sure why you need this..
> >
>
> periodic samples are generated after the specified period. I want the
> first sample immediately and the remainder based on the set period.
>
> So generically, how do you get the first sample into the event stream
> and let the others happen based on the timer? I need not only the
> counter value but also the perf_clock timestamp that comes from the
> SAMPLE_TIME attribute.

Again, why?

2011-03-01 15:41:44

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 3/6] perf record: add time-of-day option



On 03/01/2011 08:35 AM, Peter Zijlstra wrote:
> On Tue, 2011-03-01 at 07:35 -0700, David Ahern wrote:
>>
>> On 03/01/2011 07:29 AM, Peter Zijlstra wrote:
>>
>>>> +/* add a sample to the event stream based on user request */
>>>> +static int perf_event_generate_sample(struct perf_event *event)
>>>> +{
>>>> + struct perf_sample_data data;
>>>> + struct pt_regs regs;
>>>> +
>>>> + perf_fetch_caller_regs(&regs);
>>>> + event->pmu->read(event);
>>>> + perf_sample_data_init(&data, 0);
>>>> + data.period = event->hw.last_period;
>>>> + perf_event_output(event, 0, &data, &regs);
>>>> +
>>>> + return 0;
>>>> +}
>>>
>>> I'm not quite sure why you need this..
>>>
>>
>> periodic samples are generated after the specified period. I want the
>> first sample immediately and the remainder based on the set period.
>>
>> So generically, how do you get the first sample into the event stream
>> and let the others happen based on the timer? I need not only the
>> counter value but also the perf_clock timestamp that comes from the
>> SAMPLE_TIME attribute.
>
> Again, why?
>

realtime-clock counter is created with a period N. How do you correlate
perf_clock to time-of-day until now + N? You need a sample from the
beginning of the event stream, not N nanoseconds into it.

David

2011-03-01 15:59:26

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/6] perf record: add time-of-day option

On Tue, 2011-03-01 at 08:41 -0700, David Ahern wrote:

> realtime-clock counter is created with a period N. How do you correlate
> perf_clock to time-of-day until now + N?

You scan fwd? Since its at most N time ahead you're scanning fwd its
still constant time.

Thing is, I'm really not convinced this is a very useful ioctl() in
generic, and ioctl()s really need a good justification to add.

2011-03-01 16:09:34

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 3/6] perf record: add time-of-day option



On 03/01/2011 09:00 AM, Peter Zijlstra wrote:
> On Tue, 2011-03-01 at 08:41 -0700, David Ahern wrote:
>
>> realtime-clock counter is created with a period N. How do you correlate
>> perf_clock to time-of-day until now + N?
>
> You scan fwd? Since its at most N time ahead you're scanning fwd its
> still constant time.

So I potentially scan a multi-gigabyte file looking for a single event
-- that may not even exist. From a usability perspective the time string
should appear if the clock event exists and not if it does not exist.
This should happen automatically.

>
> Thing is, I'm really not convinced this is a very useful ioctl() in
> generic, and ioctl()s really need a good justification to add.
>

I did not code it create samples of a realtime-clock event; I coded it
to add an event to the stream on demand -- whatever event the fd is tied to.

David

2011-03-01 16:11:08

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 4/6] perf script: dump software events too



On 03/01/2011 08:11 AM, Frederic Weisbecker wrote:
>
> Why shouldn't it be designed to dump software events? It's called print_event().
> Its current version is rather something I would call "limited". But it
> was not designed
> to be limited.

That's because its origins are trace specific. Per last week's thread,
perf-script was perf-trace until Nov 2010. perf-script deals with
tracepoints.

>
> Ideally, we should have print_tracepoint_event() in
> trace-event-parse.c, print_software_event()
> where you want, and have print_event() in builtin-script.c that wraps on those.


process_event does not take the event sample, it takes elements of it:

struct scripting_ops {
...
void (*process_event) (int cpu, void *data, int size,
unsigned long long nsecs, char *comm);
...
};


David

2011-03-01 16:28:41

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 4/6] perf script: dump software events too

2011/3/1 David Ahern <[email protected]>:
>
>
> On 03/01/2011 08:11 AM, Frederic Weisbecker wrote:
>>
>> Why shouldn't it be designed to dump software events? It's called print_event().
>> Its current version is rather something I would call "limited". But it
>> was not designed
>> to be limited.
>
> That's because its origins are trace specific. Per last week's thread,
> perf-script was perf-trace until Nov 2010. perf-script deals with
> tracepoints.

Right it has been first created to support tracing events. Now why
should it stay limited to them?

>> Ideally, we should have print_tracepoint_event() in
>> trace-event-parse.c, print_software_event()
>> where you want, and have print_event() in builtin-script.c that wraps on those.
>
>
> process_event does not take the event sample, it takes elements of it:
>
> struct scripting_ops {
> ...
> ? ?void (*process_event) (int cpu, void *data, int size,
> ? ? ? ? ? ? ? ? ? unsigned long long nsecs, char *comm);
> ...
> };

What is the problem with changing a function prototype?

2011-03-01 16:35:47

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/6] perf record: add time-of-day option

On Tue, 2011-03-01 at 09:09 -0700, David Ahern wrote:
>
> I did not code it create samples of a realtime-clock event; I coded it
> to add an event to the stream on demand -- whatever event the fd is
> tied to.
>
>
Well yes, but aside from you and the realtime clock stuff, who'd ever
want to do this particular thing?

I'd much rather expose the whole
local_clock()/perf_clock()/trace_clock() (should all be the same anyway)
as a posix clock using CLOCK_TRACING or whatever would be an appropriate
name.

[ Since the whole thing is NMI safe it should be well possible to make a
VDSO version as well. ]

Anyway, once its visible as a posix clock you can sync up from
userspace. And this clock is indeed wanted for other things too, like
user-space tracing etc.

2011-03-01 16:45:14

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 3/6] perf record: add time-of-day option



On 03/01/2011 09:37 AM, Peter Zijlstra wrote:
> On Tue, 2011-03-01 at 09:09 -0700, David Ahern wrote:
>>
>> I did not code it create samples of a realtime-clock event; I coded it
>> to add an event to the stream on demand -- whatever event the fd is
>> tied to.
>>
>>
> Well yes, but aside from you and the realtime clock stuff, who'd ever
> want to do this particular thing?

One example would be a system watchdog that decided for some reason or
another to force an event sample because of some event it detected.

>
> I'd much rather expose the whole
> local_clock()/perf_clock()/trace_clock() (should all be the same anyway)
> as a posix clock using CLOCK_TRACING or whatever would be an appropriate
> name.
>
> [ Since the whole thing is NMI safe it should be well possible to make a
> VDSO version as well. ]
>
> Anyway, once its visible as a posix clock you can sync up from
> userspace. And this clock is indeed wanted for other things too, like
> user-space tracing etc.
>

And for some silly reason I sense a time trap here (pun intended) .....

So your pushback is:
1. throw out the realtime-clock event patch,

2. add a new CLOCK_TRACING type to clock_gettime with VDSO hook,

3. and use a synthesized event from userspace for snapshotting
perf_clock to time-of-day -- similar to what I have now, but one that
will guarantee a time-of-day to perf_clock correlation (versus the
current one which hopes that perf_clock is the monotonic clock).

David

2011-03-01 16:49:36

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 4/6] perf script: dump software events too

Em Tue, Mar 01, 2011 at 09:11:20AM -0700, David Ahern escreveu:
> On 03/01/2011 08:11 AM, Frederic Weisbecker wrote:
> > Why shouldn't it be designed to dump software events? It's called print_event().
> > Its current version is rather something I would call "limited". But it
> > was not designed to be limited.

> That's because its origins are trace specific. Per last week's thread,
> perf-script was perf-trace until Nov 2010. perf-script deals with
> tracepoints.

> > Ideally, we should have print_tracepoint_event() in
> > trace-event-parse.c, print_software_event()
> > where you want, and have print_event() in builtin-script.c that wraps on those.
>
> process_event does not take the event sample, it takes elements of it:
>
> struct scripting_ops {
> ...
> void (*process_event) (int cpu, void *data, int size,
> unsigned long long nsecs, char *comm);

I still have to read the entire thread, but this definetely should
receive "struct perf_sample", i.e. the parsed sample and the 'struct
thread' from where it can get the comm, etc

Look at the process_event for builtin-script.c:

scripting_ops->process_event(sample->cpu,
sample->raw_data,
sample->raw_size,
sample->time,
thread->comm);


I'd change it to:

scripting_ops->process_event(event, sample, thread);

Or even short circuit it completely modulo debugging, becoming:

static int process_sample_event(union perf_event *event,
struct perf_sample *sample,
struct perf_session *session)
{
if (debug_mode) {
if (sample->time < last_timestamp) {
pr_err("Samples misordered, previous: %" PRIu64
" this: %" PRIu64 "\n", last_timestamp,
sample->time);
nr_unordered++;
}
last_timestamp = sample->time;
} else {
scripting_ops->process_event(event, sample, session);
session->hists.stats.total_period += sample->period;
}
return 0;
}

Script modules may then do their thread accounting if comms are asked
for, etc.

- Arnaldo

2011-03-01 17:08:00

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 3/6] perf record: add time-of-day option

Em Tue, Mar 01, 2011 at 09:45:27AM -0700, David Ahern escreveu:
>
>
> On 03/01/2011 09:37 AM, Peter Zijlstra wrote:
> > On Tue, 2011-03-01 at 09:09 -0700, David Ahern wrote:
> >>
> >> I did not code it create samples of a realtime-clock event; I coded it
> >> to add an event to the stream on demand -- whatever event the fd is
> >> tied to.
> >>
> >>
> > Well yes, but aside from you and the realtime clock stuff, who'd ever
> > want to do this particular thing?
>
> One example would be a system watchdog that decided for some reason or
> another to force an event sample because of some event it detected.
>
> >
> > I'd much rather expose the whole
> > local_clock()/perf_clock()/trace_clock() (should all be the same anyway)
> > as a posix clock using CLOCK_TRACING or whatever would be an appropriate
> > name.
> >
> > [ Since the whole thing is NMI safe it should be well possible to make a
> > VDSO version as well. ]
> >
> > Anyway, once its visible as a posix clock you can sync up from
> > userspace. And this clock is indeed wanted for other things too, like
> > user-space tracing etc.
> >
>
> And for some silly reason I sense a time trap here (pun intended) .....
>
> So your pushback is:
> 1. throw out the realtime-clock event patch,

Nope

> 2. add a new CLOCK_TRACING type to clock_gettime with VDSO hook,

Yes

> 3. and use a synthesized event from userspace for snapshotting
> perf_clock to time-of-day -- similar to what I have now, but one that
> will guarantee a time-of-day to perf_clock correlation (versus the
> current one which hopes that perf_clock is the monotonic clock).

Nope.

My understanding was that for the initial gettime you use
clock_gettime(), and then you get CLOCK_REALTIME events in the stream
for resynch, etc.

Peter?

- Arnaldo

2011-03-01 17:09:50

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 3/6] perf record: add time-of-day option

Em Tue, Mar 01, 2011 at 02:07:50PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Mar 01, 2011 at 09:45:27AM -0700, David Ahern escreveu:
> > On 03/01/2011 09:37 AM, Peter Zijlstra wrote:
> > > On Tue, 2011-03-01 at 09:09 -0700, David Ahern wrote:
> > >>
> > >> I did not code it create samples of a realtime-clock event; I coded it
> > >> to add an event to the stream on demand -- whatever event the fd is
> > >> tied to.
> > >>
> > >>
> > > Well yes, but aside from you and the realtime clock stuff, who'd ever
> > > want to do this particular thing?
> >
> > One example would be a system watchdog that decided for some reason or
> > another to force an event sample because of some event it detected.
> >
> > >
> > > I'd much rather expose the whole
> > > local_clock()/perf_clock()/trace_clock() (should all be the same anyway)
> > > as a posix clock using CLOCK_TRACING or whatever would be an appropriate
> > > name.
> > >
> > > [ Since the whole thing is NMI safe it should be well possible to make a
> > > VDSO version as well. ]
> > >
> > > Anyway, once its visible as a posix clock you can sync up from
> > > userspace. And this clock is indeed wanted for other things too, like
> > > user-space tracing etc.
> > >
> >
> > And for some silly reason I sense a time trap here (pun intended) .....
> >
> > So your pushback is:
> > 1. throw out the realtime-clock event patch,
>
> Nope
>
> > 2. add a new CLOCK_TRACING type to clock_gettime with VDSO hook,
>
> Yes
>
> > 3. and use a synthesized event from userspace for snapshotting
> > perf_clock to time-of-day -- similar to what I have now, but one that
> > will guarantee a time-of-day to perf_clock correlation (versus the
> > current one which hopes that perf_clock is the monotonic clock).
>
> Nope.

I see, yeah, the very first CLOCK_REALTIME sample would be synthesized
from the return of clock_gettime(CLOCK_TRACING), all the others would
come in the event stream.

> My understanding was that for the initial gettime you use
> clock_gettime(), and then you get CLOCK_REALTIME events in the stream
> for resynch, etc.
>
> Peter?
>
> - Arnaldo

2011-03-01 22:26:44

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/6] perf record: add time-of-day option

On Tue, 2011-03-01 at 09:45 -0700, David Ahern wrote:

> One example would be a system watchdog that decided for some reason or
> another to force an event sample because of some event it detected.

That would be a design error, there already is an event, use that.

> >
> > I'd much rather expose the whole
> > local_clock()/perf_clock()/trace_clock() (should all be the same anyway)
> > as a posix clock using CLOCK_TRACING or whatever would be an appropriate
> > name.
> >
> > [ Since the whole thing is NMI safe it should be well possible to make a
> > VDSO version as well. ]
> >
> > Anyway, once its visible as a posix clock you can sync up from
> > userspace. And this clock is indeed wanted for other things too, like
> > user-space tracing etc.
> >
>
> And for some silly reason I sense a time trap here (pun intended) .....
>
> So your pushback is:
> 1. throw out the realtime-clock event patch,

Well, no, you need continues samples to keep in sync, so having this is
required [*].

> 2. add a new CLOCK_TRACING type to clock_gettime with VDSO hook,

Right, that is something we need anyway at some point in time.

> 3. and use a synthesized event from userspace for snapshotting
> perf_clock to time-of-day -- similar to what I have now, but one that
> will guarantee a time-of-day to perf_clock correlation (versus the
> current one which hopes that perf_clock is the monotonic clock).

I guess you could do that if we have full userspace tracing support, but
it wouldn't need synthesized events, it would need a pure userspace
event stream.

Thing is, I really dislike the ioctl() trigger


2011-03-01 22:35:23

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 3/6] perf record: add time-of-day option



On 03/01/2011 03:28 PM, Peter Zijlstra wrote:

>> And for some silly reason I sense a time trap here (pun intended) .....
>>
>> So your pushback is:
>> 1. throw out the realtime-clock event patch,
>
> Well, no, you need continues samples to keep in sync, so having this is
> required [*].
>
>> 2. add a new CLOCK_TRACING type to clock_gettime with VDSO hook,
>
> Right, that is something we need anyway at some point in time.
>
>> 3. and use a synthesized event from userspace for snapshotting
>> perf_clock to time-of-day -- similar to what I have now, but one that
>> will guarantee a time-of-day to perf_clock correlation (versus the
>> current one which hopes that perf_clock is the monotonic clock).
>
> I guess you could do that if we have full userspace tracing support, but
> it wouldn't need synthesized events, it would need a pure userspace
> event stream.


I still don't see how this solves the need to trigger the initial sample
into the event stream.

David

>
> Thing is, I really dislike the ioctl() trigger
>
>
>

2011-03-02 14:17:01

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 3/6] perf record: add time-of-day option

On Sun, 27 Feb 2011, David Ahern wrote:

> Enable data collection for generating time-of-day strings when
> printing individual perf samples. This is done by sampling the
> realtime clock event with the perf_clock time stamps.
>
> If the realtime-clock event is not available (e.g, older kernels)
> fallback to a synthesized event. (I realize there is resistance
> to new synthesized events, but it is a simple way to gain this
> feature on older kernels without the need to modify the kernel
> code).

...

> +/* add a sample to the event stream based on user request */
> +static int perf_event_generate_sample(struct perf_event *event)
> +{
> + struct perf_sample_data data;
> + struct pt_regs regs;
> +
> + perf_fetch_caller_regs(&regs);
> + event->pmu->read(event);
> + perf_sample_data_init(&data, 0);
> + data.period = event->hw.last_period;
> + perf_event_output(event, 0, &data, &regs);
> +
> + return 0;
> +}

Errm. This is a generic ioctl to inject random events from user space.

First of all this wants to be a separate patch and not burried into
some 100 lines of changes to tools/perf.

Secondly adding such an ioctl needs a pretty good reason, and the use
case at hand is not really one.

Reading through the other patches, your goal is to correlate
CLOCK_REALTIME based logs to CLOCK_PERF based events.

To achieve that you inject cyclic clock_realtime samples into the
event stream and want to have this initial event inserted via the
ioctl.

How does all that deal with CLOCK_REALTIME being affected by NTP and
settimeofday? Not really, as far as I can tell. It somehow works, but
that depends on the frequency of your event injection.

To be honest, that's just too much churn for a feature which is single
purpose and questionable functionality.

It would be far more interesting to have trace points in the
timekeeping code, which are useful for other things as well
(e.g. precise monitoring of NTP) and provide always a correct
association with the trace clock. That would cover everything from
frequency adjustments, clock setting including the information about
the relation to clock monotonic.

Thanks,

tglx

2011-03-02 14:28:37

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 3/6] perf record: add time-of-day option



On 03/02/2011 07:16 AM, Thomas Gleixner wrote:
> On Sun, 27 Feb 2011, David Ahern wrote:
>
>> Enable data collection for generating time-of-day strings when
>> printing individual perf samples. This is done by sampling the
>> realtime clock event with the perf_clock time stamps.
>>
>> If the realtime-clock event is not available (e.g, older kernels)
>> fallback to a synthesized event. (I realize there is resistance
>> to new synthesized events, but it is a simple way to gain this
>> feature on older kernels without the need to modify the kernel
>> code).
>
> ...
>
>> +/* add a sample to the event stream based on user request */
>> +static int perf_event_generate_sample(struct perf_event *event)
>> +{
>> + struct perf_sample_data data;
>> + struct pt_regs regs;
>> +
>> + perf_fetch_caller_regs(&regs);
>> + event->pmu->read(event);
>> + perf_sample_data_init(&data, 0);
>> + data.period = event->hw.last_period;
>> + perf_event_output(event, 0, &data, &regs);
>> +
>> + return 0;
>> +}
>
> Errm. This is a generic ioctl to inject random events from user space.
>
> First of all this wants to be a separate patch and not burried into
> some 100 lines of changes to tools/perf.
>
> Secondly adding such an ioctl needs a pretty good reason, and the use
> case at hand is not really one.

Peter has made it clear he objects to the ioctl.

>
> Reading through the other patches, your goal is to correlate
> CLOCK_REALTIME based logs to CLOCK_PERF based events.
>
> To achieve that you inject cyclic clock_realtime samples into the
> event stream and want to have this initial event inserted via the
> ioctl.

I still need a solution. There is no means for requesting a sample
immediately when a counter is opened, and there is a desire not to add
any more synthesized events. How then do I get the initial correlation
without scanning a potentially large file looking for a single event
type occurrence.

>
> How does all that deal with CLOCK_REALTIME being affected by NTP and
> settimeofday? Not really, as far as I can tell. It somehow works, but
> that depends on the frequency of your event injection.

It is sampled at some periodic rate to get NTP changes. Right now it is
hardcoded at once an hour. The frequency option can be added to the
--tod parameter.

>
> To be honest, that's just too much churn for a feature which is single
> purpose and questionable functionality.
>
> It would be far more interesting to have trace points in the
> timekeeping code, which are useful for other things as well
> (e.g. precise monitoring of NTP) and provide always a correct
> association with the trace clock. That would cover everything from
> frequency adjustments, clock setting including the information about
> the relation to clock monotonic.

I'm not looking to track ntp changes. I want time-of-day strings on
sample dumps. That is not a questionable feature request; it is a
usability enhancement.

We are users of perf. Fantastic tool, but of limited functionality to
the engineers on the team if they cannot correlate event samples with
application logs.

David


>
> Thanks,
>
> tglx

2011-03-02 17:28:41

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 3/6] perf record: add time-of-day option

On Wed, 2 Mar 2011, David Ahern wrote:
> On 03/02/2011 07:16 AM, Thomas Gleixner wrote:
> > How does all that deal with CLOCK_REALTIME being affected by NTP and
> > settimeofday? Not really, as far as I can tell. It somehow works, but
> > that depends on the frequency of your event injection.
>
> It is sampled at some periodic rate to get NTP changes. Right now it is
> hardcoded at once an hour. The frequency option can be added to the
> --tod parameter.

Once an hour. Brilliant. And how do you correlated that when
settimeofday was called 5 times in between the samples. Not at all.

> >
> > To be honest, that's just too much churn for a feature which is single
> > purpose and questionable functionality.
> >
> > It would be far more interesting to have trace points in the
> > timekeeping code, which are useful for other things as well
> > (e.g. precise monitoring of NTP) and provide always a correct
> > association with the trace clock. That would cover everything from
> > frequency adjustments, clock setting including the information about
> > the relation to clock monotonic.
>
> I'm not looking to track ntp changes. I want time-of-day strings on
> sample dumps. That is not a questionable feature request; it is a
> usability enhancement.

Sigh, did you even read what I wrote?

I did not say, that YOU have to track NTP changes.

And I fully understood that you want to correlate perf time
stamps with CLOCK_REALTIME.

Neither did I say, that this is a questionable feature request.

I merily objected to add functionality which is half baken and only
particular useful.

Further I made a suggestion to add trace points to the time keeping
code instead, which

- solve your problem
- are useful for other purposes
- do not require ioctls, timer driven sw events

Thanks,

tglx

2011-03-03 08:52:12

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 3/6] perf record: add time-of-day option


* David Ahern <[email protected]> wrote:

> > How does all that deal with CLOCK_REALTIME being affected by NTP and
> > settimeofday? Not really, as far as I can tell. It somehow works, but
> > that depends on the frequency of your event injection.
>
> It is sampled at some periodic rate to get NTP changes. Right now it is
> hardcoded at once an hour. The frequency option can be added to the
> --tod parameter.

As Thomas mentioned, we probably need something more complete than that.

Still your approach is obviously useful and i'd like to stress that explicitly. Have
you considered another related feature, feeding printk lines as special 'string
events' into the perf ringbuffer?

It would round up your scheme very nicely: that way you'd have a single, global,
GTOD-correlated event store/flow for basically every system event you might be
interested in. You could switch events (and tracepoints) on/off based on need,
controlling the type and rate of information in a very finegrained way.

If the printk approach works out i'd even suggest that such a facility would need a
separate tool within perf: 'perf syslog' or 'perf log'. It would heavily reuse
perf-script and other perf internal facilities, obviously - but you would not be
tied to any particular sub-tool implementation.

EDAC type events could feed into this as well, giving the tool a broader 'system
health' aspect as well, beyond the 'system performance analysis' aspect.

Thanks,

Ingo

2011-03-03 14:29:08

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 3/6] perf record: add time-of-day option



On 03/02/2011 10:28 AM, Thomas Gleixner wrote:
> Further I made a suggestion to add trace points to the time keeping
> code instead, which

Ok, so let's say we have a tracepoint in do_settimeofday, another in
whatever function is needed to get ntp updates to the timekeeper
parameters. I still don't see how the initial sample is grabbed. I
definitely do not want a tracepoint in any of the gettimeofday
functions. Furthermore, seems like timekeeping code now needs to be
replicated within perf to some degree.

How about a simpler alternative? Adding the time-of-day stamp to the
samples. --tod is an optional parameter, and if desired a warning can be
added to the usage about the additional overhead of reading gettimeofday
for each sample. Users can decided if the overhead is acceptable (this
is still much lighterweight than strace which has a time-of-day option).

This is the kernel side change; userspace changes are much simpler as well.

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8ceb5a6..3297394 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -125,8 +125,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_PERIOD = 1U << 8,
PERF_SAMPLE_STREAM_ID = 1U << 9,
PERF_SAMPLE_RAW = 1U << 10,
+ PERF_SAMPLE_REALTIME = 1U << 11,

- PERF_SAMPLE_MAX = 1U << 11, /* non-ABI */
+ PERF_SAMPLE_MAX = 1U << 12, /* non-ABI */
};

/*
@@ -989,6 +990,7 @@ struct perf_sample_data {
u32 cpu;
u32 reserved;
} cpu_entry;
+ u64 realtime;
u64 period;
struct perf_callchain_entry *callchain;
struct perf_raw_record *raw;
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 64a018e..0711cc9 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -908,6 +908,9 @@ static void perf_event__id_header_size(struct
perf_event *event)
if (sample_type & PERF_SAMPLE_CPU)
size += sizeof(data->cpu_entry);

+ if (sample_type & PERF_SAMPLE_REALTIME)
+ size += sizeof(data->realtime);
+
event->id_header_size = size;
}

@@ -4010,6 +4013,9 @@ static void __perf_event_header__init_id(struct
perf_event_header *header,
if (sample_type & PERF_SAMPLE_TIME)
data->time = perf_clock();

+ if (sample_type & PERF_SAMPLE_REALTIME)
+ data->realtime = ktime_to_ns(ktime_get_real());
+
if (sample_type & PERF_SAMPLE_ID)
data->id = primary_event_id(event);

@@ -4049,6 +4055,9 @@ static void __perf_event__output_id_sample(struct
perf_output_handle *handle,

if (sample_type & PERF_SAMPLE_CPU)
perf_output_put(handle, data->cpu_entry);
+
+ if (sample_type & PERF_SAMPLE_REALTIME)
+ perf_output_put(handle, data->realtime);
}

static void perf_event__output_id_sample(struct perf_event *event,
@@ -4293,6 +4302,9 @@ void perf_output_sample(struct perf_output_handle
*handle,
if (sample_type & PERF_SAMPLE_CPU)
perf_output_put(handle, data->cpu_entry);

+ if (sample_type & PERF_SAMPLE_REALTIME)
+ perf_output_put(handle, data->realtime);
+
if (sample_type & PERF_SAMPLE_PERIOD)
perf_output_put(handle, data->period);

2011-03-03 14:33:08

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH 3/6] perf record: add time-of-day option



On 03/03/2011 01:51 AM, Ingo Molnar wrote:
>
> * David Ahern <[email protected]> wrote:
>
>>> How does all that deal with CLOCK_REALTIME being affected by NTP and
>>> settimeofday? Not really, as far as I can tell. It somehow works, but
>>> that depends on the frequency of your event injection.
>>
>> It is sampled at some periodic rate to get NTP changes. Right now it is
>> hardcoded at once an hour. The frequency option can be added to the
>> --tod parameter.
>
> As Thomas mentioned, we probably need something more complete than that.

Just responded to Thomas' email with a simpler proposal than munging on
tracepoints and trying to track time-of-day implicitly.

> Still your approach is obviously useful and i'd like to stress that explicitly. Have

Thanks for stating that.

> you considered another related feature, feeding printk lines as special 'string
> events' into the perf ringbuffer?
>
> It would round up your scheme very nicely: that way you'd have a single, global,
> GTOD-correlated event store/flow for basically every system event you might be
> interested in. You could switch events (and tracepoints) on/off based on need,
> controlling the type and rate of information in a very finegrained way.
>
> If the printk approach works out i'd even suggest that such a facility would need a
> separate tool within perf: 'perf syslog' or 'perf log'. It would heavily reuse
> perf-script and other perf internal facilities, obviously - but you would not be
> tied to any particular sub-tool implementation.
>
> EDAC type events could feed into this as well, giving the tool a broader 'system
> health' aspect as well, beyond the 'system performance analysis' aspect.

I can look into it (as time allows) once I get the dumping of software
events done.

David

>
> Thanks,
>
> Ingo