hi,
this patchset is based on changes made by David Ahern long time ago.
The perf code moved a lot since then, but the idea is the same.
The patchset is adding the ability to display TOD/wallclock timestamp
in 'perf script' output and in 'perf data convert --to-ctf' subcommand,
so the converted CTF data contain TOD/wallclock timestamps.
It's done by adding new header FEATURE 'CLOCK_DATA' to perf.data, that
stores reference times for both TOD/wallclock time and for the clock that
perf record is configured to use. These reference times are then used to
convert sample's timestamps to TOD/wallclock timestamps.
The feature is available only for recording with clockid specified,
because it's the only case where we can get reference time to TOD/wallclock
time. We can't do that with perf clock yet.
Also available in here:
git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
perf/abs_time
thanks,
jirka
---
Jiri Olsa (6):
perf tools: Add clockid_name function
perf tools: Store clock references for -k/--clockid option
perf tools: Move clockid_res_ns under clock struct
perf tools: Add support to store time of day in CTF data conversion
perf script: Change enum perf_output_field values to be 64 bits
perf script: Add tod field to display time of day
tools/perf/Documentation/perf-data.txt | 3 ++
tools/perf/Documentation/perf.data-file-format.txt | 13 +++++++
tools/perf/builtin-data.c | 1 +
tools/perf/builtin-record.c | 58 ++++++++++++++++++++++++++++--
tools/perf/builtin-script.c | 195 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------------
tools/perf/util/data-convert-bt.c | 56 ++++++++++++++++++-----------
tools/perf/util/data-convert.h | 1 +
tools/perf/util/env.h | 14 +++++++-
tools/perf/util/header.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
tools/perf/util/header.h | 1 +
tools/perf/util/util.h | 2 ++
11 files changed, 371 insertions(+), 93 deletions(-)
So it's possible to add new values. I did not find any place
where the enum values are passed through some number type,
so it's safe to make this change.
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/builtin-script.c | 64 ++++++++++++++++++-------------------
1 file changed, 32 insertions(+), 32 deletions(-)
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 0dfc38fb6d35..eb45f678dc2f 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -82,38 +82,38 @@ static bool native_arch;
unsigned int scripting_max_stack = PERF_MAX_STACK_DEPTH;
enum perf_output_field {
- PERF_OUTPUT_COMM = 1U << 0,
- PERF_OUTPUT_TID = 1U << 1,
- PERF_OUTPUT_PID = 1U << 2,
- PERF_OUTPUT_TIME = 1U << 3,
- PERF_OUTPUT_CPU = 1U << 4,
- PERF_OUTPUT_EVNAME = 1U << 5,
- PERF_OUTPUT_TRACE = 1U << 6,
- PERF_OUTPUT_IP = 1U << 7,
- PERF_OUTPUT_SYM = 1U << 8,
- PERF_OUTPUT_DSO = 1U << 9,
- PERF_OUTPUT_ADDR = 1U << 10,
- PERF_OUTPUT_SYMOFFSET = 1U << 11,
- PERF_OUTPUT_SRCLINE = 1U << 12,
- PERF_OUTPUT_PERIOD = 1U << 13,
- PERF_OUTPUT_IREGS = 1U << 14,
- PERF_OUTPUT_BRSTACK = 1U << 15,
- PERF_OUTPUT_BRSTACKSYM = 1U << 16,
- PERF_OUTPUT_DATA_SRC = 1U << 17,
- PERF_OUTPUT_WEIGHT = 1U << 18,
- PERF_OUTPUT_BPF_OUTPUT = 1U << 19,
- PERF_OUTPUT_CALLINDENT = 1U << 20,
- PERF_OUTPUT_INSN = 1U << 21,
- PERF_OUTPUT_INSNLEN = 1U << 22,
- PERF_OUTPUT_BRSTACKINSN = 1U << 23,
- PERF_OUTPUT_BRSTACKOFF = 1U << 24,
- PERF_OUTPUT_SYNTH = 1U << 25,
- PERF_OUTPUT_PHYS_ADDR = 1U << 26,
- PERF_OUTPUT_UREGS = 1U << 27,
- PERF_OUTPUT_METRIC = 1U << 28,
- PERF_OUTPUT_MISC = 1U << 29,
- PERF_OUTPUT_SRCCODE = 1U << 30,
- PERF_OUTPUT_IPC = 1U << 31,
+ PERF_OUTPUT_COMM = 1ULL << 0,
+ PERF_OUTPUT_TID = 1ULL << 1,
+ PERF_OUTPUT_PID = 1ULL << 2,
+ PERF_OUTPUT_TIME = 1ULL << 3,
+ PERF_OUTPUT_CPU = 1ULL << 4,
+ PERF_OUTPUT_EVNAME = 1ULL << 5,
+ PERF_OUTPUT_TRACE = 1ULL << 6,
+ PERF_OUTPUT_IP = 1ULL << 7,
+ PERF_OUTPUT_SYM = 1ULL << 8,
+ PERF_OUTPUT_DSO = 1ULL << 9,
+ PERF_OUTPUT_ADDR = 1ULL << 10,
+ PERF_OUTPUT_SYMOFFSET = 1ULL << 11,
+ PERF_OUTPUT_SRCLINE = 1ULL << 12,
+ PERF_OUTPUT_PERIOD = 1ULL << 13,
+ PERF_OUTPUT_IREGS = 1ULL << 14,
+ PERF_OUTPUT_BRSTACK = 1ULL << 15,
+ PERF_OUTPUT_BRSTACKSYM = 1ULL << 16,
+ PERF_OUTPUT_DATA_SRC = 1ULL << 17,
+ PERF_OUTPUT_WEIGHT = 1ULL << 18,
+ PERF_OUTPUT_BPF_OUTPUT = 1ULL << 19,
+ PERF_OUTPUT_CALLINDENT = 1ULL << 20,
+ PERF_OUTPUT_INSN = 1ULL << 21,
+ PERF_OUTPUT_INSNLEN = 1ULL << 22,
+ PERF_OUTPUT_BRSTACKINSN = 1ULL << 23,
+ PERF_OUTPUT_BRSTACKOFF = 1ULL << 24,
+ PERF_OUTPUT_SYNTH = 1ULL << 25,
+ PERF_OUTPUT_PHYS_ADDR = 1ULL << 26,
+ PERF_OUTPUT_UREGS = 1ULL << 27,
+ PERF_OUTPUT_METRIC = 1ULL << 28,
+ PERF_OUTPUT_MISC = 1ULL << 29,
+ PERF_OUTPUT_SRCCODE = 1ULL << 30,
+ PERF_OUTPUT_IPC = 1ULL << 31,
};
struct output_option {
--
2.25.4
Adding tod field to display time of day column with
time of date (wallclock) time.
# perf record -k CLOCK_MONOTONIC kill
kill: not enough arguments
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.033 MB perf.data (8 samples) ]
# perf script
perf 261340 152919.481538: 1 cycles: ffffffff8106d104 ...
perf 261340 152919.481543: 1 cycles: ffffffff8106d104 ...
perf 261340 152919.481545: 7 cycles: ffffffff8106d104 ...
...
# perf script --ns
perf 261340 152919.481538922: 1 cycles: ffffffff8106d ...
perf 261340 152919.481543286: 1 cycles: ffffffff8106d ...
perf 261340 152919.481545397: 7 cycles: ffffffff8106d ...
...
# perf script -F+tod
perf 261340 2020-07-13 18:26:55.620971 152919.481538: ...
perf 261340 2020-07-13 18:26:55.620975 152919.481543: ...
perf 261340 2020-07-13 18:26:55.620978 152919.481545: ...
...
# perf script -F+tod --ns
perf 261340 2020-07-13 18:26:55.620971621 152919.481538922: ...
perf 261340 2020-07-13 18:26:55.620975985 152919.481543286: ...
perf 261340 2020-07-13 18:26:55.620978096 152919.481545397: ...
...
It's available only for recording with clockid specified,
because it's the only case where we can get reference time
to wallclock time. It's can't do that with perf clock yet.
Error is display if you want to use --tod on data without
clockid specified:
# perf script -F+tod
Can't provide 'tod' time, missing clock data. Please record with -k/--clockid option.
Original-patch-by: David Ahern <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/builtin-script.c | 131 +++++++++++++++++++++++++++---------
1 file changed, 98 insertions(+), 33 deletions(-)
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index eb45f678dc2f..484ce6067d23 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -114,6 +114,32 @@ enum perf_output_field {
PERF_OUTPUT_MISC = 1ULL << 29,
PERF_OUTPUT_SRCCODE = 1ULL << 30,
PERF_OUTPUT_IPC = 1ULL << 31,
+ PERF_OUTPUT_TOD = 1ULL << 32,
+};
+
+struct perf_script {
+ struct perf_tool tool;
+ struct perf_session *session;
+ bool show_task_events;
+ bool show_mmap_events;
+ bool show_switch_events;
+ bool show_namespace_events;
+ bool show_lost_events;
+ bool show_round_events;
+ bool show_bpf_events;
+ bool show_cgroup_events;
+ bool show_text_poke_events;
+ bool allocated;
+ bool per_event_dump;
+ bool stitch_lbr;
+ struct evswitch evswitch;
+ struct perf_cpu_map *cpus;
+ struct perf_thread_map *threads;
+ int name_width;
+ const char *time_str;
+ struct perf_time_interval *ptime_range;
+ int range_size;
+ int range_num;
};
struct output_option {
@@ -152,6 +178,7 @@ struct output_option {
{.str = "misc", .field = PERF_OUTPUT_MISC},
{.str = "srccode", .field = PERF_OUTPUT_SRCCODE},
{.str = "ipc", .field = PERF_OUTPUT_IPC},
+ {.str = "tod", .field = PERF_OUTPUT_TOD},
};
enum {
@@ -502,6 +529,7 @@ static void set_print_ip_opts(struct perf_event_attr *attr)
*/
static int perf_session__check_output_opt(struct perf_session *session)
{
+ bool tod = false;
unsigned int j;
struct evsel *evsel;
@@ -528,6 +556,7 @@ static int perf_session__check_output_opt(struct perf_session *session)
continue;
set_print_ip_opts(&evsel->core.attr);
+ tod |= output[j].fields & PERF_OUTPUT_TOD;
}
if (!no_callchain) {
@@ -568,13 +597,17 @@ static int perf_session__check_output_opt(struct perf_session *session)
}
}
+ if (tod && !session->header.env.clock.enabled) {
+ pr_err("Can't provide 'tod' time, missing clock data. "
+ "Please record with -k/--clockid option.\n");
+ return -1;
+ }
out:
return 0;
}
static int perf_sample__fprintf_regs(struct regs_dump *regs, uint64_t mask,
- FILE *fp
-)
+ FILE *fp)
{
unsigned i = 0, r;
int printed = 0;
@@ -592,6 +625,56 @@ static int perf_sample__fprintf_regs(struct regs_dump *regs, uint64_t mask,
return printed;
}
+#define DEFAULT_TOD_FMT "%F %H:%M:%S"
+
+static char*
+tod_scnprintf(struct perf_script *script, char *buf, int buflen,
+ u64 timestamp)
+{
+ u64 tod_ns, clockid_ns;
+ struct perf_env *env;
+ unsigned long nsec;
+ struct tm ltime;
+ char date[64];
+ time_t sec;
+
+ buf[0] = '\0';
+ if (buflen < 64 || !script)
+ return buf;
+
+ env = &script->session->header.env;
+ if (!env->clock.enabled) {
+ scnprintf(buf, buflen, "disabled");
+ return buf;
+ }
+
+ clockid_ns = env->clock.clockid_ns;
+ tod_ns = env->clock.tod_ns;
+
+ if (timestamp > clockid_ns)
+ tod_ns += timestamp - clockid_ns;
+ else
+ tod_ns -= clockid_ns - timestamp;
+
+ sec = (time_t) (tod_ns / NSEC_PER_SEC);
+ nsec = tod_ns - sec * NSEC_PER_SEC;
+
+ if (localtime_r(&sec, <ime) == NULL) {
+ scnprintf(buf, buflen, "failed");
+ } else {
+ strftime(date, sizeof(date), DEFAULT_TOD_FMT, <ime);
+
+ if (symbol_conf.nanosecs) {
+ snprintf(buf, buflen, "%s.%09lu", date, nsec);
+ } else {
+ snprintf(buf, buflen, "%s.%06lu",
+ date, nsec / NSEC_PER_USEC);
+ }
+ }
+
+ return buf;
+}
+
static int perf_sample__fprintf_iregs(struct perf_sample *sample,
struct perf_event_attr *attr, FILE *fp)
{
@@ -606,7 +689,8 @@ static int perf_sample__fprintf_uregs(struct perf_sample *sample,
attr->sample_regs_user, fp);
}
-static int perf_sample__fprintf_start(struct perf_sample *sample,
+static int perf_sample__fprintf_start(struct perf_script *script,
+ struct perf_sample *sample,
struct thread *thread,
struct evsel *evsel,
u32 type, FILE *fp)
@@ -615,6 +699,7 @@ static int perf_sample__fprintf_start(struct perf_sample *sample,
unsigned long secs;
unsigned long long nsecs;
int printed = 0;
+ char tstr[128];
if (PRINT_FIELD(COMM)) {
if (latency_format)
@@ -683,6 +768,11 @@ static int perf_sample__fprintf_start(struct perf_sample *sample,
printed += ret;
}
+ if (PRINT_FIELD(TOD)) {
+ tod_scnprintf(script, tstr, sizeof(tstr), sample->time);
+ printed += fprintf(fp, "%s ", tstr);
+ }
+
if (PRINT_FIELD(TIME)) {
u64 t = sample->time;
if (reltime) {
@@ -1667,31 +1757,6 @@ static int perf_sample__fprintf_synth(struct perf_sample *sample,
return 0;
}
-struct perf_script {
- struct perf_tool tool;
- struct perf_session *session;
- bool show_task_events;
- bool show_mmap_events;
- bool show_switch_events;
- bool show_namespace_events;
- bool show_lost_events;
- bool show_round_events;
- bool show_bpf_events;
- bool show_cgroup_events;
- bool show_text_poke_events;
- bool allocated;
- bool per_event_dump;
- bool stitch_lbr;
- struct evswitch evswitch;
- struct perf_cpu_map *cpus;
- struct perf_thread_map *threads;
- int name_width;
- const char *time_str;
- struct perf_time_interval *ptime_range;
- int range_size;
- int range_num;
-};
-
static int evlist__max_name_len(struct evlist *evlist)
{
struct evsel *evsel;
@@ -1739,7 +1804,7 @@ static void script_print_metric(struct perf_stat_config *config __maybe_unused,
if (!fmt)
return;
- perf_sample__fprintf_start(mctx->sample, mctx->thread, mctx->evsel,
+ perf_sample__fprintf_start(NULL, mctx->sample, mctx->thread, mctx->evsel,
PERF_RECORD_SAMPLE, mctx->fp);
fputs("\tmetric: ", mctx->fp);
if (color)
@@ -1754,7 +1819,7 @@ static void script_new_line(struct perf_stat_config *config __maybe_unused,
{
struct metric_ctx *mctx = ctx;
- perf_sample__fprintf_start(mctx->sample, mctx->thread, mctx->evsel,
+ perf_sample__fprintf_start(NULL, mctx->sample, mctx->thread, mctx->evsel,
PERF_RECORD_SAMPLE, mctx->fp);
fputs("\tmetric: ", mctx->fp);
}
@@ -1865,7 +1930,7 @@ static void process_event(struct perf_script *script,
++es->samples;
- perf_sample__fprintf_start(sample, thread, evsel,
+ perf_sample__fprintf_start(script, sample, thread, evsel,
PERF_RECORD_SAMPLE, fp);
if (PRINT_FIELD(PERIOD))
@@ -2174,7 +2239,7 @@ static int print_event_with_time(struct perf_tool *tool,
thread = machine__findnew_thread(machine, pid, tid);
if (thread && evsel) {
- perf_sample__fprintf_start(sample, thread, evsel,
+ perf_sample__fprintf_start(script, sample, thread, evsel,
event->header.type, stdout);
}
@@ -3439,7 +3504,7 @@ int cmd_script(int argc, const char **argv)
"Fields: comm,tid,pid,time,cpu,event,trace,ip,sym,dso,"
"addr,symoff,srcline,period,iregs,uregs,brstack,"
"brstacksym,flags,bpf-output,brstackinsn,brstackoff,"
- "callindent,insn,insnlen,synth,phys_addr,metric,misc,ipc",
+ "callindent,insn,insnlen,synth,phys_addr,metric,misc,ipc,tod",
parse_output_fields),
OPT_BOOLEAN('a', "all-cpus", &system_wide,
"system-wide collection from all CPUs"),
--
2.25.4
On Thu, Jul 30, 2020 at 11:39:44PM +0200, Jiri Olsa wrote:
> The patchset is adding the ability to display TOD/wallclock timestamp
> in 'perf script' output and in 'perf data convert --to-ctf' subcommand,
> so the converted CTF data contain TOD/wallclock timestamps.
But why? Wallclock is a horrible piece of crap. Why would you want to do
this?
On 7/30/20 4:14 PM, [email protected] wrote:
> On Thu, Jul 30, 2020 at 11:39:44PM +0200, Jiri Olsa wrote:
>
>> The patchset is adding the ability to display TOD/wallclock timestamp
>> in 'perf script' output and in 'perf data convert --to-ctf' subcommand,
>> so the converted CTF data contain TOD/wallclock timestamps.
>
> But why? Wallclock is a horrible piece of crap. Why would you want to do
> this?
>
Same reason I brought this up 9+ years ago: userspace lives on
time-of-day, and troubleshooting is based on correlating timestamps from
multiple sources. To correlate a perf event to syslog or an application
log, we need time-of-day.
On Thu, Jul 30, 2020 at 07:21:54PM -0600, David Ahern wrote:
> On 7/30/20 4:14 PM, [email protected] wrote:
> > On Thu, Jul 30, 2020 at 11:39:44PM +0200, Jiri Olsa wrote:
> >
> >> The patchset is adding the ability to display TOD/wallclock timestamp
> >> in 'perf script' output and in 'perf data convert --to-ctf' subcommand,
> >> so the converted CTF data contain TOD/wallclock timestamps.
> >
> > But why? Wallclock is a horrible piece of crap. Why would you want to do
> > this?
> >
>
> Same reason I brought this up 9+ years ago: userspace lives on
> time-of-day, and troubleshooting is based on correlating timestamps from
> multiple sources. To correlate a perf event to syslog or an application
> log, we need time-of-day.
yep, we have a customer that needs to compare data from multiple servers
jirka
> yep, we have a customer that needs to compare data from multiple servers
It's also needed to correlate over different guests on the same machine.
This is an important use case.
It would be nice if we could find a way to export the per guest
TSC offset from KVM, then we could use the more reliable TSC for this,
at least as long as no TSC scaling is used.
But failing that wall clock is probably the best you can do.
-Andi
On Thu, Jul 30, 2020 at 07:21:54PM -0600, David Ahern wrote:
> On 7/30/20 4:14 PM, [email protected] wrote:
> > On Thu, Jul 30, 2020 at 11:39:44PM +0200, Jiri Olsa wrote:
> >
> >> The patchset is adding the ability to display TOD/wallclock timestamp
> >> in 'perf script' output and in 'perf data convert --to-ctf' subcommand,
> >> so the converted CTF data contain TOD/wallclock timestamps.
> >
> > But why? Wallclock is a horrible piece of crap. Why would you want to do
> > this?
> >
>
> Same reason I brought this up 9+ years ago: userspace lives on
> time-of-day, and troubleshooting is based on correlating timestamps from
> multiple sources. To correlate a perf event to syslog or an application
> log, we need time-of-day.
You need a sync'ed CLOCK_MONOTONIC for that, CLOCK_REALTIME 'aka'
wallclock is a trainwreck, you don't ever want that.
On Fri, Jul 31, 2020 at 08:36:12AM -0700, Andi Kleen wrote:
> > yep, we have a customer that needs to compare data from multiple servers
>
> It's also needed to correlate over different guests on the same machine.
> This is an important use case.
Both these cases you want to sync up CLOCK_MONOTONIC, using walltime is
just utterly misguided.
What happens if the servers have (per accident or otherwise) different
DST settings, or someone does a clock_setttime() for giggles.
All you really want is a clock that runs at the same rate but is not
subject to random jumps and user foibles.
On 7/31/20 12:05 PM, [email protected] wrote:
> On Fri, Jul 31, 2020 at 08:36:12AM -0700, Andi Kleen wrote:
>>> yep, we have a customer that needs to compare data from multiple servers
>>
>> It's also needed to correlate over different guests on the same machine.
>> This is an important use case.
>
> Both these cases you want to sync up CLOCK_MONOTONIC, using walltime is
> just utterly misguided.
Every userspace component logs in walltime. You can say that is
misguided, but that is the way it is. The missing piece is the ability
to correlate kernel events to userspace logs.
>
> What happens if the servers have (per accident or otherwise) different
> DST settings, or someone does a clock_setttime() for giggles.
Yes, someone *could* change the time. Someone *could* start ntpd or
other time server in the middle of a session. While technically such
things can happen, that is not real life in most environments (e.g.,
Data center servers). ntpd (or other) is started at boot, and it is just
the little misc adjustments that happen over time.
We could add tracepoints and detect the changes and invalidate the
reference time. We could add tracepoints to track the adjustments and
update the reference time. In my experience over 9+ years using this
tool (out of tree patches) that has never been the problem.
>
> All you really want is a clock that runs at the same rate but is not
> subject to random jumps and user foibles.
>
All I want is to compare user logs to a kernel event via timestamps.
Em Fri, Jul 31, 2020 at 06:46:21PM -0600, David Ahern escreveu:
> On 7/31/20 12:05 PM, [email protected] wrote:
> > On Fri, Jul 31, 2020 at 08:36:12AM -0700, Andi Kleen wrote:
> >>> yep, we have a customer that needs to compare data from multiple servers
> >> It's also needed to correlate over different guests on the same machine.
> >> This is an important use case.
> > Both these cases you want to sync up CLOCK_MONOTONIC, using walltime is
> > just utterly misguided.
> Every userspace component logs in walltime. You can say that is
> misguided, but that is the way it is. The missing piece is the ability
> to correlate kernel events to userspace logs.
> > What happens if the servers have (per accident or otherwise) different
> > DST settings, or someone does a clock_setttime() for giggles.
> Yes, someone *could* change the time. Someone *could* start ntpd or
> other time server in the middle of a session. While technically such
> things can happen, that is not real life in most environments (e.g.,
> Data center servers). ntpd (or other) is started at boot, and it is just
> the little misc adjustments that happen over time.
> We could add tracepoints and detect the changes and invalidate the
> reference time. We could add tracepoints to track the adjustments and
> update the reference time. In my experience over 9+ years using this
> tool (out of tree patches) that has never been the problem.
> > All you really want is a clock that runs at the same rate but is not
> > subject to random jumps and user foibles.
> All I want is to compare user logs to a kernel event via timestamps.
Can we have both possibilities and leave the decision on which one to
use in the hands of those who have a gun to shoot wherever they want,
maybe in the foot?
- Arnaldo