Hi,
Here is a sereis of patches for enabling "event cache" feature
to perf probe. Brendan gives me this cool idea, thanks! :)
In this series, I added following options/features;
- --output option
We can save the probe definition command for given probe-event
instead of setting up the local tracing/kprobe_events.
- --no-inlines option
We can avoid searching the inline functions in debuginfo. Usually
useful with wildcards since the wildcards will hit a huge amount
of probe-points.
- $params special probe argument
$params is expanded to function parameters only, no locally defined
variables. This is useful for function-call tracing.
- wildcard support for function name
Wildcard support is the key feature for this idea. Now we can use
'*foo*' for function name to define the probe-point.
So by using all of them, we can make an "event cache" file on all
functions (except for inlined functions) as below.
# perf probe --max-probes=100000 --no-inlines -a '* $params' -o event.cache
builds "event.cache" file in which event settings for
all function entries, like below;
p:probe/reset_early_page_tables _text+12980741
p:probe/copy_bootdata _text+12980830 real_mode_data=%di:u64
p:probe/exit_amd_microcode _text+14692680
p:probe/early_make_pgtable _text+12981274 address=%di:u64
p:probe/x86_64_start_reservations _text+12981700 real_mode_data=%di:u64
p:probe/x86_64_start_kernel _text+12981744 real_mode_data=%di:u64
p:probe/reserve_ebda_region _text+12982117
This event.cache file will be big (but much smaller than native
debuginfo :) ) if your kernel have many option embedded.
Anyway, you can compress it too.
# wc -l event.cache
33813 event.cache
# ls -sh event.cache
2.3M event.cache
# ls -sh event.cache.gz
464K event.cache.gz
For setting up a probe event, you can grep the function name
and write it to tracing/kprobe_events, as below;
# zcat event.cache.gz | \
grep probe/vfs_symlink > /sys/kernel/debug/tracing/kprobe_events
This can be applied for the remote machine only if the machine
runs on completely same kernel binary. Perhaps, we need some
helper tool to check it.
Thank you,
---
Masami Hiramatsu (6):
[BUGFIX] perf-probe: Fix to handle optimized not-inlined but has no instance
[DOC] perf-probe: Update perf-probe document
perf-probe: Add --output option to write commands in a standard file
perf-probe: Add --no-inlines option to avoid searching inline functions
perf-probe: Support $params special probe argument
perf-probe: Support glob wildcards for function name
tools/perf/Documentation/perf-probe.txt | 25 ++++++++++
tools/perf/builtin-probe.c | 32 +++++++++++++
tools/perf/util/dwarf-aux.c | 31 +++++++++++++
tools/perf/util/dwarf-aux.h | 6 +++
tools/perf/util/probe-event.c | 73 +++++++++++++++++++++++--------
tools/perf/util/probe-event.h | 4 +-
tools/perf/util/probe-finder.c | 74 +++++++++++++++++++------------
tools/perf/util/probe-finder.h | 6 ++-
tools/perf/util/util.h | 4 ++
9 files changed, 202 insertions(+), 53 deletions(-)
--
Masami HIRAMATSU
Software Platform Research Dpt. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
Fix to handle optimized no-inline functions which have
only function definition but no actual instance at
that point. To fix this problem, we need to find actual
instance of the function.
Without this patch:
----
# perf probe -a __up
Failed to get entry address of __up.
Error: Failed to add events.
# perf probe -L __up
Specified source line is not found.
Error: Failed to show lines.
----
With this patch:
----
# perf probe -a __up
Added new event:
probe:__up (on __up)
You can now use it in all perf tools, such as:
perf record -e probe:__up -aR sleep 1
# perf probe -L __up
<__up@/home/fedora/ksrc/linux-3/kernel/locking/semaphore.c:0>
0 static noinline void __sched __up(struct semaphore *sem)
{
struct semaphore_waiter *waiter = list_first_entry(&sem->wait_
struct semaphore_waite
4 list_del(&waiter->list);
5 waiter->up = true;
6 wake_up_process(waiter->task);
7 }
----
Signed-off-by: Masami Hiramatsu <[email protected]>
---
tools/perf/util/dwarf-aux.c | 15 +++++++++++++++
tools/perf/util/dwarf-aux.h | 3 +++
tools/perf/util/probe-finder.c | 12 ++++--------
3 files changed, 22 insertions(+), 8 deletions(-)
diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c
index cc66c40..780b2bc 100644
--- a/tools/perf/util/dwarf-aux.c
+++ b/tools/perf/util/dwarf-aux.c
@@ -278,6 +278,21 @@ bool die_is_func_def(Dwarf_Die *dw_die)
}
/**
+ * die_is_func_instance - Ensure that this DIE is an instance of a subprogram
+ * @dw_die: a DIE
+ *
+ * Ensure that this DIE is an instance (which has an entry address).
+ * This returns true if @dw_die is a function instance. If not, you need to
+ * call die_walk_instances() to find actual instances.
+ **/
+bool die_is_func_instance(Dwarf_Die *dw_die)
+{
+ Dwarf_Addr tmp;
+
+ /* Actually gcc optimizes non-inline as like as inlined */
+ return !dwarf_func_inline(dw_die) && dwarf_entrypc(dw_die, &tmp) == 0;
+}
+/**
* die_get_data_member_location - Get the data-member offset
* @mb_die: a DIE of a member of a data structure
* @offs: The offset of the member in the data structure
diff --git a/tools/perf/util/dwarf-aux.h b/tools/perf/util/dwarf-aux.h
index b4fe90c..af7dbcd 100644
--- a/tools/perf/util/dwarf-aux.h
+++ b/tools/perf/util/dwarf-aux.h
@@ -41,6 +41,9 @@ extern int cu_walk_functions_at(Dwarf_Die *cu_die, Dwarf_Addr addr,
/* Ensure that this DIE is a subprogram and definition (not declaration) */
extern bool die_is_func_def(Dwarf_Die *dw_die);
+/* Ensure that this DIE is an instance of a subprogram */
+extern bool die_is_func_instance(Dwarf_Die *dw_die);
+
/* Compare diename and tname */
extern bool die_compare_name(Dwarf_Die *dw_die, const char *tname);
diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index c7918f8..1914dfd 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -915,17 +915,13 @@ static int probe_point_search_cb(Dwarf_Die *sp_die, void *data)
dwarf_decl_line(sp_die, &pf->lno);
pf->lno += pp->line;
param->retval = find_probe_point_by_line(pf);
- } else if (!dwarf_func_inline(sp_die)) {
+ } else if (die_is_func_instance(sp_die)) {
+ /* Instances always have the entry address */
+ dwarf_entrypc(sp_die, &pf->addr);
/* Real function */
if (pp->lazy_line)
param->retval = find_probe_point_lazy(sp_die, pf);
else {
- if (dwarf_entrypc(sp_die, &pf->addr) != 0) {
- pr_warning("Failed to get entry address of "
- "%s.\n", dwarf_diename(sp_die));
- param->retval = -ENOENT;
- return DWARF_CB_ABORT;
- }
pf->addr += pp->offset;
/* TODO: Check the address in this function */
param->retval = call_probe_finder(sp_die, pf);
@@ -1520,7 +1516,7 @@ static int line_range_search_cb(Dwarf_Die *sp_die, void *data)
pr_debug("New line range: %d to %d\n", lf->lno_s, lf->lno_e);
lr->start = lf->lno_s;
lr->end = lf->lno_e;
- if (dwarf_func_inline(sp_die))
+ if (!die_is_func_instance(sp_die))
param->retval = die_walk_instances(sp_die,
line_range_inline_cb, lf);
else
Update Documentation/perf-probe.txt to add descriptions
of some newer options.
Signed-off-by: Masami Hiramatsu <[email protected]>
---
tools/perf/Documentation/perf-probe.txt | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/tools/perf/Documentation/perf-probe.txt b/tools/perf/Documentation/perf-probe.txt
index aaa869b..239609c 100644
--- a/tools/perf/Documentation/perf-probe.txt
+++ b/tools/perf/Documentation/perf-probe.txt
@@ -47,6 +47,12 @@ OPTIONS
-v::
--verbose::
Be more verbose (show parsed arguments, etc).
+ Can not use with -q.
+
+-q::
+--quiet::
+ Be quiet (do not show any messages including errors).
+ Can not use with -v.
-a::
--add=::
@@ -96,7 +102,7 @@ OPTIONS
Dry run. With this option, --add and --del doesn't execute actual
adding and removal operations.
---max-probes::
+--max-probes=NUM::
Set the maximum number of probe points for an event. Default is 128.
-x::
@@ -104,8 +110,13 @@ OPTIONS
Specify path to the executable or shared library file for user
space tracing. Can also be used with --funcs option.
+--demangle::
+ Demangle application symbols. --no-demangle is also available
+ for disabling demangling.
+
--demangle-kernel::
- Demangle kernel symbols.
+ Demangle kernel symbols. --no-demangle-kernel is also available
+ for disabling kernel demangling.
In absence of -m/-x options, perf probe checks if the first argument after
the options is an absolute path name. If its an absolute path, perf probe
@@ -137,6 +148,7 @@ Each probe argument follows below syntax.
[NAME=]LOCALVAR|$retval|%REG|@SYMBOL[:TYPE]
'NAME' specifies the name of this argument (optional). You can use the name of local variable, local data structure member (e.g. var->field, var.field2), local array with fixed index (e.g. array[1], var->array[0], var->pointer[2]), or kprobe-tracer argument format (e.g. $retval, %ax, etc). Note that the name of this argument will be set as the last member name if you specify a local data structure member (e.g. field2 for 'var->field1.field2'.)
+'$vars' special argument is also available for NAME, it is expanded to the local variables which can access at given probe point.
'TYPE' casts the type of this argument (optional). If omitted, perf probe automatically set the type based on debuginfo. You can specify 'string' type only for the local variable or structure member which is an array of or a pointer to 'char' or 'unsigned char' type.
On x86 systems %REG is always the short form of the register: for example %AX. %RAX or %EAX is not valid.
Add --output (-o) option to write probe-definition commands
in a standard file (or the stdout if '-' is given) instead
of tracing/*probe_events.
This allows user to add events in a remote machine by
sending the saved command file to the machine.
Signed-off-by: Masami Hiramatsu <[email protected]>
---
tools/perf/Documentation/perf-probe.txt | 5 ++++
tools/perf/builtin-probe.c | 26 ++++++++++++++++++++-
tools/perf/util/probe-event.c | 39 +++++++++++++++++++++++++------
tools/perf/util/probe-event.h | 2 +-
4 files changed, 63 insertions(+), 9 deletions(-)
diff --git a/tools/perf/Documentation/perf-probe.txt b/tools/perf/Documentation/perf-probe.txt
index 239609c..6f183ab 100644
--- a/tools/perf/Documentation/perf-probe.txt
+++ b/tools/perf/Documentation/perf-probe.txt
@@ -118,6 +118,11 @@ OPTIONS
Demangle kernel symbols. --no-demangle-kernel is also available
for disabling kernel demangling.
+-o::
+--output=PATH
+ Output the probe-definition commands to given file. If '-' is
+ passed as PATH, output to stdout.
+
In absence of -m/-x options, perf probe checks if the first argument after
the options is an absolute path name. If its an absolute path, perf probe
uses it as a target module/target user space binary to probe.
diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
index 921bb69..c2fa3a3 100644
--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@@ -61,6 +61,7 @@ static struct {
struct strlist *dellist;
struct line_range line_range;
char *target;
+ char *output;
int max_probe_points;
struct strfilter *filter;
} params;
@@ -207,6 +208,26 @@ static int opt_set_target(const struct option *opt, const char *str,
return ret;
}
+static int opt_set_output(const struct option *opt __maybe_unused,
+ const char *str, int unset __maybe_unused)
+{
+ char *tmp;
+
+ if (!str)
+ return 0;
+
+ if (params.output)
+ return -EINVAL;
+
+ pr_debug("Set output to %s\n", str);
+ tmp = strdup(str);
+ if (!tmp)
+ return -ENOMEM;
+ params.output = tmp;
+ return 0;
+}
+
+
#ifdef HAVE_DWARF_SUPPORT
static int opt_show_lines(const struct option *opt __maybe_unused,
const char *str, int unset __maybe_unused)
@@ -284,6 +305,7 @@ static void cleanup_params(void)
strlist__delete(params.dellist);
line_range__clear(¶ms.line_range);
free(params.target);
+ free(params.output);
if (params.filter)
strfilter__delete(params.filter);
memset(¶ms, 0, sizeof(params));
@@ -377,6 +399,8 @@ __cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused)
opt_set_filter),
OPT_CALLBACK('x', "exec", NULL, "executable|path",
"target executable name or path", opt_set_target),
+ OPT_CALLBACK('o', "output", NULL, "file",
+ "output the results to given file", opt_set_output),
OPT_BOOLEAN(0, "demangle", &symbol_conf.demangle,
"Enable symbol demangling"),
OPT_BOOLEAN(0, "demangle-kernel", &symbol_conf.demangle_kernel,
@@ -487,7 +511,7 @@ __cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused)
if (params.nevents) {
ret = add_perf_probe_events(params.events, params.nevents,
params.max_probe_points,
- params.target,
+ params.target, params.output,
params.force_add);
if (ret < 0) {
pr_err_with_code(" Error: Failed to add events.", ret);
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 28eb141..2aab9c9 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1615,6 +1615,10 @@ char *synthesize_probe_trace_command(struct probe_trace_event *tev)
goto error;
len += ret;
}
+ /* Add the last LF */
+ ret = e_snprintf(buf + len, MAX_CMDLEN - len, "\n");
+ if (ret <= 0)
+ goto error;
return buf;
error:
@@ -2047,7 +2051,7 @@ static int write_probe_trace_event(int fd, struct probe_trace_event *tev)
return -EINVAL;
}
- pr_debug("Writing event: %s\n", buf);
+ pr_debug("Writing event: %s", buf); /* Note: buf has a LF already. */
if (!probe_event_dry_run) {
ret = write(fd, buf, strlen(buf));
if (ret <= 0)
@@ -2097,8 +2101,9 @@ static int get_new_event_name(char *buf, size_t len, const char *base,
}
static int __add_probe_trace_events(struct perf_probe_event *pev,
- struct probe_trace_event *tevs,
- int ntevs, bool allow_suffix)
+ struct probe_trace_event *tevs,
+ int ntevs, int outfd,
+ bool allow_suffix)
{
int i, fd, ret;
struct probe_trace_event *tev = NULL;
@@ -2106,7 +2111,9 @@ static int __add_probe_trace_events(struct perf_probe_event *pev,
const char *event, *group;
struct strlist *namelist;
- if (pev->uprobes)
+ if (outfd >= 0)
+ fd = outfd;
+ else if (pev->uprobes)
fd = open_uprobe_events(true);
else
fd = open_kprobe_events(true);
@@ -2117,7 +2124,11 @@ static int __add_probe_trace_events(struct perf_probe_event *pev,
}
/* Get current event names */
- namelist = get_probe_trace_event_names(fd, false);
+ if (fd == STDOUT_FILENO)
+ namelist = strlist__new(true, NULL);
+ else
+ namelist = get_probe_trace_event_names(fd, false);
+
if (!namelist) {
pr_debug("Failed to get current event list.\n");
return -EIO;
@@ -2367,10 +2378,24 @@ struct __event_package {
};
int add_perf_probe_events(struct perf_probe_event *pevs, int npevs,
- int max_tevs, const char *target, bool force_add)
+ int max_tevs, const char *target, const char *output,
+ bool force_add)
{
int i, j, ret;
struct __event_package *pkgs;
+ int outfd = -1;
+
+ if (output) {
+ if (strcmp(output, "-") != 0) {
+ outfd = open(output, O_RDWR | O_CREAT | O_TRUNC, 0600);
+ if (outfd < 0) {
+ ret = -errno;
+ pr_err("Failed to open %s\n", output);
+ return ret;
+ }
+ } else
+ outfd = STDOUT_FILENO;
+ }
ret = 0;
pkgs = zalloc(sizeof(struct __event_package) * npevs);
@@ -2400,7 +2425,7 @@ int add_perf_probe_events(struct perf_probe_event *pevs, int npevs,
/* Loop 2: add all events */
for (i = 0; i < npevs; i++) {
ret = __add_probe_trace_events(pkgs[i].pev, pkgs[i].tevs,
- pkgs[i].ntevs, force_add);
+ pkgs[i].ntevs, outfd, force_add);
if (ret < 0)
break;
}
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index e01e994..df91d0a 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -125,7 +125,7 @@ extern const char *kernel_get_module_path(const char *module);
extern int add_perf_probe_events(struct perf_probe_event *pevs, int npevs,
int max_probe_points, const char *module,
- bool force_add);
+ const char *output, bool force_add);
extern int del_perf_probe_events(struct strlist *dellist);
extern int show_perf_probe_events(void);
extern int show_line_range(struct line_range *lr, const char *module,
Add --no-inlines(--inlines) option to avoid searching inline
functions.
Searching all functions which matches glob pattern can take a
long time and find a lot of inline functions. With this option
perf-probe searches target on the non-inlined functions.
Signed-off-by: Masami Hiramatsu <[email protected]>
---
tools/perf/Documentation/perf-probe.txt | 4 ++++
tools/perf/builtin-probe.c | 6 +++++-
tools/perf/util/probe-event.c | 21 +++++++++++++--------
tools/perf/util/probe-event.h | 3 ++-
tools/perf/util/probe-finder.c | 8 +++++---
tools/perf/util/probe-finder.h | 3 ++-
6 files changed, 31 insertions(+), 14 deletions(-)
diff --git a/tools/perf/Documentation/perf-probe.txt b/tools/perf/Documentation/perf-probe.txt
index 6f183ab..835f685 100644
--- a/tools/perf/Documentation/perf-probe.txt
+++ b/tools/perf/Documentation/perf-probe.txt
@@ -81,6 +81,10 @@ OPTIONS
(Only for --vars) Show external defined variables in addition to local
variables.
+--no-inlines::
+ (Only for --add) Search only for non-inlined functions. The functions
+ which do not have instances are ignored.
+
-F::
--funcs::
Show available functions in given module or kernel. With -x/--exec,
diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
index c2fa3a3..a910cbc 100644
--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@@ -55,6 +55,7 @@ static struct {
bool show_funcs;
bool mod_events;
bool uprobes;
+ bool no_inlines;
bool quiet;
int nevents;
struct perf_probe_event events[MAX_PROBES];
@@ -386,6 +387,8 @@ __cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused)
OPT_CALLBACK('m', "module", NULL, "modname|path",
"target module name (for online) or path (for offline)",
opt_set_target),
+ OPT_BOOLEAN('\0', "no-inlines", ¶ms.no_inlines,
+ "Don't search inlined functions"),
#endif
OPT__DRY_RUN(&probe_event_dry_run),
OPT_INTEGER('\0', "max-probes", ¶ms.max_probe_points,
@@ -512,7 +515,8 @@ __cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused)
ret = add_perf_probe_events(params.events, params.nevents,
params.max_probe_points,
params.target, params.output,
- params.force_add);
+ params.force_add,
+ params.no_inlines);
if (ret < 0) {
pr_err_with_code(" Error: Failed to add events.", ret);
return ret;
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 2aab9c9..0062114 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -462,7 +462,8 @@ static int post_process_probe_trace_events(struct probe_trace_event *tevs,
/* Try to find perf_probe_event with debuginfo */
static int try_to_find_probe_trace_events(struct perf_probe_event *pev,
struct probe_trace_event **tevs,
- int max_tevs, const char *target)
+ int max_tevs, const char *target,
+ bool no_inline)
{
bool need_dwarf = perf_probe_event_need_dwarf(pev);
struct debuginfo *dinfo;
@@ -479,7 +480,8 @@ static int try_to_find_probe_trace_events(struct perf_probe_event *pev,
pr_debug("Try to find probe point from debuginfo.\n");
/* Searching trace events corresponding to a probe event */
- ntevs = debuginfo__find_trace_events(dinfo, pev, tevs, max_tevs);
+ ntevs = debuginfo__find_trace_events(dinfo, pev, tevs, max_tevs,
+ no_inline);
debuginfo__delete(dinfo);
@@ -812,7 +814,8 @@ find_perf_probe_point_from_dwarf(struct probe_trace_point *tp __maybe_unused,
static int try_to_find_probe_trace_events(struct perf_probe_event *pev,
struct probe_trace_event **tevs __maybe_unused,
int max_tevs __maybe_unused,
- const char *target __maybe_unused)
+ const char *target __maybe_unused,
+ bool no_inline __maybe_unused)
{
if (perf_probe_event_need_dwarf(pev)) {
pr_warning("Debuginfo-analysis is not supported.\n");
@@ -2349,8 +2352,9 @@ err_out:
}
static int convert_to_probe_trace_events(struct perf_probe_event *pev,
- struct probe_trace_event **tevs,
- int max_tevs, const char *target)
+ struct probe_trace_event **tevs,
+ int max_tevs, const char *target,
+ bool no_inline)
{
int ret;
@@ -2364,7 +2368,8 @@ static int convert_to_probe_trace_events(struct perf_probe_event *pev,
}
/* Convert perf_probe_event with debuginfo */
- ret = try_to_find_probe_trace_events(pev, tevs, max_tevs, target);
+ ret = try_to_find_probe_trace_events(pev, tevs, max_tevs, target,
+ no_inline);
if (ret != 0)
return ret; /* Found in debuginfo or got an error */
@@ -2379,7 +2384,7 @@ struct __event_package {
int add_perf_probe_events(struct perf_probe_event *pevs, int npevs,
int max_tevs, const char *target, const char *output,
- bool force_add)
+ bool force_add, bool no_inline)
{
int i, j, ret;
struct __event_package *pkgs;
@@ -2416,7 +2421,7 @@ int add_perf_probe_events(struct perf_probe_event *pevs, int npevs,
ret = convert_to_probe_trace_events(pkgs[i].pev,
&pkgs[i].tevs,
max_tevs,
- target);
+ target, no_inline);
if (ret < 0)
goto end;
pkgs[i].ntevs = ret;
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index df91d0a..93d0b10 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -125,7 +125,8 @@ extern const char *kernel_get_module_path(const char *module);
extern int add_perf_probe_events(struct perf_probe_event *pevs, int npevs,
int max_probe_points, const char *module,
- const char *output, bool force_add);
+ const char *output, bool force_add,
+ bool no_inline);
extern int del_perf_probe_events(struct strlist *dellist);
extern int show_perf_probe_events(void);
extern int show_line_range(struct line_range *lr, const char *module,
diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index 1914dfd..0c6f6f2 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -926,7 +926,7 @@ static int probe_point_search_cb(Dwarf_Die *sp_die, void *data)
/* TODO: Check the address in this function */
param->retval = call_probe_finder(sp_die, pf);
}
- } else
+ } else if (!pf->no_inline)
/* Inlined function: search instances */
param->retval = die_walk_instances(sp_die,
probe_point_inline_cb, (void *)pf);
@@ -1178,10 +1178,12 @@ end:
/* Find probe_trace_events specified by perf_probe_event from debuginfo */
int debuginfo__find_trace_events(struct debuginfo *dbg,
struct perf_probe_event *pev,
- struct probe_trace_event **tevs, int max_tevs)
+ struct probe_trace_event **tevs,
+ int max_tevs, bool no_inline)
{
struct trace_event_finder tf = {
- .pf = {.pev = pev, .callback = add_probe_trace_event},
+ .pf = {.pev = pev, .callback = add_probe_trace_event,
+ .no_inline = no_inline},
.mod = dbg->mod, .max_tevs = max_tevs};
int ret;
diff --git a/tools/perf/util/probe-finder.h b/tools/perf/util/probe-finder.h
index 92590b2..2b77339 100644
--- a/tools/perf/util/probe-finder.h
+++ b/tools/perf/util/probe-finder.h
@@ -38,7 +38,7 @@ extern void debuginfo__delete(struct debuginfo *dbg);
extern int debuginfo__find_trace_events(struct debuginfo *dbg,
struct perf_probe_event *pev,
struct probe_trace_event **tevs,
- int max_tevs);
+ int max_tevs, bool no_inline);
/* Find a perf_probe_point from debuginfo */
extern int debuginfo__find_probe_point(struct debuginfo *dbg,
@@ -76,6 +76,7 @@ struct probe_finder {
Dwarf_Op *fb_ops; /* Frame base attribute */
struct perf_probe_arg *pvar; /* Current target variable */
struct probe_trace_arg *tvar; /* Current result variable */
+ bool no_inline; /* Do not find inlines */
};
struct trace_event_finder {
$params is similar to $vars but matches only function parameters
not local variables.
Thus, this is useful for tracing function parameter changing
or tracing function call with parameters.
Signed-off-by: Masami Hiramatsu <[email protected]>
---
tools/perf/Documentation/perf-probe.txt | 2 +-
tools/perf/util/probe-finder.c | 29 ++++++++++++++++-------------
tools/perf/util/probe-finder.h | 3 +++
3 files changed, 20 insertions(+), 14 deletions(-)
diff --git a/tools/perf/Documentation/perf-probe.txt b/tools/perf/Documentation/perf-probe.txt
index 835f685..fde40640 100644
--- a/tools/perf/Documentation/perf-probe.txt
+++ b/tools/perf/Documentation/perf-probe.txt
@@ -157,7 +157,7 @@ Each probe argument follows below syntax.
[NAME=]LOCALVAR|$retval|%REG|@SYMBOL[:TYPE]
'NAME' specifies the name of this argument (optional). You can use the name of local variable, local data structure member (e.g. var->field, var.field2), local array with fixed index (e.g. array[1], var->array[0], var->pointer[2]), or kprobe-tracer argument format (e.g. $retval, %ax, etc). Note that the name of this argument will be set as the last member name if you specify a local data structure member (e.g. field2 for 'var->field1.field2'.)
-'$vars' special argument is also available for NAME, it is expanded to the local variables which can access at given probe point.
+'$vars' and '$params' special arguments are also available for NAME, '$vars' is expanded to the local variables (including function parameters) which can access at given probe point. '$params' is expanded to only the function parameters.
'TYPE' casts the type of this argument (optional). If omitted, perf probe automatically set the type based on debuginfo. You can specify 'string' type only for the local variable or structure member which is an array of or a pointer to 'char' or 'unsigned char' type.
On x86 systems %REG is always the short form of the register: for example %AX. %RAX or %EAX is not valid.
diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index 0c6f6f2..d1fcab7 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -1054,6 +1054,7 @@ found:
struct local_vars_finder {
struct probe_finder *pf;
struct perf_probe_arg *args;
+ bool vars;
int max_args;
int nargs;
int ret;
@@ -1068,7 +1069,7 @@ static int copy_variables_cb(Dwarf_Die *die_mem, void *data)
tag = dwarf_tag(die_mem);
if (tag == DW_TAG_formal_parameter ||
- tag == DW_TAG_variable) {
+ (tag == DW_TAG_variable && vf->vars)) {
if (convert_variable_location(die_mem, vf->pf->addr,
vf->pf->fb_ops, &pf->sp_die,
NULL) == 0) {
@@ -1094,26 +1095,28 @@ static int expand_probe_args(Dwarf_Die *sc_die, struct probe_finder *pf,
Dwarf_Die die_mem;
int i;
int n = 0;
- struct local_vars_finder vf = {.pf = pf, .args = args,
+ struct local_vars_finder vf = {.pf = pf, .args = args, .vars = false,
.max_args = MAX_PROBE_ARGS, .ret = 0};
for (i = 0; i < pf->pev->nargs; i++) {
/* var never be NULL */
- if (strcmp(pf->pev->args[i].var, "$vars") == 0) {
- pr_debug("Expanding $vars into:");
- vf.nargs = n;
- /* Special local variables */
- die_find_child(sc_die, copy_variables_cb, (void *)&vf,
- &die_mem);
- pr_debug(" (%d)\n", vf.nargs - n);
- if (vf.ret < 0)
- return vf.ret;
- n = vf.nargs;
- } else {
+ if (strcmp(pf->pev->args[i].var, PROBE_ARG_VARS) == 0)
+ vf.vars = true;
+ else if (strcmp(pf->pev->args[i].var, PROBE_ARG_PARAMS) != 0) {
/* Copy normal argument */
args[n] = pf->pev->args[i];
n++;
+ continue;
}
+ pr_debug("Expanding %s into:", pf->pev->args[i].var);
+ vf.nargs = n;
+ /* Special local variables */
+ die_find_child(sc_die, copy_variables_cb, (void *)&vf,
+ &die_mem);
+ pr_debug(" (%d)\n", vf.nargs - n);
+ if (vf.ret < 0)
+ return vf.ret;
+ n = vf.nargs;
}
return n;
}
diff --git a/tools/perf/util/probe-finder.h b/tools/perf/util/probe-finder.h
index 2b77339..7a340d3 100644
--- a/tools/perf/util/probe-finder.h
+++ b/tools/perf/util/probe-finder.h
@@ -10,6 +10,9 @@
#define MAX_PROBES 128
#define MAX_PROBE_ARGS 128
+#define PROBE_ARG_VARS "$vars"
+#define PROBE_ARG_PARAMS "$params"
+
static inline int is_c_varname(const char *name)
{
/* TODO */
Support glob wildcards for function name when adding
new probes. This will allow us to build caches of
function-entry level information with $params.
e.g.
# perf probe --max-probes=100000 --no-inlines -a '* $params' -o event.cache
builds "event.cache" file in which event settings for
all function entries, like below;
p:probe/reset_early_page_tables _text+12980741
p:probe/copy_bootdata _text+12980830 real_mode_data=%di:u64
p:probe/exit_amd_microcode _text+14692680
p:probe/early_make_pgtable _text+12981274 address=%di:u64
p:probe/x86_64_start_reservations _text+12981700 real_mode_data=%di:u64
p:probe/x86_64_start_kernel _text+12981744 real_mode_data=%di:u64
p:probe/reserve_ebda_region _text+12982117
This event.cache file will be big if your kernel have many
option embedded. Anyway, you can compress it too.
# wc -l event.cache
33813 event.cache
# ls -sh event.cache
2.3M event.cache
# ls -sh event.cache.gz
464K event.cache.gz
For setting up a probe event, you can grep the function name
and write it to tracing/kprobe_events, as below;
# zcat event.cache.gz | \
grep probe/vfs_symlink > /sys/kernel/debug/tracing/kprobe_events
This can be applied for the remote machine only if the machine
runs on completely same kernel binary.
Signed-off-by: Masami Hiramatsu <[email protected]>
---
tools/perf/util/dwarf-aux.c | 16 ++++++++++++++++
tools/perf/util/dwarf-aux.h | 3 +++
tools/perf/util/probe-event.c | 15 +++++++++++----
tools/perf/util/probe-event.h | 1 +
tools/perf/util/probe-finder.c | 27 +++++++++++++++++++++------
tools/perf/util/util.h | 4 ++++
6 files changed, 56 insertions(+), 10 deletions(-)
diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c
index 780b2bc..8db75cf 100644
--- a/tools/perf/util/dwarf-aux.c
+++ b/tools/perf/util/dwarf-aux.c
@@ -139,11 +139,27 @@ int cu_walk_functions_at(Dwarf_Die *cu_die, Dwarf_Addr addr,
bool die_compare_name(Dwarf_Die *dw_die, const char *tname)
{
const char *name;
+
name = dwarf_diename(dw_die);
return name ? (strcmp(tname, name) == 0) : false;
}
/**
+ * die_match_name - Match diename and glob
+ * @dw_die: a DIE
+ * @glob: a string of target glob pattern
+ *
+ * Glob matching the name of @dw_die and @glob. Return false if matching fail.
+ */
+bool die_match_name(Dwarf_Die *dw_die, const char *glob)
+{
+ const char *name;
+
+ name = dwarf_diename(dw_die);
+ return name ? strglobmatch(name, glob) : false;
+}
+
+/**
* die_get_call_lineno - Get callsite line number of inline-function instance
* @in_die: a DIE of an inlined function instance
*
diff --git a/tools/perf/util/dwarf-aux.h b/tools/perf/util/dwarf-aux.h
index af7dbcd..50a3cdc 100644
--- a/tools/perf/util/dwarf-aux.h
+++ b/tools/perf/util/dwarf-aux.h
@@ -47,6 +47,9 @@ extern bool die_is_func_instance(Dwarf_Die *dw_die);
/* Compare diename and tname */
extern bool die_compare_name(Dwarf_Die *dw_die, const char *tname);
+/* Matching diename with glob pattern */
+extern bool die_match_name(Dwarf_Die *dw_die, const char *glob);
+
/* Get callsite line number of inline-function instance */
extern int die_get_call_lineno(Dwarf_Die *in_die);
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 0062114..7a8ce15 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -450,7 +450,11 @@ static int post_process_probe_trace_events(struct probe_trace_event *tevs,
tmp = strdup(reloc_sym->name);
if (!tmp)
return -ENOMEM;
- free(tevs[i].point.symbol);
+ /* If we have no realname, use symbol for it */
+ if (!tevs[i].point.realname)
+ tevs[i].point.realname = tevs[i].point.symbol;
+ else
+ free(tevs[i].point.symbol);
tevs[i].point.symbol = tmp;
tevs[i].point.offset = tevs[i].point.address -
reloc_sym->unrelocated_addr;
@@ -1778,6 +1782,7 @@ static void clear_probe_trace_event(struct probe_trace_event *tev)
free(tev->event);
free(tev->group);
free(tev->point.symbol);
+ free(tev->point.realname);
free(tev->point.module);
for (i = 0; i < tev->nargs; i++) {
free(tev->args[i].name);
@@ -2113,6 +2118,7 @@ static int __add_probe_trace_events(struct perf_probe_event *pev,
char buf[64];
const char *event, *group;
struct strlist *namelist;
+ bool safename;
if (outfd >= 0)
fd = outfd;
@@ -2137,6 +2143,7 @@ static int __add_probe_trace_events(struct perf_probe_event *pev,
return -EIO;
}
+ safename = (pev->point.function && !strisglob(pev->point.function));
ret = 0;
pr_info("Added new event%s\n", (ntevs > 1) ? "s:" : ":");
for (i = 0; i < ntevs; i++) {
@@ -2144,10 +2151,10 @@ static int __add_probe_trace_events(struct perf_probe_event *pev,
if (pev->event)
event = pev->event;
else
- if (pev->point.function)
+ if (safename)
event = pev->point.function;
else
- event = tev->point.symbol;
+ event = tev->point.realname;
if (pev->group)
group = pev->group;
else
@@ -2210,7 +2217,7 @@ static int probe_function_filter(struct map *map __maybe_unused,
struct symbol *sym)
{
if ((sym->binding == STB_GLOBAL || sym->binding == STB_LOCAL) &&
- strcmp(looking_function_name, sym->name) == 0) {
+ strglobmatch(sym->name, looking_function_name)) {
num_matched_functions++;
return 0;
}
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 93d0b10..5b45e2a 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -10,6 +10,7 @@ extern bool probe_event_dry_run;
/* kprobe-tracer and uprobe-tracer tracing point */
struct probe_trace_point {
+ char *realname; /* function real name (if needed) */
char *symbol; /* Base symbol */
char *module; /* Module name */
unsigned long offset; /* Offset from symbol */
diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index d1fcab7..dfaf88a 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -711,7 +711,7 @@ static int find_best_scope_cb(Dwarf_Die *fn_die, void *data)
}
/* If the function name is given, that's what user expects */
if (fsp->function) {
- if (die_compare_name(fn_die, fsp->function)) {
+ if (die_match_name(fn_die, fsp->function)) {
memcpy(fsp->die_mem, fn_die, sizeof(Dwarf_Die));
fsp->found = true;
return 1;
@@ -903,13 +903,14 @@ static int probe_point_search_cb(Dwarf_Die *sp_die, void *data)
/* Check tag and diename */
if (!die_is_func_def(sp_die) ||
- !die_compare_name(sp_die, pp->function))
+ !die_match_name(sp_die, pp->function))
return DWARF_CB_OK;
/* Check declared file */
if (pp->file && strtailcmp(pp->file, dwarf_decl_file(sp_die)))
return DWARF_CB_OK;
+ pr_debug("Matched function: %s\n", dwarf_diename(sp_die));
pf->fname = dwarf_decl_file(sp_die);
if (pp->line) { /* Function relative line */
dwarf_decl_line(sp_die, &pf->lno);
@@ -926,10 +927,20 @@ static int probe_point_search_cb(Dwarf_Die *sp_die, void *data)
/* TODO: Check the address in this function */
param->retval = call_probe_finder(sp_die, pf);
}
- } else if (!pf->no_inline)
+ } else if (!pf->no_inline) {
/* Inlined function: search instances */
param->retval = die_walk_instances(sp_die,
probe_point_inline_cb, (void *)pf);
+ /* This could be a non-existed inline definition */
+ if (param->retval == -ENOENT && strisglob(pp->function))
+ param->retval = 0;
+ }
+
+ /* We need to find other candidates */
+ if (strisglob(pp->function) && param->retval >= 0) {
+ param->retval = 0; /* We have to clear the result */
+ return DWARF_CB_OK;
+ }
return DWARF_CB_ABORT; /* Exit; no same symbol in this CU. */
}
@@ -958,7 +969,7 @@ static int pubname_search_cb(Dwarf *dbg, Dwarf_Global *gl, void *data)
if (dwarf_tag(param->sp_die) != DW_TAG_subprogram)
return DWARF_CB_OK;
- if (die_compare_name(param->sp_die, param->function)) {
+ if (die_match_name(param->sp_die, param->function)) {
if (!dwarf_offdie(dbg, gl->cu_offset, param->cu_die))
return DWARF_CB_OK;
@@ -995,7 +1006,7 @@ static int debuginfo__find_probes(struct debuginfo *dbg,
return -ENOMEM;
/* Fastpath: lookup by function name from .debug_pubnames section */
- if (pp->function) {
+ if (pp->function && !strisglob(pp->function)) {
struct pubname_callback_param pubname_param = {
.function = pp->function,
.file = pp->file,
@@ -1144,6 +1155,10 @@ static int add_probe_trace_event(Dwarf_Die *sc_die, struct probe_finder *pf)
if (ret < 0)
return ret;
+ tev->point.realname = strdup(dwarf_diename(sc_die));
+ if (!tev->point.realname)
+ return -ENOMEM;
+
pr_debug("Probe point found: %s+%lu\n", tev->point.symbol,
tev->point.offset);
@@ -1508,7 +1523,7 @@ static int line_range_search_cb(Dwarf_Die *sp_die, void *data)
return DWARF_CB_OK;
if (die_is_func_def(sp_die) &&
- die_compare_name(sp_die, lr->function)) {
+ die_match_name(sp_die, lr->function)) {
lf->fname = dwarf_decl_file(sp_die);
dwarf_decl_line(sp_die, &lr->offset);
pr_debug("fname: %s, lineno:%d\n", lf->fname, lr->offset);
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 80bfdaa..6f5683d 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -257,6 +257,10 @@ char **argv_split(const char *str, int *argcp);
void argv_free(char **argv);
bool strglobmatch(const char *str, const char *pat);
bool strlazymatch(const char *str, const char *pat);
+static inline bool strisglob(const char *str)
+{
+ return strpbrk(str, "*?[") != NULL;
+}
int strtailcmp(const char *s1, const char *s2);
char *strxfrchar(char *s, char from, char to);
unsigned long convert_unit(unsigned long value, char *unit);
Em Fri, Oct 31, 2014 at 02:51:29PM -0400, Masami Hiramatsu escreveu:
> Hi,
>
> Here is a sereis of patches for enabling "event cache" feature
> to perf probe. Brendan gives me this cool idea, thanks! :)
>
> In this series, I added following options/features;
> - --output option
> We can save the probe definition command for given probe-event
> instead of setting up the local tracing/kprobe_events.
>
> - --no-inlines option
> We can avoid searching the inline functions in debuginfo. Usually
> useful with wildcards since the wildcards will hit a huge amount
> of probe-points.
>
> - $params special probe argument
> $params is expanded to function parameters only, no locally defined
> variables. This is useful for function-call tracing.
>
> - wildcard support for function name
> Wildcard support is the key feature for this idea. Now we can use
> '*foo*' for function name to define the probe-point.
>
> So by using all of them, we can make an "event cache" file on all
> functions (except for inlined functions) as below.
>
> # perf probe --max-probes=100000 --no-inlines -a '* $params' -o event.cache
>
> builds "event.cache" file in which event settings for
> all function entries, like below;
>
> p:probe/reset_early_page_tables _text+12980741
> p:probe/copy_bootdata _text+12980830 real_mode_data=%di:u64
> p:probe/exit_amd_microcode _text+14692680
> p:probe/early_make_pgtable _text+12981274 address=%di:u64
> p:probe/x86_64_start_reservations _text+12981700 real_mode_data=%di:u64
> p:probe/x86_64_start_kernel _text+12981744 real_mode_data=%di:u64
> p:probe/reserve_ebda_region _text+12982117
>
> This event.cache file will be big (but much smaller than native
> debuginfo :) ) if your kernel have many option embedded.
> Anyway, you can compress it too.
How do you validate that the cache can be used against some kernel? I.e.
is this that the user has to do? Isn't this prone to errors?
Perhaps you could pick the build-id and store it into the event cache
file, in the first lines, somethings like:
[acme@zoo ~]$ printf "buildid: %s\n" $(perf buildid-list --kernel)
buildid: a4cacca49391fc4f42ac8f58990f4e97042efae8
[acme@zoo ~]$ printf "buildid: %s\n" $(perf buildid-list --kernel)
buildid: a4cacca49391fc4f42ac8f58990f4e97042efae8
Maybe this would be nice to have integrated with 'perf archive' somehow
and then store this into ~/.debug/[probe]/<BUILDID>/dso-name
where dso-name would be [kernel] for the kernel and the full path for
userspace stuff, and then when adding a new probe we would look there
for a pre-built/cached event definition, only looking for the debuginfo
(which is done using the build-id already, right) and would insert the
probe definitions there, etc.
Then, later, one would use 'perf archive' passing some keys (or a
perf.data file, like done nowadays to pick the files in ~/.debug for
dsos that had hits on the specified perf.data file) to get the cached
values to use on some other machine, to avoid having to use the
debuginfo files there.
I.e. in summary I think that the format is ok, but we need to have this
inside the ~/.debug hierarchy so that we can make sure that we use the
right probe definition, one that matches the DSOs being used (the kernel
or some other userspace binary).
Great stuff, keep it up!
- Arnaldo
> # wc -l event.cache
> 33813 event.cache
> # ls -sh event.cache
> 2.3M event.cache
> # ls -sh event.cache.gz
> 464K event.cache.gz
>
> For setting up a probe event, you can grep the function name
> and write it to tracing/kprobe_events, as below;
>
> # zcat event.cache.gz | \
> grep probe/vfs_symlink > /sys/kernel/debug/tracing/kprobe_events
>
> This can be applied for the remote machine only if the machine
> runs on completely same kernel binary. Perhaps, we need some
> helper tool to check it.
>
> Thank you,
>
>
> ---
>
> Masami Hiramatsu (6):
> [BUGFIX] perf-probe: Fix to handle optimized not-inlined but has no instance
> [DOC] perf-probe: Update perf-probe document
> perf-probe: Add --output option to write commands in a standard file
> perf-probe: Add --no-inlines option to avoid searching inline functions
> perf-probe: Support $params special probe argument
> perf-probe: Support glob wildcards for function name
>
>
> tools/perf/Documentation/perf-probe.txt | 25 ++++++++++
> tools/perf/builtin-probe.c | 32 +++++++++++++
> tools/perf/util/dwarf-aux.c | 31 +++++++++++++
> tools/perf/util/dwarf-aux.h | 6 +++
> tools/perf/util/probe-event.c | 73 +++++++++++++++++++++++--------
> tools/perf/util/probe-event.h | 4 +-
> tools/perf/util/probe-finder.c | 74 +++++++++++++++++++------------
> tools/perf/util/probe-finder.h | 6 ++-
> tools/perf/util/util.h | 4 ++
> 9 files changed, 202 insertions(+), 53 deletions(-)
>
> --
> Masami HIRAMATSU
> Software Platform Research Dpt. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: [email protected]
(2014/10/31 21:13), Arnaldo Carvalho de Melo wrote:
> Em Fri, Oct 31, 2014 at 02:51:29PM -0400, Masami Hiramatsu escreveu:
>> Hi,
>>
>> Here is a sereis of patches for enabling "event cache" feature
>> to perf probe. Brendan gives me this cool idea, thanks! :)
>>
>> In this series, I added following options/features;
>> - --output option
>> We can save the probe definition command for given probe-event
>> instead of setting up the local tracing/kprobe_events.
>>
>> - --no-inlines option
>> We can avoid searching the inline functions in debuginfo. Usually
>> useful with wildcards since the wildcards will hit a huge amount
>> of probe-points.
>>
>> - $params special probe argument
>> $params is expanded to function parameters only, no locally defined
>> variables. This is useful for function-call tracing.
>>
>> - wildcard support for function name
>> Wildcard support is the key feature for this idea. Now we can use
>> '*foo*' for function name to define the probe-point.
>>
>> So by using all of them, we can make an "event cache" file on all
>> functions (except for inlined functions) as below.
>>
>> # perf probe --max-probes=100000 --no-inlines -a '* $params' -o event.cache
>>
>> builds "event.cache" file in which event settings for
>> all function entries, like below;
>>
>> p:probe/reset_early_page_tables _text+12980741
>> p:probe/copy_bootdata _text+12980830 real_mode_data=%di:u64
>> p:probe/exit_amd_microcode _text+14692680
>> p:probe/early_make_pgtable _text+12981274 address=%di:u64
>> p:probe/x86_64_start_reservations _text+12981700 real_mode_data=%di:u64
>> p:probe/x86_64_start_kernel _text+12981744 real_mode_data=%di:u64
>> p:probe/reserve_ebda_region _text+12982117
>>
>> This event.cache file will be big (but much smaller than native
>> debuginfo :) ) if your kernel have many option embedded.
>> Anyway, you can compress it too.
>
> How do you validate that the cache can be used against some kernel? I.e.
> is this that the user has to do? Isn't this prone to errors?
Actually, kprobe event itself can reject command if the given address
is not in the kernel text nor instruction boundary (perhaps, uprobes
may have a problem...), so for the kernel level, it is safe.
>
> Perhaps you could pick the build-id and store it into the event cache
> file, in the first lines, somethings like:
Agreed, build-id should be the best way to check that.
For kprobes, user can easy to get and compare it with local one as below :)
----
RLOGIN=root@$REMOTE
rid=`ssh $RLOGIN "od -j16 -w48 -An -t x1 /sys/kernel/notes | tr -d ' '"`
lid=`od -j16 -w48 -An -t x1 /sys/kernel/notes | tr -d ' '`
if [ $rid != $lid ]; then
echo "Error: Build-id mis-matched!"
exit 1;
fi
echo "Setting up $EVENTNAME at $REMOTE"
zcat event.cache.gz | grep $EVENTNAME |\
ssh $RLOGIN "tee -a /sys/kernel/debug/tracing/kprobe_events"
echo "Done"
----
With this script, you don't need to install perf at remote hosts.
(This is what enterprise people called "agent-less")
> [acme@zoo ~]$ printf "buildid: %s\n" $(perf buildid-list --kernel)
> buildid: a4cacca49391fc4f42ac8f58990f4e97042efae8
>
> [acme@zoo ~]$ printf "buildid: %s\n" $(perf buildid-list --kernel)
> buildid: a4cacca49391fc4f42ac8f58990f4e97042efae8
>
> Maybe this would be nice to have integrated with 'perf archive' somehow
> and then store this into ~/.debug/[probe]/<BUILDID>/dso-name
>
> where dso-name would be [kernel] for the kernel and the full path for
> userspace stuff, and then when adding a new probe we would look there
> for a pre-built/cached event definition, only looking for the debuginfo
> (which is done using the build-id already, right) and would insert the
> probe definitions there, etc.
This will be good for SDT too. Perhaps, both of SDT and cached probes
should share the same file.
> Then, later, one would use 'perf archive' passing some keys (or a
> perf.data file, like done nowadays to pick the files in ~/.debug for
> dsos that had hits on the specified perf.data file) to get the cached
> values to use on some other machine, to avoid having to use the
> debuginfo files there.
Yeah, querying it from the BUILDID database by using a pair of remote
build-id and the binary path is a good feature.
>
> I.e. in summary I think that the format is ok, but we need to have this
> inside the ~/.debug hierarchy so that we can make sure that we use the
> right probe definition, one that matches the DSOs being used (the kernel
> or some other userspace binary).
OK, perhaps, that is also good to SDT series at last.
>
> Great stuff, keep it up!
Thanks!
>
> - Arnaldo
>
>> # wc -l event.cache
>> 33813 event.cache
>> # ls -sh event.cache
>> 2.3M event.cache
>> # ls -sh event.cache.gz
>> 464K event.cache.gz
>>
>> For setting up a probe event, you can grep the function name
>> and write it to tracing/kprobe_events, as below;
>>
>> # zcat event.cache.gz | \
>> grep probe/vfs_symlink > /sys/kernel/debug/tracing/kprobe_events
>>
>> This can be applied for the remote machine only if the machine
>> runs on completely same kernel binary. Perhaps, we need some
>> helper tool to check it.
>>
>> Thank you,
>>
>>
>> ---
>>
>> Masami Hiramatsu (6):
>> [BUGFIX] perf-probe: Fix to handle optimized not-inlined but has no instance
>> [DOC] perf-probe: Update perf-probe document
>> perf-probe: Add --output option to write commands in a standard file
>> perf-probe: Add --no-inlines option to avoid searching inline functions
>> perf-probe: Support $params special probe argument
>> perf-probe: Support glob wildcards for function name
>>
>>
>> tools/perf/Documentation/perf-probe.txt | 25 ++++++++++
>> tools/perf/builtin-probe.c | 32 +++++++++++++
>> tools/perf/util/dwarf-aux.c | 31 +++++++++++++
>> tools/perf/util/dwarf-aux.h | 6 +++
>> tools/perf/util/probe-event.c | 73 +++++++++++++++++++++++--------
>> tools/perf/util/probe-event.h | 4 +-
>> tools/perf/util/probe-finder.c | 74 +++++++++++++++++++------------
>> tools/perf/util/probe-finder.h | 6 ++-
>> tools/perf/util/util.h | 4 ++
>> 9 files changed, 202 insertions(+), 53 deletions(-)
>>
>> --
>> Masami HIRAMATSU
>> Software Platform Research Dpt. Linux Technology Center
>> Hitachi, Ltd., Yokohama Research Laboratory
>> E-mail: [email protected]
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
Em Mon, Nov 03, 2014 at 09:11:18PM +0900, Masami Hiramatsu escreveu:
> (2014/10/31 21:13), Arnaldo Carvalho de Melo wrote:
> > Em Fri, Oct 31, 2014 at 02:51:29PM -0400, Masami Hiramatsu escreveu:
> >> p:probe/reset_early_page_tables _text+12980741
> >> p:probe/copy_bootdata _text+12980830 real_mode_data=%di:u64
> >> p:probe/exit_amd_microcode _text+14692680
> >> p:probe/early_make_pgtable _text+12981274 address=%di:u64
> >> p:probe/x86_64_start_reservations _text+12981700 real_mode_data=%di:u64
> >> p:probe/x86_64_start_kernel _text+12981744 real_mode_data=%di:u64
> >> p:probe/reserve_ebda_region _text+12982117
> >> This event.cache file will be big (but much smaller than native
> >> debuginfo :) ) if your kernel have many option embedded.
> >> Anyway, you can compress it too.
> > How do you validate that the cache can be used against some kernel? I.e.
> > is this that the user has to do? Isn't this prone to errors?
> Actually, kprobe event itself can reject command if the given address
> is not in the kernel text nor instruction boundary (perhaps, uprobes
> may have a problem...), so for the kernel level, it is safe.
No, it is not necessarily safe.
What if you specify function foo() that has address 0x1234 for kernel
v3.16 and then run the cached probe on kernel v3.18 and on that kernel
the function foo() maps to address 0x2345 and function bar() instead
maps to address 0x1234? Oops.
The build-id was designed to uniquely identify a DSO, we need to use it.
I think that at some point not using it should be left to a, in
systemtap parlance, "guru" mode, with tooling warning profusely when
build ids are not available and requiring even more forcing when it
doesn't matches.
> > Perhaps you could pick the build-id and store it into the event cache
> > file, in the first lines, somethings like:
> Agreed, build-id should be the best way to check that.
> For kprobes, user can easy to get and compare it with local one as below :)
> ----
> RLOGIN=root@$REMOTE
> rid=`ssh $RLOGIN "od -j16 -w48 -An -t x1 /sys/kernel/notes | tr -d ' '"`
> lid=`od -j16 -w48 -An -t x1 /sys/kernel/notes | tr -d ' '`
> if [ $rid != $lid ]; then
> echo "Error: Build-id mis-matched!"
> exit 1;
> fi
> echo "Setting up $EVENTNAME at $REMOTE"
> zcat event.cache.gz | grep $EVENTNAME |\
> ssh $RLOGIN "tee -a /sys/kernel/debug/tracing/kprobe_events"
> echo "Done"
> ----
> With this script, you don't need to install perf at remote hosts.
> (This is what enterprise people called "agent-less")
This is only sufficient (and a cool feature!) if you will immediataly use the
cached info (i.e. using just two systems: one development machine, with all
debugging info, devel tools, etc, and the other the machine to observe, that is
bare bones, no debugging info, etc)), but the moment you store that
event.cache.gz (that has no build id embedded from what I can see from the
above example) then you lose the build id for those cached events.
You need to tightly associate whatever symbol resolution is done with
the build id, at symbol resolution/caching time.
Then, before using the cached symbol resolution, we need to check if the target
kernel/DSO build id is the same as the cached symbol kernel/DSO build id.
> > [acme@zoo ~]$ printf "buildid: %s\n" $(perf buildid-list --kernel)
> > buildid: a4cacca49391fc4f42ac8f58990f4e97042efae8
> > [acme@zoo ~]$ printf "buildid: %s\n" $(perf buildid-list --kernel)
> > buildid: a4cacca49391fc4f42ac8f58990f4e97042efae8
> > Maybe this would be nice to have integrated with 'perf archive' somehow
> > and then store this into ~/.debug/[probe]/<BUILDID>/dso-name
> > where dso-name would be [kernel] for the kernel and the full path for
> > userspace stuff, and then when adding a new probe we would look there
> > for a pre-built/cached event definition, only looking for the debuginfo
> > (which is done using the build-id already, right) and would insert the
> > probe definitions there, etc.
> This will be good for SDT too. Perhaps, both of SDT and cached probes
> should share the same file.
Right, what is in ~/.debug/ may be used by multiple tools, just like
debuginfo files are, by keying the content by its build id.
And also by having separate subdirectory trees for different kinds of
symbol information, i.e. the ~/.debug/.build-id/ links may point to
either ELF files or to kallsyms data. What we are discussing here is to
make it also point to the [ku]probes_tracer cached probes files.
The way that DSO files are cached may by the same that you end up adding
the [ku]probes_tracer cached files, take a look at the example of
caching for the '/usr/bin/gcc' DSO on a test maachine here at my home
lab:
[root@zoo ~]# ls -la ~/.debug/usr/bin/gcc/
total 2268
drwxr-xr-x. 2 root root 4096 Oct 15 16:54 .
drwxr-xr-x. 53 root root 4096 Oct 21 18:06 ..
-rwxr-xr-x. 1 root root 768576 Jun 24 14:08 07f4c7f58a6e7ce9177d45f71d9698e906168096
-rwxr-xr-x. 3 root root 772672 Sep 11 08:24 4f80f5b2caaa5bf4f7e46b593934399e2ce56702
-rwxr-xr-x. 1 root root 768560 Dec 12 2013 f2466d61bedca2ae3d4ebbaded2bc4e8e5fe95a8
[root@zoo ~]
For each DSO we have a directory where each different binary that ever lived in
this machine with the name '/usr/bin/gcc' will have one file, the name being each
build id:
[root@zoo ~]# ~/.debug/usr/bin/gcc/07f4c7f58a6e7ce9177d45f71d9698e906168096 --version | head -1
07f4c7f58a6e7ce9177d45f71d9698e906168096 (GCC) 4.8.3 20140624 (Red Hat 4.8.3-1)
[root@zoo ~]# ~/.debug/usr/bin/gcc/4f80f5b2caaa5bf4f7e46b593934399e2ce56702 --version | head -1
4f80f5b2caaa5bf4f7e46b593934399e2ce56702 (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
[root@zoo ~]# ~/.debug/usr/bin/gcc/f2466d61bedca2ae3d4ebbaded2bc4e8e5fe95a8 --version | head -1
f2466d61bedca2ae3d4ebbaded2bc4e8e5fe95a8 (GCC) 4.8.2 20131212 (Red Hat 4.8.2-7)
[root@zoo ~]# [root@zoo ~]
Then, on the ~/.debug/.build-id/ hierarchy, sorted by build-id we have:
[root@zoo ~]# ls -la ~/.debug/.build-id/07/f4c7f58a6e7ce9177d45f71d9698e906168096
lrwxrwxrwx. 1 root root 58 Jul 24 16:48 /root/.debug/.build-id/07/f4c7f58a6e7ce9177d45f71d9698e906168096 -> ../../usr/bin/gcc/07f4c7f58a6e7ce9177d45f71d9698e906168096
[root@zoo ~]# ls -la ~/.debug/.build-id/4f/80f5b2caaa5bf4f7e46b593934399e2ce56702
lrwxrwxrwx. 1 root root 58 Oct 15 16:54 /root/.debug/.build-id/4f/80f5b2caaa5bf4f7e46b593934399e2ce56702 -> ../../usr/bin/gcc/4f80f5b2caaa5bf4f7e46b593934399e2ce56702
[root@zoo ~]# ls -la ~/.debug/.build-id/f2/466d61bedca2ae3d4ebbaded2bc4e8e5fe95a8
lrwxrwxrwx. 1 root root 58 Jun 2 12:29 /root/.debug/.build-id/f2/466d61bedca2ae3d4ebbaded2bc4e8e5fe95a8 -> ../../usr/bin/gcc/f2466d61bedca2ae3d4ebbaded2bc4e8e5fe95a8
[root@zoo ~]#
This solves the problem with debuginfo packages where we can't have multiple
debuginfo packages installed, as well as for files that didn't came from
debuginfo files.
[root@zoo ~]# perf buildid-cache --hell
Error: unknown option `hell'
usage: perf buildid-cache [<options>]
-a, --add <file list>
file(s) to add
-k, --kcore <file> kcore file to add
-r, --remove <file list>
file(s) to remove
-M, --missing <file> to find missing build ids in the cache
-f, --force don't complain, do it
-u, --update <file list>
file(s) to update
-v, --verbose be more verbose
[root@zoo ~]#
Already has support for yet another of content: kcore files, its just a matter of adding
one more:
perf buildid-cache --probe
:-)
> > Then, later, one would use 'perf archive' passing some keys (or a
> > perf.data file, like done nowadays to pick the files in ~/.debug for
> > dsos that had hits on the specified perf.data file) to get the cached
> > values to use on some other machine, to avoid having to use the
> > debuginfo files there.
> Yeah, querying it from the BUILDID database by using a pair of remote
> build-id and the binary path is a good feature.
> > I.e. in summary I think that the format is ok, but we need to have this
> > inside the ~/.debug hierarchy so that we can make sure that we use the
> > right probe definition, one that matches the DSOs being used (the kernel
> > or some other userspace binary).
> OK, perhaps, that is also good to SDT series at last.
Sure thing!
- Arnaldo
Hi Masami,
On Fri, 31 Oct 2014 14:51:29 -0400, Masami Hiramatsu wrote:
> Hi,
>
> Here is a sereis of patches for enabling "event cache" feature
> to perf probe. Brendan gives me this cool idea, thanks! :)
>
> In this series, I added following options/features;
> - --output option
> We can save the probe definition command for given probe-event
> instead of setting up the local tracing/kprobe_events.
>
> - --no-inlines option
> We can avoid searching the inline functions in debuginfo. Usually
> useful with wildcards since the wildcards will hit a huge amount
> of probe-points.
>
> - $params special probe argument
> $params is expanded to function parameters only, no locally defined
> variables. This is useful for function-call tracing.
>
> - wildcard support for function name
> Wildcard support is the key feature for this idea. Now we can use
> '*foo*' for function name to define the probe-point.
>
> So by using all of them, we can make an "event cache" file on all
> functions (except for inlined functions) as below.
>
> # perf probe --max-probes=100000 --no-inlines -a '* $params' -o event.cache
>
> builds "event.cache" file in which event settings for
> all function entries, like below;
>
> p:probe/reset_early_page_tables _text+12980741
> p:probe/copy_bootdata _text+12980830 real_mode_data=%di:u64
> p:probe/exit_amd_microcode _text+14692680
> p:probe/early_make_pgtable _text+12981274 address=%di:u64
> p:probe/x86_64_start_reservations _text+12981700 real_mode_data=%di:u64
> p:probe/x86_64_start_kernel _text+12981744 real_mode_data=%di:u64
> p:probe/reserve_ebda_region _text+12982117
Does this event cache support kernel modules too? AFAIK it can have a
different address whenever loaded even on a same kernel.
>
> This event.cache file will be big (but much smaller than native
> debuginfo :) ) if your kernel have many option embedded.
> Anyway, you can compress it too.
>
> # wc -l event.cache
> 33813 event.cache
> # ls -sh event.cache
> 2.3M event.cache
> # ls -sh event.cache.gz
> 464K event.cache.gz
>
> For setting up a probe event, you can grep the function name
> and write it to tracing/kprobe_events, as below;
>
> # zcat event.cache.gz | \
> grep probe/vfs_symlink > /sys/kernel/debug/tracing/kprobe_events
>
> This can be applied for the remote machine only if the machine
> runs on completely same kernel binary. Perhaps, we need some
> helper tool to check it.
While it's useful for "agent-less" systems, I think we also need to have
a simple way to apply it with perf tools.
Thanks,
Namhyung
>
> Thank you,
>
>
> ---
>
> Masami Hiramatsu (6):
> [BUGFIX] perf-probe: Fix to handle optimized not-inlined but has no instance
> [DOC] perf-probe: Update perf-probe document
> perf-probe: Add --output option to write commands in a standard file
> perf-probe: Add --no-inlines option to avoid searching inline functions
> perf-probe: Support $params special probe argument
> perf-probe: Support glob wildcards for function name
>
>
> tools/perf/Documentation/perf-probe.txt | 25 ++++++++++
> tools/perf/builtin-probe.c | 32 +++++++++++++
> tools/perf/util/dwarf-aux.c | 31 +++++++++++++
> tools/perf/util/dwarf-aux.h | 6 +++
> tools/perf/util/probe-event.c | 73 +++++++++++++++++++++++--------
> tools/perf/util/probe-event.h | 4 +-
> tools/perf/util/probe-finder.c | 74 +++++++++++++++++++------------
> tools/perf/util/probe-finder.h | 6 ++-
> tools/perf/util/util.h | 4 ++
> 9 files changed, 202 insertions(+), 53 deletions(-)
>
> --
> Masami HIRAMATSU
> Software Platform Research Dpt. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: [email protected]
(2014/11/04 1:19), Arnaldo Carvalho de Melo wrote:
> Em Mon, Nov 03, 2014 at 09:11:18PM +0900, Masami Hiramatsu escreveu:
>> (2014/10/31 21:13), Arnaldo Carvalho de Melo wrote:
>>> Em Fri, Oct 31, 2014 at 02:51:29PM -0400, Masami Hiramatsu escreveu:
>>>> p:probe/reset_early_page_tables _text+12980741
>>>> p:probe/copy_bootdata _text+12980830 real_mode_data=%di:u64
>>>> p:probe/exit_amd_microcode _text+14692680
>>>> p:probe/early_make_pgtable _text+12981274 address=%di:u64
>>>> p:probe/x86_64_start_reservations _text+12981700 real_mode_data=%di:u64
>>>> p:probe/x86_64_start_kernel _text+12981744 real_mode_data=%di:u64
>>>> p:probe/reserve_ebda_region _text+12982117
>
>>>> This event.cache file will be big (but much smaller than native
>>>> debuginfo :) ) if your kernel have many option embedded.
>>>> Anyway, you can compress it too.
>
>>> How do you validate that the cache can be used against some kernel? I.e.
>>> is this that the user has to do? Isn't this prone to errors?
>
>> Actually, kprobe event itself can reject command if the given address
>> is not in the kernel text nor instruction boundary (perhaps, uprobes
>> may have a problem...), so for the kernel level, it is safe.
>
> No, it is not necessarily safe.
>
> What if you specify function foo() that has address 0x1234 for kernel
> v3.16 and then run the cached probe on kernel v3.18 and on that kernel
> the function foo() maps to address 0x2345 and function bar() instead
> maps to address 0x1234? Oops.
In that case user just trace bar() instead of foo(). Of course it's
not correct, but shouldn't break the kernel (if the kernel is broken,
it is a bug of kprobes).
> The build-id was designed to uniquely identify a DSO, we need to use it.
>
> I think that at some point not using it should be left to a, in
> systemtap parlance, "guru" mode, with tooling warning profusely when
> build ids are not available and requiring even more forcing when it
> doesn't matches.
But it is not necessarily everyone uses perf probe to set up the probe
events(because it is a part of ftrace), as we can see in the Brendan's
scripts.
I think, at least what we need is clarifying how can they ensure
build-id before setting the probe events. I'd like to give them options
with knowledge instead of forcing by tools.
>>> Perhaps you could pick the build-id and store it into the event cache
>>> file, in the first lines, somethings like:
>
>> Agreed, build-id should be the best way to check that.
>
>> For kprobes, user can easy to get and compare it with local one as below :)
>> ----
>> RLOGIN=root@$REMOTE
>> rid=`ssh $RLOGIN "od -j16 -w48 -An -t x1 /sys/kernel/notes | tr -d ' '"`
>> lid=`od -j16 -w48 -An -t x1 /sys/kernel/notes | tr -d ' '`
>> if [ $rid != $lid ]; then
>> echo "Error: Build-id mis-matched!"
>> exit 1;
>> fi
>> echo "Setting up $EVENTNAME at $REMOTE"
>> zcat event.cache.gz | grep $EVENTNAME |\
>> ssh $RLOGIN "tee -a /sys/kernel/debug/tracing/kprobe_events"
>> echo "Done"
>> ----
>
>> With this script, you don't need to install perf at remote hosts.
>> (This is what enterprise people called "agent-less")
>
> This is only sufficient (and a cool feature!) if you will immediataly use the
> cached info (i.e. using just two systems: one development machine, with all
> debugging info, devel tools, etc, and the other the machine to observe, that is
> bare bones, no debugging info, etc)), but the moment you store that
> event.cache.gz (that has no build id embedded from what I can see from the
> above example) then you lose the build id for those cached events.
Right, in this case, it should be stored with build-id, like below :)
lid=`od -j16 -w48 -An -t x1 /sys/kernel/notes | tr -d ' '`
perf probe -o - --max-probes=10000 --no-inlines -a '* $params' | \
gzip -c - > $lid.gz
Anyway, this is just a workaround by operation.
> You need to tightly associate whatever symbol resolution is done with
> the build id, at symbol resolution/caching time.
Agreed,
>
> Then, before using the cached symbol resolution, we need to check if the target
> kernel/DSO build id is the same as the cached symbol kernel/DSO build id.
Yeah, but again, some users may not like to install perf to the
target system (like embedded system etc.) and also, I, personally,
like to avoid introducing server-client networking feature
for perftools, since it may open another pandora-box of security...
I'd like to use it combining with other tools, like ssh or something
like that.
>
> Right, what is in ~/.debug/ may be used by multiple tools, just like
> debuginfo files are, by keying the content by its build id.
>
> And also by having separate subdirectory trees for different kinds of
> symbol information, i.e. the ~/.debug/.build-id/ links may point to
> either ELF files or to kallsyms data. What we are discussing here is to
> make it also point to the [ku]probes_tracer cached probes files.
>
> The way that DSO files are cached may by the same that you end up adding
> the [ku]probes_tracer cached files, take a look at the example of
> caching for the '/usr/bin/gcc' DSO on a test maachine here at my home
> lab:
Thanks, this is useful for me :)
[snip]
>
> This solves the problem with debuginfo packages where we can't have multiple
> debuginfo packages installed, as well as for files that didn't came from
> debuginfo files.
>
> [root@zoo ~]# perf buildid-cache --hell
> Error: unknown option `hell'
>
> usage: perf buildid-cache [<options>]
>
> -a, --add <file list>
> file(s) to add
> -k, --kcore <file> kcore file to add
> -r, --remove <file list>
> file(s) to remove
> -M, --missing <file> to find missing build ids in the cache
> -f, --force don't complain, do it
> -u, --update <file list>
> file(s) to update
> -v, --verbose be more verbose
>
> [root@zoo ~]#
>
> Already has support for yet another of content: kcore files, its just a matter of adding
> one more:
>
> perf buildid-cache --probe
>
> :-)
OK, I agree using .debug/.buildid/ to store caches.
Here is what I'm thinking,
# this makes caches for given pattern instead of adding probes.
perf probe --cache '* $params'
# the cache is stored in .debug/.buildid/<buildid>.probe
# the cache entry can be queried by buildid and eventname
perf probe --query ${remote_buildid}:do_fork
p:probe/do_fork _text+298722 clone_flags=%di:u64 stack_start=%si:u64 stack_size=%dx:u64 parent_tidptr=%cx:u64 child_tidptr=%r8:u64
# or perf can set it up directly to local
perf probe --query-add do_fork
Added new event:
probe:do_fork (on do_fork with clone_flags satck_start stack_size parent_tidptr child_tidptr)
You can now use it in all perf tools, such as:
perf record -e probe:do_fork -aR sleep 1
What would you think about this? :)
>
>>> Then, later, one would use 'perf archive' passing some keys (or a
>>> perf.data file, like done nowadays to pick the files in ~/.debug for
>>> dsos that had hits on the specified perf.data file) to get the cached
>>> values to use on some other machine, to avoid having to use the
>>> debuginfo files there.
>
>> Yeah, querying it from the BUILDID database by using a pair of remote
>> build-id and the binary path is a good feature.
>
>>> I.e. in summary I think that the format is ok, but we need to have this
>>> inside the ~/.debug hierarchy so that we can make sure that we use the
>>> right probe definition, one that matches the DSOs being used (the kernel
>>> or some other userspace binary).
>
>> OK, perhaps, that is also good to SDT series at last.
>
> Sure thing!
>
Thank you!
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
Hi Arnaldo and Masami,
On Mon, 3 Nov 2014 14:19:00 -0200, Arnaldo Carvalho de Melo wrote:
> [root@zoo ~]# perf buildid-cache --hell
> Error: unknown option `hell'
>
> usage: perf buildid-cache [<options>]
>
> -a, --add <file list>
> file(s) to add
> -k, --kcore <file> kcore file to add
> -r, --remove <file list>
> file(s) to remove
> -M, --missing <file> to find missing build ids in the cache
> -f, --force don't complain, do it
> -u, --update <file list>
> file(s) to update
> -v, --verbose be more verbose
>
> [root@zoo ~]#
>
> Already has support for yet another of content: kcore files, its just a matter of adding
> one more:
>
> perf buildid-cache --probe
But it deals with binaries, and what we need is probe points. So maybe
it's worth add a new command like:
perf probe-cache [add|del|enable|disable] [<options>] [<path>|<build-id>] <spec>...
or perf probe-cache list [<options>]
This way we can handle both of kprobes and uprobes (so SDTs). What do
you think?
Thanks,
Namhyung
(2014/11/04 12:14), Namhyung Kim wrote:
> Hi Masami,
>
> On Fri, 31 Oct 2014 14:51:29 -0400, Masami Hiramatsu wrote:
>> Hi,
>>
>> Here is a sereis of patches for enabling "event cache" feature
>> to perf probe. Brendan gives me this cool idea, thanks! :)
>>
>> In this series, I added following options/features;
>> - --output option
>> We can save the probe definition command for given probe-event
>> instead of setting up the local tracing/kprobe_events.
>>
>> - --no-inlines option
>> We can avoid searching the inline functions in debuginfo. Usually
>> useful with wildcards since the wildcards will hit a huge amount
>> of probe-points.
>>
>> - $params special probe argument
>> $params is expanded to function parameters only, no locally defined
>> variables. This is useful for function-call tracing.
>>
>> - wildcard support for function name
>> Wildcard support is the key feature for this idea. Now we can use
>> '*foo*' for function name to define the probe-point.
>>
>> So by using all of them, we can make an "event cache" file on all
>> functions (except for inlined functions) as below.
>>
>> # perf probe --max-probes=100000 --no-inlines -a '* $params' -o event.cache
>>
>> builds "event.cache" file in which event settings for
>> all function entries, like below;
>>
>> p:probe/reset_early_page_tables _text+12980741
>> p:probe/copy_bootdata _text+12980830 real_mode_data=%di:u64
>> p:probe/exit_amd_microcode _text+14692680
>> p:probe/early_make_pgtable _text+12981274 address=%di:u64
>> p:probe/x86_64_start_reservations _text+12981700 real_mode_data=%di:u64
>> p:probe/x86_64_start_kernel _text+12981744 real_mode_data=%di:u64
>> p:probe/reserve_ebda_region _text+12982117
>
> Does this event cache support kernel modules too? AFAIK it can have a
> different address whenever loaded even on a same kernel.
Yes, for the modules perf probe uses target function symbol directly instead
of _text.
----
perf probe -m xfs -o - -q -a xfs_acl_exists
p:probe/xfs_acl_exists xfs:xfs_acl_exists+0
----
>> This event.cache file will be big (but much smaller than native
>> debuginfo :) ) if your kernel have many option embedded.
>> Anyway, you can compress it too.
>>
>> # wc -l event.cache
>> 33813 event.cache
>> # ls -sh event.cache
>> 2.3M event.cache
>> # ls -sh event.cache.gz
>> 464K event.cache.gz
>>
>> For setting up a probe event, you can grep the function name
>> and write it to tracing/kprobe_events, as below;
>>
>> # zcat event.cache.gz | \
>> grep probe/vfs_symlink > /sys/kernel/debug/tracing/kprobe_events
>>
>> This can be applied for the remote machine only if the machine
>> runs on completely same kernel binary. Perhaps, we need some
>> helper tool to check it.
>
> While it's useful for "agent-less" systems, I think we also need to have
> a simple way to apply it with perf tools.
Yeah, I see. As I've sent a reply to Arnaldo, I'd like to add --query for
cached event definition by checking a build id.
In that case, you just need to run perf-archive and send it to remote,
then you can run perf-probe --query-set "event" on the machine.
Thank you,
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
Em Tue, Nov 04, 2014 at 01:36:31PM +0900, Masami Hiramatsu escreveu:
> (2014/11/04 1:19), Arnaldo Carvalho de Melo wrote:
> > Em Mon, Nov 03, 2014 at 09:11:18PM +0900, Masami Hiramatsu escreveu:
> >> (2014/10/31 21:13), Arnaldo Carvalho de Melo wrote:
> >>> Em Fri, Oct 31, 2014 at 02:51:29PM -0400, Masami Hiramatsu escreveu:
> >> Actually, kprobe event itself can reject command if the given address
> >> is not in the kernel text nor instruction boundary (perhaps, uprobes
> >> may have a problem...), so for the kernel level, it is safe.
> > No, it is not necessarily safe.
> > What if you specify function foo() that has address 0x1234 for kernel
> > v3.16 and then run the cached probe on kernel v3.18 and on that kernel
> > the function foo() maps to address 0x2345 and function bar() instead
> > maps to address 0x1234? Oops.
> In that case user just trace bar() instead of foo(). Of course it's
> not correct, but shouldn't break the kernel (if the kernel is broken,
> it is a bug of kprobes).
I.e. no crashes, just misleading information :-\
> > The build-id was designed to uniquely identify a DSO, we need to use it.
> > I think that at some point not using it should be left to a, in
> > systemtap parlance, "guru" mode, with tooling warning profusely when
> > build ids are not available and requiring even more forcing when it
> > doesn't matches.
> But it is not necessarily everyone uses perf probe to set up the probe
> events(because it is a part of ftrace), as we can see in the Brendan's
> scripts.
Right, If I implied that some particular tool should be used, sorry
about that, what I wanted to get accross was that the information that
allows users or tools to make sure there is no mismatch between the
cached probes and the target kernel is collected at cached probe
creating time and available at target use time.
> I think, at least what we need is clarifying how can they ensure
> build-id before setting the probe events. I'd like to give them options
> with knowledge instead of forcing by tools.
Right, so we need to have the build-id as part of the cache format,
perhaps as the first line, starting with a comment (#), that way the
user can use whatever way he has at its disposal to check that the
running kernel build-id is the same as the one on that comment. Using
that script you provided, that uses just things that are on the machine
(od, /sys/kernel/notes).
Tools would use that info as it would be in a well known format to do
that when used.
> >>> Perhaps you could pick the build-id and store it into the event cache
> >>> file, in the first lines, somethings like:
> >> Agreed, build-id should be the best way to check that.
> >> For kprobes, user can easy to get and compare it with local one as below :)
> >> ----
> >> RLOGIN=root@$REMOTE
> >> rid=`ssh $RLOGIN "od -j16 -w48 -An -t x1 /sys/kernel/notes | tr -d ' '"`
> >> lid=`od -j16 -w48 -An -t x1 /sys/kernel/notes | tr -d ' '`
> >> if [ $rid != $lid ]; then
> >> echo "Error: Build-id mis-matched!"
> >> exit 1;
> >> fi
> >> echo "Setting up $EVENTNAME at $REMOTE"
> >> zcat event.cache.gz | grep $EVENTNAME |\
> >> ssh $RLOGIN "tee -a /sys/kernel/debug/tracing/kprobe_events"
> >> echo "Done"
> >> ----
> >> With this script, you don't need to install perf at remote hosts.
> >> (This is what enterprise people called "agent-less")
> > This is only sufficient (and a cool feature!) if you will immediataly use the
> > cached info (i.e. using just two systems: one development machine, with all
> > debugging info, devel tools, etc, and the other the machine to observe, that is
> > bare bones, no debugging info, etc)), but the moment you store that
> > event.cache.gz (that has no build id embedded from what I can see from the
> > above example) then you lose the build id for those cached events.
> Right, in this case, it should be stored with build-id, like below :)
> lid=`od -j16 -w48 -An -t x1 /sys/kernel/notes | tr -d ' '`
> perf probe -o - --max-probes=10000 --no-inlines -a '* $params' | \
> gzip -c - > $lid.gz
> Anyway, this is just a workaround by operation.
> > You need to tightly associate whatever symbol resolution is done with
> > the build id, at symbol resolution/caching time.
> Agreed,
> > Then, before using the cached symbol resolution, we need to check if the target
> > kernel/DSO build id is the same as the cached symbol kernel/DSO build id.
> Yeah, but again, some users may not like to install perf to the
> target system (like embedded system etc.) and also, I, personally,
> like to avoid introducing server-client networking feature
> for perftools, since it may open another pandora-box of security...
> I'd like to use it combining with other tools, like ssh or something
> like that.
As I hope to have clarified about, no intention on adding the
requirement that any particular extra tool besides what is available on,
say, busybox or any other environment in widespread use.
> > Right, what is in ~/.debug/ may be used by multiple tools, just like
> > debuginfo files are, by keying the content by its build id.
> > And also by having separate subdirectory trees for different kinds of
> > symbol information, i.e. the ~/.debug/.build-id/ links may point to
> > either ELF files or to kallsyms data. What we are discussing here is to
> > make it also point to the [ku]probes_tracer cached probes files.
> > The way that DSO files are cached may by the same that you end up adding
> > the [ku]probes_tracer cached files, take a look at the example of
> > caching for the '/usr/bin/gcc' DSO on a test maachine here at my home
> > lab:
> Thanks, this is useful for me :)
> [snip]
> >
> > This solves the problem with debuginfo packages where we can't have multiple
> > debuginfo packages installed, as well as for files that didn't came from
> > debuginfo files.
> > [root@zoo ~]# perf buildid-cache --hell
> > Error: unknown option `hell'
> > usage: perf buildid-cache [<options>]
> > -a, --add <file list>
> > file(s) to add
> > -k, --kcore <file> kcore file to add
> > -r, --remove <file list>
> > file(s) to remove
> > -M, --missing <file> to find missing build ids in the cache
> > -f, --force don't complain, do it
> > -u, --update <file list>
> > file(s) to update
> > -v, --verbose be more verbose
> > [root@zoo ~]#
> > Already has support for yet another of content: kcore files, its just a matter of adding
> > one more:
> > perf buildid-cache --probe
> > :-)
>
> OK, I agree using .debug/.buildid/ to store caches.
> Here is what I'm thinking,
> # this makes caches for given pattern instead of adding probes.
> perf probe --cache '* $params'
> # the cache is stored in .debug/.buildid/<buildid>.probe
> # the cache entry can be queried by buildid and eventname
To follow the existing standard this would instead go to:
> # the cache is stored in .debug/probes/path/to/dso/name/buildid
> # And can be found via its buildid link .debug/.buildid/bu/ildid -> ../../probes/path/to/dso/name/buildid
> perf probe --query ${remote_buildid}:do_fork
> p:probe/do_fork _text+298722 clone_flags=%di:u64 stack_start=%si:u64 stack_size=%dx:u64 parent_tidptr=%cx:u64 child_tidptr=%r8:u64
> # or perf can set it up directly to local
> perf probe --query-add do_fork
You missed the build id above, no? I.e. it would be:
> # or perf can set it up directly to local
> perf probe --query-add ${remote_buildid}:do_fork
Right? and that would be the equivalent to:
perf probe --query ${remote_buildid}:do_fork > /sys/kernel/debug/tracing/kprobe_events
No?
And do you intend to generate that script above that checks the build
id, tees into /sys/kernel/debug/tracing/kprobe_events, etc as part of
this process?
I.e. the process would be, as you describe above plus telling the user
that he/she then should unconpress the script and run it.
Scripts like Brendan's would do whatever they want with it of course,
but the default end result of 'perf probe' would be usable straight away
and doing the build id checking without requiring any extra tooling to
be available at the target machine.
> Added new event:
> probe:do_fork (on do_fork with clone_flags satck_start stack_size parent_tidptr child_tidptr)
> You can now use it in all perf tools, such as:
> perf record -e probe:do_fork -aR sleep 1
> What would you think about this? :)
Cool feature! :-)
> >>> Then, later, one would use 'perf archive' passing some keys (or a
> >>> perf.data file, like done nowadays to pick the files in ~/.debug for
> >>> dsos that had hits on the specified perf.data file) to get the cached
> >>> values to use on some other machine, to avoid having to use the
> >>> debuginfo files there.
> >> Yeah, querying it from the BUILDID database by using a pair of remote
> >> build-id and the binary path is a good feature.
> >>> I.e. in summary I think that the format is ok, but we need to have this
> >>> inside the ~/.debug hierarchy so that we can make sure that we use the
> >>> right probe definition, one that matches the DSOs being used (the kernel
> >>> or some other userspace binary).
> >
> >> OK, perhaps, that is also good to SDT series at last.
> >
> > Sure thing!
> >
>
> Thank you!
You're welcome!
- Arnaldo
(2014/11/04 23:38), Arnaldo Carvalho de Melo wrote:
> Em Tue, Nov 04, 2014 at 01:36:31PM +0900, Masami Hiramatsu escreveu:
>> (2014/11/04 1:19), Arnaldo Carvalho de Melo wrote:
>>> Em Mon, Nov 03, 2014 at 09:11:18PM +0900, Masami Hiramatsu escreveu:
>>>> (2014/10/31 21:13), Arnaldo Carvalho de Melo wrote:
>>>>> Em Fri, Oct 31, 2014 at 02:51:29PM -0400, Masami Hiramatsu escreveu:
>>>> Actually, kprobe event itself can reject command if the given address
>>>> is not in the kernel text nor instruction boundary (perhaps, uprobes
>>>> may have a problem...), so for the kernel level, it is safe.
>
>>> No, it is not necessarily safe.
>
>>> What if you specify function foo() that has address 0x1234 for kernel
>>> v3.16 and then run the cached probe on kernel v3.18 and on that kernel
>>> the function foo() maps to address 0x2345 and function bar() instead
>>> maps to address 0x1234? Oops.
>
>> In that case user just trace bar() instead of foo(). Of course it's
>> not correct, but shouldn't break the kernel (if the kernel is broken,
>> it is a bug of kprobes).
>
> I.e. no crashes, just misleading information :-\
Right.
>>> The build-id was designed to uniquely identify a DSO, we need to use it.
>
>>> I think that at some point not using it should be left to a, in
>>> systemtap parlance, "guru" mode, with tooling warning profusely when
>>> build ids are not available and requiring even more forcing when it
>>> doesn't matches.
>
>> But it is not necessarily everyone uses perf probe to set up the probe
>> events(because it is a part of ftrace), as we can see in the Brendan's
>> scripts.
>
> Right, If I implied that some particular tool should be used, sorry
> about that, what I wanted to get accross was that the information that
> allows users or tools to make sure there is no mismatch between the
> cached probes and the target kernel is collected at cached probe
> creating time and available at target use time.
Yes, and if user setting probes via perf, the perf must ensure that it
picks up the correct cache by using build-id. If someone wants to use
other tools, he/she must ensure it. We just give a information how to
check that :)
>
>> I think, at least what we need is clarifying how can they ensure
>> build-id before setting the probe events. I'd like to give them options
>> with knowledge instead of forcing by tools.
>
> Right, so we need to have the build-id as part of the cache format,
> perhaps as the first line, starting with a comment (#), that way the
> user can use whatever way he has at its disposal to check that the
> running kernel build-id is the same as the one on that comment. Using
> that script you provided, that uses just things that are on the machine
> (od, /sys/kernel/notes).
Ah, that's a good idea :)
So, with such build-id comment line, would you think we can have an
--output option? Or we'd better moving onto the .debug/ cache file?
What I'm thinking about this feature is to make a compact and reduced
function-entry level probe cache while building the kernel (as a part
of kbuild), so that we can deploy the stripped kernel and the cache
to remote machines.
[snip]
>> OK, I agree using .debug/.buildid/ to store caches.
>> Here is what I'm thinking,
>
>> # this makes caches for given pattern instead of adding probes.
>> perf probe --cache '* $params'
>
>> # the cache is stored in .debug/.buildid/<buildid>.probe
>> # the cache entry can be queried by buildid and eventname
>
> To follow the existing standard this would instead go to:
>
>> # the cache is stored in .debug/probes/path/to/dso/name/buildid
>> # And can be found via its buildid link .debug/.buildid/bu/ildid -> ../../probes/path/to/dso/name/buildid
Ah, I see. so you meant adding a top-level .debug/probes/ dir.
But in that case, shouldn't we change .debug/.buildid/bu/ildid to
.debug/probes/.buildid/bu/ildid ?
>
>> perf probe --query ${remote_buildid}:do_fork
>> p:probe/do_fork _text+298722 clone_flags=%di:u64 stack_start=%si:u64 stack_size=%dx:u64 parent_tidptr=%cx:u64 child_tidptr=%r8:u64
>
>> # or perf can set it up directly to local
>> perf probe --query-add do_fork
>
> You missed the build id above, no? I.e. it would be:
>
>> # or perf can set it up directly to local
>> perf probe --query-add ${remote_buildid}:do_fork
No, since this command set the event to local machine, perf-probe
should check the local build-id and query the appropriate event
from the cache.
# BTW, maybe we'd better use perf probe --add '$do_fork' (calls
# "cache of do_fork") instead of long --query-add. :)
>
> Right? and that would be the equivalent to:
>
> perf probe --query ${remote_buildid}:do_fork > /sys/kernel/debug/tracing/kprobe_events
>
> No?
>
> And do you intend to generate that script above that checks the build
> id, tees into /sys/kernel/debug/tracing/kprobe_events, etc as part of
> this process?
No, not yet. But that sounds nice :) Thanks!
>
> I.e. the process would be, as you describe above plus telling the user
> that he/she then should unconpress the script and run it.
>
> Scripts like Brendan's would do whatever they want with it of course,
> but the default end result of 'perf probe' would be usable straight away
> and doing the build id checking without requiring any extra tooling to
> be available at the target machine.
Agreed, with .debug/ archive, perf-probe should pick and set event
correctly without any other tools.
Thank you!
>
>> Added new event:
>> probe:do_fork (on do_fork with clone_flags satck_start stack_size parent_tidptr child_tidptr)
>
>> You can now use it in all perf tools, such as:
>
>> perf record -e probe:do_fork -aR sleep 1
>
>> What would you think about this? :)
>
> Cool feature! :-)
>
>>>>> Then, later, one would use 'perf archive' passing some keys (or a
>>>>> perf.data file, like done nowadays to pick the files in ~/.debug for
>>>>> dsos that had hits on the specified perf.data file) to get the cached
>>>>> values to use on some other machine, to avoid having to use the
>>>>> debuginfo files there.
>
>>>> Yeah, querying it from the BUILDID database by using a pair of remote
>>>> build-id and the binary path is a good feature.
>
>>>>> I.e. in summary I think that the format is ok, but we need to have this
>>>>> inside the ~/.debug hierarchy so that we can make sure that we use the
>>>>> right probe definition, one that matches the DSOs being used (the kernel
>>>>> or some other userspace binary).
>>>
>>>> OK, perhaps, that is also good to SDT series at last.
>>>
>>> Sure thing!
>>>
>>
>> Thank you!
>
> You're welcome!
>
> - Arnaldo
>
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
Hi Masami,
On Tue, 04 Nov 2014 14:44:28 +0900, Masami Hiramatsu wrote:
> (2014/11/04 12:14), Namhyung Kim wrote:
>> Hi Masami,
>>
>> On Fri, 31 Oct 2014 14:51:29 -0400, Masami Hiramatsu wrote:
>>> So by using all of them, we can make an "event cache" file on all
>>> functions (except for inlined functions) as below.
>>>
>>> # perf probe --max-probes=100000 --no-inlines -a '* $params' -o event.cache
>>>
>>> builds "event.cache" file in which event settings for
>>> all function entries, like below;
>>>
>>> p:probe/reset_early_page_tables _text+12980741
>>> p:probe/copy_bootdata _text+12980830 real_mode_data=%di:u64
>>> p:probe/exit_amd_microcode _text+14692680
>>> p:probe/early_make_pgtable _text+12981274 address=%di:u64
>>> p:probe/x86_64_start_reservations _text+12981700 real_mode_data=%di:u64
>>> p:probe/x86_64_start_kernel _text+12981744 real_mode_data=%di:u64
>>> p:probe/reserve_ebda_region _text+12982117
>>
>> Does this event cache support kernel modules too? AFAIK it can have a
>> different address whenever loaded even on a same kernel.
>
> Yes, for the modules perf probe uses target function symbol directly instead
> of _text.
>
> ----
> perf probe -m xfs -o - -q -a xfs_acl_exists
> p:probe/xfs_acl_exists xfs:xfs_acl_exists+0
> ----
Ah, great! :)
Thanks,
Namhyung
On Wed, 05 Nov 2014 01:22:46 +0900, Masami Hiramatsu wrote:
> (2014/11/04 23:38), Arnaldo Carvalho de Melo wrote:
>> Em Tue, Nov 04, 2014 at 01:36:31PM +0900, Masami Hiramatsu escreveu:
>>> OK, I agree using .debug/.buildid/ to store caches.
>>> Here is what I'm thinking,
>>
>>> # this makes caches for given pattern instead of adding probes.
>>> perf probe --cache '* $params'
>>
>>> # the cache is stored in .debug/.buildid/<buildid>.probe
>>> # the cache entry can be queried by buildid and eventname
>>
>> To follow the existing standard this would instead go to:
>>
>>> # the cache is stored in .debug/probes/path/to/dso/name/buildid
>>> # And can be found via its buildid link .debug/.buildid/bu/ildid -> ../../probes/path/to/dso/name/buildid
>
> Ah, I see. so you meant adding a top-level .debug/probes/ dir.
> But in that case, shouldn't we change .debug/.buildid/bu/ildid to
> .debug/probes/.buildid/bu/ildid ?
Either is fine to me. But my concern is that it might be bloated as
system/package update is going on, so we need to control it somehow.
That's why I suggested the probe-cache command.
>
>>
>>> perf probe --query ${remote_buildid}:do_fork
>>> p:probe/do_fork _text+298722 clone_flags=%di:u64 stack_start=%si:u64 stack_size=%dx:u64 parent_tidptr=%cx:u64 child_tidptr=%r8:u64
>>
>>> # or perf can set it up directly to local
>>> perf probe --query-add do_fork
>>
>> You missed the build id above, no? I.e. it would be:
>>
>>> # or perf can set it up directly to local
>>> perf probe --query-add ${remote_buildid}:do_fork
>
> No, since this command set the event to local machine, perf-probe
> should check the local build-id and query the appropriate event
> from the cache.
> # BTW, maybe we'd better use perf probe --add '$do_fork' (calls
> # "cache of do_fork") instead of long --query-add. :)
It should take care of uprobe case too. So simple do_fork should have
group/event or provider/marker form instead so that it can help to find
which binary defines the cached event. Maybe we also need to keep a
event-to-binary table and then check (current?) build-id somehow to
identify correct event to be used.
Also this function entry level event cache can be used with uprobes..
Thanks,
Namhyung
(2014/11/05 15:23), Namhyung Kim wrote:
> On Wed, 05 Nov 2014 01:22:46 +0900, Masami Hiramatsu wrote:
>> (2014/11/04 23:38), Arnaldo Carvalho de Melo wrote:
>>> Em Tue, Nov 04, 2014 at 01:36:31PM +0900, Masami Hiramatsu escreveu:
>>>> OK, I agree using .debug/.buildid/ to store caches.
>>>> Here is what I'm thinking,
>>>
>>>> # this makes caches for given pattern instead of adding probes.
>>>> perf probe --cache '* $params'
>>>
>>>> # the cache is stored in .debug/.buildid/<buildid>.probe
>>>> # the cache entry can be queried by buildid and eventname
>>>
>>> To follow the existing standard this would instead go to:
>>>
>>>> # the cache is stored in .debug/probes/path/to/dso/name/buildid
>>>> # And can be found via its buildid link .debug/.buildid/bu/ildid -> ../../probes/path/to/dso/name/buildid
>>
>> Ah, I see. so you meant adding a top-level .debug/probes/ dir.
>> But in that case, shouldn't we change .debug/.buildid/bu/ildid to
>> .debug/probes/.buildid/bu/ildid ?
>
> Either is fine to me. But my concern is that it might be bloated as
> system/package update is going on, so we need to control it somehow.
> That's why I suggested the probe-cache command.
Yes, that is also my question. Should we really have 3 cache management
commands? At the first step, maybe we can have those as separated,
but it would better be consolidated to one perf-cache command, I think.
>>>> perf probe --query ${remote_buildid}:do_fork
>>>> p:probe/do_fork _text+298722 clone_flags=%di:u64 stack_start=%si:u64 stack_size=%dx:u64 parent_tidptr=%cx:u64 child_tidptr=%r8:u64
>>>
>>>> # or perf can set it up directly to local
>>>> perf probe --query-add do_fork
>>>
>>> You missed the build id above, no? I.e. it would be:
>>>
>>>> # or perf can set it up directly to local
>>>> perf probe --query-add ${remote_buildid}:do_fork
>>
>> No, since this command set the event to local machine, perf-probe
>> should check the local build-id and query the appropriate event
>> from the cache.
>> # BTW, maybe we'd better use perf probe --add '$do_fork' (calls
>> # "cache of do_fork") instead of long --query-add. :)
>
> It should take care of uprobe case too. So simple do_fork should have
> group/event or provider/marker form instead so that it can help to find
> which binary defines the cached event. Maybe we also need to keep a
> event-to-binary table and then check (current?) build-id somehow to
> identify correct event to be used.
>
Good catch! :)
perf probe --add '$group:event $params' # for cached event
perf probe --add '%provider[@path]:marker $params' # for SDT event
and these are automatically check the build-id.
> Also this function entry level event cache can be used with uprobes..
Yes, of course :)
Thank you,
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
Em Wed, Nov 05, 2014 at 01:22:46AM +0900, Masami Hiramatsu escreveu:
> (2014/11/04 23:38), Arnaldo Carvalho de Melo wrote:
> > Em Tue, Nov 04, 2014 at 01:36:31PM +0900, Masami Hiramatsu escreveu:
> >> (2014/11/04 1:19), Arnaldo Carvalho de Melo wrote:
> >>> Em Mon, Nov 03, 2014 at 09:11:18PM +0900, Masami Hiramatsu escreveu:
> >>>> (2014/10/31 21:13), Arnaldo Carvalho de Melo wrote:
> >>>>> Em Fri, Oct 31, 2014 at 02:51:29PM -0400, Masami Hiramatsu escreveu:
> >>>> Actually, kprobe event itself can reject command if the given address
> >>>> is not in the kernel text nor instruction boundary (perhaps, uprobes
> >>>> may have a problem...), so for the kernel level, it is safe.
> >
> >>> No, it is not necessarily safe.
> >
> >>> What if you specify function foo() that has address 0x1234 for kernel
> >>> v3.16 and then run the cached probe on kernel v3.18 and on that kernel
> >>> the function foo() maps to address 0x2345 and function bar() instead
> >>> maps to address 0x1234? Oops.
> >> In that case user just trace bar() instead of foo(). Of course it's
> >> not correct, but shouldn't break the kernel (if the kernel is broken,
> >> it is a bug of kprobes).
> > I.e. no crashes, just misleading information :-\
> Right.
The kernel doesn't crashes, just the user, after scratching his head
trying to make sense of wrong information :-P
Digressing: is there some kprobe_tracer_fuzzer out there?
> >>> The build-id was designed to uniquely identify a DSO, we need to use it.
> >>> I think that at some point not using it should be left to a, in
> >>> systemtap parlance, "guru" mode, with tooling warning profusely when
> >>> build ids are not available and requiring even more forcing when it
> >>> doesn't matches.
> >> But it is not necessarily everyone uses perf probe to set up the probe
> >> events(because it is a part of ftrace), as we can see in the Brendan's
> >> scripts.
> > Right, If I implied that some particular tool should be used, sorry
> > about that, what I wanted to get accross was that the information that
> > allows users or tools to make sure there is no mismatch between the
> > cached probes and the target kernel is collected at cached probe
> > creating time and available at target use time.
> Yes, and if user setting probes via perf, the perf must ensure that it
> picks up the correct cache by using build-id. If someone wants to use
> other tools, he/she must ensure it. We just give a information how to
> check that :)
> >> I think, at least what we need is clarifying how can they ensure
> >> build-id before setting the probe events. I'd like to give them options
> >> with knowledge instead of forcing by tools.
> > Right, so we need to have the build-id as part of the cache format,
> > perhaps as the first line, starting with a comment (#), that way the
> > user can use whatever way he has at its disposal to check that the
> > running kernel build-id is the same as the one on that comment. Using
> > that script you provided, that uses just things that are on the machine
> > (od, /sys/kernel/notes).
> Ah, that's a good idea :)
> So, with such build-id comment line, would you think we can have an
> --output option? Or we'd better moving onto the .debug/ cache file?
Well, these are two separate things: The ./debug cache file is for
repeated use of the same probe on the same machine, say, across reboots
or in tools that script using perf probe to add some probe, then remove
the probes at exit, using it multiple times would provide a after
caching, as no ELF DWARF parsing would be involved after we cached it.
As well for when using 'perf archive' to reuse the probe definitions
using 'perf probe' on the target machine, where no -debuginfo packages
(or binaries with DWARF) would be available.
I.e. both having the comment on the cache file with the build id and
having it stored in ~/.debug/ have values for different workloads or
different ways of doing the same thing.
> What I'm thinking about this feature is to make a compact and reduced
> function-entry level probe cache while building the kernel (as a part
> of kbuild), so that we can deploy the stripped kernel and the cache
> to remote machines.
Which is a cool feature as well! :-)
> [snip]
> >> OK, I agree using .debug/.buildid/ to store caches.
> >> Here is what I'm thinking,
> >
> >> # this makes caches for given pattern instead of adding probes.
> >> perf probe --cache '* $params'
> >
> >> # the cache is stored in .debug/.buildid/<buildid>.probe
> >> # the cache entry can be queried by buildid and eventname
> >
> > To follow the existing standard this would instead go to:
> >
> >> # the cache is stored in .debug/probes/path/to/dso/name/buildid
> >> # And can be found via its buildid link .debug/.buildid/bu/ildid -> ../../probes/path/to/dso/name/buildid
>
> Ah, I see. so you meant adding a top-level .debug/probes/ dir.
> But in that case, shouldn't we change .debug/.buildid/bu/ildid to
> .debug/probes/.buildid/bu/ildid ?
Humm, understood, perhaps we should have a subdir for probes, i.e.:
# And can be found via its buildid link .debug/.buildid/probes/bu/ildid -> ../../probes/path/to/dso/name/probes/buildid
So as not to clash in both cases with the ELF file.
Which leads to a problem when we have both vmlinux and kallsyms, which,
right now, is not possible, i.e. if we add both a vmlinux and a kallsyms
file for a kernel, the one that comes last will have the
.debug/.buildid/bu/ildid
link, have to fix that so that we first look for the ELF file in the
cache, and if it fails, fallback to the kallsyms file, if available,
i.e. give preference to the richer option, and in some cases, like for
annotation, the only one that matters.
> >> perf probe --query ${remote_buildid}:do_fork
> >> p:probe/do_fork _text+298722 clone_flags=%di:u64 stack_start=%si:u64 stack_size=%dx:u64 parent_tidptr=%cx:u64 child_tidptr=%r8:u64
> >
> >> # or perf can set it up directly to local
> >> perf probe --query-add do_fork
> >
> > You missed the build id above, no? I.e. it would be:
> >
> >> # or perf can set it up directly to local
> >> perf probe --query-add ${remote_buildid}:do_fork
> No, since this command set the event to local machine, perf-probe
> should check the local build-id and query the appropriate event
> from the cache.
> # BTW, maybe we'd better use perf probe --add '$do_fork' (calls
> # "cache of do_fork") instead of long --query-add. :)
Ok, understood.
> >
> > Right? and that would be the equivalent to:
> >
> > perf probe --query ${remote_buildid}:do_fork > /sys/kernel/debug/tracing/kprobe_events
> >
> > No?
> >
> > And do you intend to generate that script above that checks the build
> > id, tees into /sys/kernel/debug/tracing/kprobe_events, etc as part of
> > this process?
>
> No, not yet. But that sounds nice :) Thanks!
>
> >
> > I.e. the process would be, as you describe above plus telling the user
> > that he/she then should unconpress the script and run it.
> >
> > Scripts like Brendan's would do whatever they want with it of course,
> > but the default end result of 'perf probe' would be usable straight away
> > and doing the build id checking without requiring any extra tooling to
> > be available at the target machine.
>
> Agreed, with .debug/ archive, perf-probe should pick and set event
> correctly without any other tools.
>
> Thank you!
>
> >
> >> Added new event:
> >> probe:do_fork (on do_fork with clone_flags satck_start stack_size parent_tidptr child_tidptr)
> >
> >> You can now use it in all perf tools, such as:
> >
> >> perf record -e probe:do_fork -aR sleep 1
> >
> >> What would you think about this? :)
> >
> > Cool feature! :-)
> >
> >>>>> Then, later, one would use 'perf archive' passing some keys (or a
> >>>>> perf.data file, like done nowadays to pick the files in ~/.debug for
> >>>>> dsos that had hits on the specified perf.data file) to get the cached
> >>>>> values to use on some other machine, to avoid having to use the
> >>>>> debuginfo files there.
> >
> >>>> Yeah, querying it from the BUILDID database by using a pair of remote
> >>>> build-id and the binary path is a good feature.
> >
> >>>>> I.e. in summary I think that the format is ok, but we need to have this
> >>>>> inside the ~/.debug hierarchy so that we can make sure that we use the
> >>>>> right probe definition, one that matches the DSOs being used (the kernel
> >>>>> or some other userspace binary).
> >>>
> >>>> OK, perhaps, that is also good to SDT series at last.
> >>>
> >>> Sure thing!
> >>>
> >>
> >> Thank you!
> >
> > You're welcome!
> >
> > - Arnaldo
> >
>
>
> --
> Masami HIRAMATSU
> Software Platform Research Dept. Linux Technology Research Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: [email protected]
>
(2014/11/05 22:04), Arnaldo Carvalho de Melo wrote:
> Em Wed, Nov 05, 2014 at 01:22:46AM +0900, Masami Hiramatsu escreveu:
>> (2014/11/04 23:38), Arnaldo Carvalho de Melo wrote:
>>> Em Tue, Nov 04, 2014 at 01:36:31PM +0900, Masami Hiramatsu escreveu:
>>>> (2014/11/04 1:19), Arnaldo Carvalho de Melo wrote:
>>>>> Em Mon, Nov 03, 2014 at 09:11:18PM +0900, Masami Hiramatsu escreveu:
>>>>>> (2014/10/31 21:13), Arnaldo Carvalho de Melo wrote:
>>>>>>> Em Fri, Oct 31, 2014 at 02:51:29PM -0400, Masami Hiramatsu escreveu:
>>>>>> Actually, kprobe event itself can reject command if the given address
>>>>>> is not in the kernel text nor instruction boundary (perhaps, uprobes
>>>>>> may have a problem...), so for the kernel level, it is safe.
>>>
>>>>> No, it is not necessarily safe.
>>>
>>>>> What if you specify function foo() that has address 0x1234 for kernel
>>>>> v3.16 and then run the cached probe on kernel v3.18 and on that kernel
>>>>> the function foo() maps to address 0x2345 and function bar() instead
>>>>> maps to address 0x1234? Oops.
>
>>>> In that case user just trace bar() instead of foo(). Of course it's
>>>> not correct, but shouldn't break the kernel (if the kernel is broken,
>>>> it is a bug of kprobes).
>
>>> I.e. no crashes, just misleading information :-\
>
>> Right.
>
> The kernel doesn't crashes, just the user, after scratching his head
> trying to make sense of wrong information :-P
>
> Digressing: is there some kprobe_tracer_fuzzer out there?
No, but it's a good idea:) I just used to more systematic one
(only for the function entry).
>>>>> The build-id was designed to uniquely identify a DSO, we need to use it.
>
>>>>> I think that at some point not using it should be left to a, in
>>>>> systemtap parlance, "guru" mode, with tooling warning profusely when
>>>>> build ids are not available and requiring even more forcing when it
>>>>> doesn't matches.
>
>>>> But it is not necessarily everyone uses perf probe to set up the probe
>>>> events(because it is a part of ftrace), as we can see in the Brendan's
>>>> scripts.
>
>>> Right, If I implied that some particular tool should be used, sorry
>>> about that, what I wanted to get accross was that the information that
>>> allows users or tools to make sure there is no mismatch between the
>>> cached probes and the target kernel is collected at cached probe
>>> creating time and available at target use time.
>
>> Yes, and if user setting probes via perf, the perf must ensure that it
>> picks up the correct cache by using build-id. If someone wants to use
>> other tools, he/she must ensure it. We just give a information how to
>> check that :)
>
>>>> I think, at least what we need is clarifying how can they ensure
>>>> build-id before setting the probe events. I'd like to give them options
>>>> with knowledge instead of forcing by tools.
>
>>> Right, so we need to have the build-id as part of the cache format,
>>> perhaps as the first line, starting with a comment (#), that way the
>>> user can use whatever way he has at its disposal to check that the
>>> running kernel build-id is the same as the one on that comment. Using
>>> that script you provided, that uses just things that are on the machine
>>> (od, /sys/kernel/notes).
>
>> Ah, that's a good idea :)
>> So, with such build-id comment line, would you think we can have an
>> --output option? Or we'd better moving onto the .debug/ cache file?
>
> Well, these are two separate things: The ./debug cache file is for
> repeated use of the same probe on the same machine, say, across reboots
> or in tools that script using perf probe to add some probe, then remove
> the probes at exit, using it multiple times would provide a after
> caching, as no ELF DWARF parsing would be involved after we cached it.
OK, but it would be the next step. At first, I'd like to add/use cache
explicitly.
> As well for when using 'perf archive' to reuse the probe definitions
> using 'perf probe' on the target machine, where no -debuginfo packages
> (or binaries with DWARF) would be available.
>
> I.e. both having the comment on the cache file with the build id and
> having it stored in ~/.debug/ have values for different workloads or
> different ways of doing the same thing.
OK, I'lll add a build-id comment line then.
>> What I'm thinking about this feature is to make a compact and reduced
>> function-entry level probe cache while building the kernel (as a part
>> of kbuild), so that we can deploy the stripped kernel and the cache
>> to remote machines.
>
> Which is a cool feature as well! :-)
Thanks :)
>
>> [snip]
>>>> OK, I agree using .debug/.buildid/ to store caches.
>>>> Here is what I'm thinking,
>>>
>>>> # this makes caches for given pattern instead of adding probes.
>>>> perf probe --cache '* $params'
>>>
>>>> # the cache is stored in .debug/.buildid/<buildid>.probe
>>>> # the cache entry can be queried by buildid and eventname
>>>
>>> To follow the existing standard this would instead go to:
>>>
>>>> # the cache is stored in .debug/probes/path/to/dso/name/buildid
>>>> # And can be found via its buildid link .debug/.buildid/bu/ildid -> ../../probes/path/to/dso/name/buildid
>>
>> Ah, I see. so you meant adding a top-level .debug/probes/ dir.
>> But in that case, shouldn't we change .debug/.buildid/bu/ildid to
>> .debug/probes/.buildid/bu/ildid ?
>
> Humm, understood, perhaps we should have a subdir for probes, i.e.:
>
> # And can be found via its buildid link .debug/.buildid/probes/bu/ildid -> ../../probes/path/to/dso/name/probes/buildid
>
> So as not to clash in both cases with the ELF file.
This looks good to me.
> Which leads to a problem when we have both vmlinux and kallsyms, which,
> right now, is not possible, i.e. if we add both a vmlinux and a kallsyms
> file for a kernel, the one that comes last will have the
>
> .debug/.buildid/bu/ildid
>
> link, have to fix that so that we first look for the ELF file in the
> cache, and if it fails, fallback to the kallsyms file, if available,
> i.e. give preference to the richer option, and in some cases, like for
> annotation, the only one that matters.
OK, but this seems the problem for buildid-cache, not the probe-cache nor
sdt-cache (since both should have same contents).
Thank you,
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]