2013-10-23 05:04:31

by Hemant Kumar

Subject: [PATCH v4 0/3] perf support to SDT markers

This patchset adds perf support for probing DTrace-style markers (SDT) present
in user space applications. Developers place these notes/markers at important
points in their code, and they have negligible overhead when not enabled. Once
enabled, we can probe at these places and retrieve useful information such as
argument values.

How to add SDT markers to user applications:
The header sys/sdt.h (version 3) must be present.
If it is not, install the systemtap-sdt-devel package (on Fedora 18).

A very simple example:
$ cat user_app.c

#include <sys/sdt.h>

int main(void)
{
	/* ... */
	/*
	 * user_app is the provider name
	 * test_mark is the marker name
	 */
	STAP_PROBE(user_app, test_mark);
	/* ... */
	return 0;
}

$ gcc user_app.c
$ perf probe -M -x ./a.out
%user_app:test_mark

A different example showing the same:
- Create a file with a .d extension and declare the probes in it, using the
provider name and the marker names.

$ cat probes.d
provider user_app {
probe foo_start();
probe fun_start();
};

- Now create the probes.h and probes.o files:
$ dtrace -C -h -s probes.d -o probes.h
$ dtrace -C -G -s probes.d -o probes.o

- A program using the markers:

$ cat user_app.c

#include <stdio.h>
#include "probes.h"

void foo(void)
{
USER_APP_FOO_START();
printf("This is foo\n");
}

void fun(void)
{
USER_APP_FUN_START();
printf("Inside fun\n");
}
int main(void)
{
printf("In main\n");
foo();
fun();
return 0;
}

- Compile it and pass the probes.o file to the linker:
$ gcc user_app.c probes.o -o user_app

- Now use perf to list the markers in the app:
# perf probe --markers -x ./user_app

%user_app:foo_start
%user_app:fun_start

- Then use perf probe to add a probe point:

# perf probe -x ./user_app -a '%user_app:foo_start'

Added new event :
event = foo_start (on 0x530)

You can now use it on all perf tools such as :

perf record -e user_app:foo_start -aR sleep 1

# perf record -e user_app:foo_start -aR ./user_app
In main
This is foo
Inside fun
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.235 MB perf.data (~10279 samples) ]

- Then use perf tools to analyze it.
# perf report --stdio

# ========
# captured on: Tue Sep 3 16:19:55 2013
# hostname : hemant-fedora
# os release : 3.11.0-rc3+
# perf version : 3.9.4-200.fc18.x86_64
# arch : x86_64
# nrcpus online : 2
# nrcpus avail : 2
# cpudesc : QEMU Virtual CPU version 1.2.2
# cpuid : GenuineIntel,6,2,3
# total memory : 2051912 kB

# cmdline : /usr/bin/perf record -e probe_user:foo_start -aR ./user_app
# event : name = probe_user:foo_start, type = 2, config = 0x38e, config1
= 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0,
excl_guest = 1, precise_ip = 0
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: software = 1, tracepoint = 2, breakpoint = 5
# ========
#
# Samples: 1 of event 'probe_user:foo_start'
# Event count (approx.): 1
#
# Overhead Command Shared Object Symbol
# ........ ........ ............. .......
#
100.00% user_app user_app [.] foo


#
# (For a higher level overview, try: perf report --sort comm,dso)
#

We can also list and probe the existing markers in libc (if present):
$ perf probe --markers -x /lib64/libc.so.6

%libc:setjmp
%libc:longjmp
%libc:longjmp_target
%libc:lll_futex_wake
%libc:lll_lock_wait_private
%libc:longjmp
%libc:longjmp_target
%libc:lll_futex_wake

This link shows an example of marker probing with Systemtap:
https://sourceware.org/systemtap/wiki/AddingUserSpaceProbingToApps

Also, this link provides important info regarding SDT notes:
http://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation

- Markers in binaries:
These SDT markers are stored in the ELF section named ".note.stapsdt".
Each note records the marker's name, its provider, type, location, base
address and semaphore address.
We can retrieve these values using the name_off and desc_off offsets that
gelf_getnote() returns along with each note header (GElf_Nhdr). When the
markers are not enabled, they are present in the ELF as nops.
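For illustration, one note descriptor could be decoded roughly as follows. This is a minimal sketch, not the code from this patchset. It assumes the layout that populate_sdt_note() in patch 1/3 relies on: three addresses (location, base, semaphore) followed by the NUL-terminated provider and marker names. It also assumes a trailing argument string as in sys/sdt.h version 3, a 64-bit ELF, and host byte order; the patch itself uses gelf_xlatetom() to handle byte order and ELF class. The struct sdt_desc64 type and the parse_sdt_desc64() helper are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of one 64-bit stapsdt note descriptor. */
struct sdt_desc64 {
	uint64_t pc;        /* marker location */
	uint64_t base;      /* address of .stapsdt.base recorded at link time */
	uint64_t semaphore; /* semaphore address, 0 if the marker has none */
	const char *provider;
	const char *name;
	const char *args;   /* argument description, "" if absent */
};

/*
 * Decode a descriptor of 'len' bytes. Assumes ELFCLASS64 and host byte
 * order. Returns 0 on success, -1 if the descriptor is truncated.
 */
static int parse_sdt_desc64(const char *desc, size_t len,
			    struct sdt_desc64 *out)
{
	const size_t addrs = 3 * sizeof(uint64_t);
	const char *end = desc + len;
	const char *nul;

	assert(desc != NULL && out != NULL);
	/* Need the three addresses plus at least two NUL terminators. */
	if (len < addrs + 2)
		return -1;
	memcpy(&out->pc, desc, sizeof(uint64_t));
	memcpy(&out->base, desc + sizeof(uint64_t), sizeof(uint64_t));
	memcpy(&out->semaphore, desc + 2 * sizeof(uint64_t), sizeof(uint64_t));

	/* Provider name starts right after the addresses. */
	out->provider = desc + addrs;
	nul = memchr(out->provider, '\0', (size_t)(end - out->provider));
	if (!nul)
		return -1;

	/* Marker name follows the provider's terminating NUL. */
	out->name = nul + 1;
	nul = memchr(out->name, '\0', (size_t)(end - out->name));
	if (!nul)
		return -1;

	/* Optional argument string; it must be NUL-terminated if present. */
	out->args = "";
	if (nul + 1 < end) {
		if (!memchr(nul + 1, '\0', (size_t)(end - (nul + 1))))
			return -1;
		out->args = nul + 1;
	}
	return 0;
}
```

A caller would, like the patch, skip any note whose semaphore field is non-zero.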

All the above information has been moved to the sdt-probes.txt file in the
tools/perf/Documentation/ directory.

Changes since last version:
- The interface of search_sdt_note() has been changed and its 'list' argument
has been removed.
- The list_empty() check is now done in display_sdt_note_info() and
cleanup_sdt_note_list().
- Some small improvements have been made.

TODO:
- Recognize marker arguments and support probing on them.
---

Hemant Kumar (3):
SDT markers listing by perf:
Support for perf to probe into SDT markers:
Documentation regarding perf/sdt


tools/perf/Documentation/perf-probe.txt | 17 ++
tools/perf/Documentation/sdt-probes.txt | 184 ++++++++++++++++++
tools/perf/builtin-probe.c | 45 ++++-
tools/perf/util/probe-event.c | 125 ++++++++++++-
tools/perf/util/probe-event.h | 3
tools/perf/util/symbol-elf.c | 309 +++++++++++++++++++++++++++++++
tools/perf/util/symbol.h | 24 ++
7 files changed, 693 insertions(+), 14 deletions(-)
create mode 100644 tools/perf/Documentation/sdt-probes.txt

--


2013-10-23 05:05:11

by Hemant Kumar

Subject: [PATCH v4 1/3] SDT markers listing by perf:

This patch enables perf to list all the SDT markers present in an ELF
file. The markers are stored in the .note.stapsdt section of the ELF; we
traverse this section and collect the required information about each
marker. Use '-M/--markers' with perf probe to view the SDT notes.

Currently, SDT notes that use a semaphore (a non-zero semaphore address)
are silently ignored. Support for them will be added later.

Wrapping this code in an #ifdef LIBELF_SUPPORT pair is not required,
because if NO_LIBELF=1, the perf 'probe' command itself is disabled.
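The traversal of .note.stapsdt can be sketched without libelf as shown below. This is an illustration under stated assumptions, not the patch's code, which relies on elf_getdata() and gelf_getnote(). It assumes a raw section buffer in host byte order with 4-byte note alignment. The note_hdr type and the for_each_sdt_note() helper are hypothetical. Like the patch, it matches notes named "stapsdt" with type 3.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SDT_NOTE_NAME "stapsdt" /* same values as the patch's symbol.h */
#define SDT_NOTE_TYPE 3

/* Same layout as Elf32_Nhdr / Elf64_Nhdr: three 32-bit words. */
struct note_hdr {
	uint32_t n_namesz;
	uint32_t n_descsz;
	uint32_t n_type;
};

/* Round up to the 4-byte note alignment assumed by this sketch. */
#define ALIGN4(x) ((((size_t)(x)) + 3) & ~(size_t)3)

/*
 * Walk a raw note section and call 'cb' (if non-NULL) for each stapsdt
 * note of type 3. Returns the number of matching notes, or -1 if a note
 * runs past the end of the section.
 */
static int for_each_sdt_note(const unsigned char *sec, size_t size,
			     void (*cb)(const unsigned char *desc, size_t len,
					void *arg),
			     void *arg)
{
	size_t off = 0;
	int count = 0;

	assert(sec != NULL || size == 0);
	while (off + sizeof(struct note_hdr) <= size) {
		struct note_hdr nh;
		size_t name_off, desc_off;

		memcpy(&nh, sec + off, sizeof(nh));
		name_off = off + sizeof(nh);
		desc_off = name_off + ALIGN4(nh.n_namesz);
		if (desc_off > size || nh.n_descsz > size - desc_off)
			return -1;

		if (nh.n_type == SDT_NOTE_TYPE &&
		    nh.n_namesz == sizeof(SDT_NOTE_NAME) &&
		    !memcmp(sec + name_off, SDT_NOTE_NAME,
			    sizeof(SDT_NOTE_NAME))) {
			if (cb)
				cb(sec + desc_off, nh.n_descsz, arg);
			count++;
		}
		/* The next note starts after the padded descriptor. */
		off = desc_off + ALIGN4(nh.n_descsz);
	}
	return count;
}
```

In a listing tool, the callback would decode each descriptor and print it in the "%provider:name" form used by --markers.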

Usage:
perf probe --markers -x /lib64/libc.so.6

Output :
%libc:setjmp
%libc:longjmp
%libc:longjmp_target
%libc:lll_futex_wake
%libc:lll_lock_wait_private
%libc:longjmp
%libc:longjmp_target
%libc:lll_futex_wake

Signed-off-by: Hemant Kumar Shaw <[email protected]>
---
tools/perf/builtin-probe.c | 41 +++++++
tools/perf/util/probe-event.c | 23 ++++
tools/perf/util/probe-event.h | 1
tools/perf/util/symbol-elf.c | 225 +++++++++++++++++++++++++++++++++++++++++
tools/perf/util/symbol.h | 19 +++
5 files changed, 307 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
index 89acc17..2450613 100644
--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@@ -55,6 +55,8 @@ static struct {
bool show_funcs;
bool mod_events;
bool uprobes;
+ bool exec;
+ bool sdt;
int nevents;
struct perf_probe_event events[MAX_PROBES];
struct strlist *dellist;
@@ -171,8 +173,10 @@ static int opt_set_target(const struct option *opt, const char *str,
int ret = -ENOENT;

if (str && !params.target) {
- if (!strcmp(opt->long_name, "exec"))
+ if (!strcmp(opt->long_name, "exec")) {
params.uprobes = true;
+ params.exec = true;
+ }
#ifdef HAVE_DWARF_SUPPORT
else if (!strcmp(opt->long_name, "module"))
params.uprobes = false;
@@ -325,6 +329,7 @@ int cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused)
opt_set_filter),
OPT_CALLBACK('x', "exec", NULL, "executable|path",
"target executable name or path", opt_set_target),
+ OPT_BOOLEAN('M', "markers", &params.sdt, "Show probe-able sdt notes"),
OPT_END()
};
int ret;
@@ -347,7 +352,7 @@ int cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused)
params.max_probe_points = MAX_PROBES;

if ((!params.nevents && !params.dellist && !params.list_events &&
- !params.show_lines && !params.show_funcs))
+ !params.show_lines && !params.show_funcs && !params.sdt))
usage_with_options(probe_usage, options);

/*
@@ -355,6 +360,38 @@ int cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused)
*/
symbol_conf.try_vmlinux_path = (symbol_conf.vmlinux_name == NULL);

+ if (params.sdt) {
+ if (params.show_lines) {
+ pr_err("Error: Don't use --markers with --lines.\n");
+ usage_with_options(probe_usage, options);
+ }
+ if (params.show_vars) {
+ pr_err("Error: Don't use --markers with --vars.\n");
+ usage_with_options(probe_usage, options);
+ }
+ if (params.show_funcs) {
+ pr_err("Error: Don't use --markers with --funcs.\n");
+ usage_with_options(probe_usage, options);
+ }
+ if (params.mod_events) {
+ pr_err("Error: Don't use --markers with --add/--del.\n");
+ usage_with_options(probe_usage, options);
+ }
+ if (!params.exec) {
+ pr_err("Error: Always use --exec with --markers.\n");
+ usage_with_options(probe_usage, options);
+ }
+ if (!params.target) {
+ pr_err("Error: Please specify a target binary!\n");
+ usage_with_options(probe_usage, options);
+ }
+ ret = show_sdt_notes(params.target);
+ if (ret < 0) {
+ pr_err(" Error : Failed to find SDT markers in %s !"
+ " (%d)\n", params.target, ret);
+ }
+ return ret;
+ }
if (params.list_events) {
if (params.mod_events) {
pr_err(" Error: Don't use --list with --add/--del.\n");
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 779b2da..19182f7 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -2372,3 +2372,26 @@ out:
free(name);
return ret;
}
+
+static void display_sdt_note_info(struct list_head *start)
+{
+ struct sdt_note *pos;
+
+ if (list_empty(start))
+ return;
+ list_for_each_entry(pos, start, note_list) {
+ printf("%%%s:%s\n", pos->provider, pos->name);
+ }
+}
+
+int show_sdt_notes(const char *target)
+{
+ int ret;
+ LIST_HEAD(sdt_notes);
+
+ ret = get_sdt_note_list(&sdt_notes, target);
+ if (!ret)
+ display_sdt_note_info(&sdt_notes);
+ cleanup_sdt_note_list(&sdt_notes);
+ return ret;
+}
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index f9f3de8..32de5a3 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -133,6 +133,7 @@ extern int show_available_vars(struct perf_probe_event *pevs, int npevs,
struct strfilter *filter, bool externs);
extern int show_available_funcs(const char *module, struct strfilter *filter,
bool user);
+int show_sdt_notes(const char *target);

/* Maximum index number of event-name postfix */
#define MAX_EVENT_INDEX 1024
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index eed0b96..a065b04 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1613,6 +1613,231 @@ void kcore_extract__delete(struct kcore_extract *kce)
unlink(kce->extract_filename);
}

+/*
+ * Populate the name, type, offset in the SDT note structure and
+ * ignore the argument fields (for now)
+ */
+static int populate_sdt_note(Elf **elf, const char *data, size_t len, int type,
+ struct sdt_note **note)
+{
+ const char *provider, *name;
+ struct sdt_note *tmp = NULL;
+ int ret = -1;
+
+ /*
+ * Three addresses need to be obtained :
+ * Marker location, address of base section and semaphore location
+ */
+ union {
+ Elf64_Addr a64[3];
+ Elf32_Addr a32[3];
+ } buf;
+
+ /*
+ * dst and src are required for translation from file to memory
+ * representation
+ */
+ Elf_Data dst = {
+ .d_buf = &buf, .d_type = ELF_T_ADDR, .d_version = EV_CURRENT,
+ .d_size = gelf_fsize((*elf), ELF_T_ADDR, 3, EV_CURRENT),
+ .d_off = 0, .d_align = 0
+ };
+
+ Elf_Data src = {
+ .d_buf = (void *) data, .d_type = ELF_T_ADDR,
+ .d_version = EV_CURRENT, .d_size = dst.d_size, .d_off = 0,
+ .d_align = 0
+ };
+
+ /* Check the type of each of the notes */
+ if (type != SDT_NOTE_TYPE)
+ goto out_err;
+
+ tmp = (struct sdt_note *)zalloc(sizeof(struct sdt_note));
+ if (tmp == NULL) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+ INIT_LIST_HEAD(&tmp->note_list);
+
+ if (len < dst.d_size + 3)
+ goto out_free_note;
+
+ /* Translation from file representation to memory representation */
+ if (gelf_xlatetom(*elf, &dst, &src,
+ elf_getident(*elf, NULL)[EI_DATA]) == NULL)
+ pr_debug("gelf_xlatetom : %s", elf_errmsg(-1));
+
+ /* Populate the fields of sdt_note */
+ provider = data + dst.d_size;
+
+ name = (const char *)memchr(provider, '\0', data + len - provider);
+ if (name++ == NULL)
+ goto out_free_note;
+ tmp->provider = strdup(provider);
+ if (!tmp->provider) {
+ ret = -ENOMEM;
+ goto out_free_note;
+ }
+ tmp->name = strdup(name);
+ if (!tmp->name) {
+ ret = -ENOMEM;
+ goto out_free_prov;
+ }
+
+ /* Obtain the addresses and ignore notes with semaphores set*/
+ if (gelf_getclass(*elf) == ELFCLASS32) {
+ if (buf.a32[2] != 0)
+ goto out_free_name;
+ tmp->addr.a32[0] = buf.a32[0];
+ tmp->addr.a32[1] = buf.a32[1];
+ tmp->addr.a32[2] = buf.a32[2];
+ tmp->bit32 = true;
+ } else {
+ if (buf.a64[2] != 0)
+ goto out_free_name;
+ tmp->addr.a64[0] = buf.a64[0];
+ tmp->addr.a64[1] = buf.a64[1];
+ tmp->addr.a64[2] = buf.a64[2];
+ tmp->bit32 = false;
+ }
+ *note = tmp;
+ return 0;
+
+out_free_name:
+ free(tmp->name);
+out_free_prov:
+ free(tmp->provider);
+out_free_note:
+ free(tmp);
+out_err:
+ return ret;
+}
+
+static int construct_sdt_notes_list(Elf *elf, struct list_head *sdt_notes)
+{
+ GElf_Ehdr ehdr;
+ Elf_Scn *scn = NULL;
+ Elf_Data *data;
+ GElf_Shdr shdr;
+ size_t shstrndx;
+ size_t next;
+ GElf_Nhdr nhdr;
+ size_t name_off, desc_off, offset;
+ struct sdt_note *tmp = NULL;
+ int ret = 0, val = 0;
+
+ if (gelf_getehdr(elf, &ehdr) == NULL) {
+ ret = -EBADF;
+ pr_debug("Can't get elf header!");
+ goto out_ret;
+ }
+ if (elf_getshdrstrndx(elf, &shstrndx) != 0) {
+ ret = -EBADF;
+ pr_debug("getshdrstrndx failed\n");
+ goto out_ret;
+ }
+
+ /*
+ * Look for section type = SHT_NOTE, flags = no SHF_ALLOC
+ * and name = .note.stapsdt
+ */
+ scn = elf_section_by_name(elf, &ehdr, &shdr, SDT_NOTE_SCN, NULL);
+ if (!scn) {
+ ret = -ENOENT;
+ pr_debug("Can't get section .note.stapsdt\n");
+ goto out_ret;
+ }
+ if (!(shdr.sh_type == SHT_NOTE) || (shdr.sh_flags & SHF_ALLOC)) {
+ ret = -ENOENT;
+ goto out_ret;
+ }
+
+ data = elf_getdata(scn, NULL);
+
+ /* Get the SDT notes */
+ for (offset = 0; (next = gelf_getnote(data, offset, &nhdr, &name_off,
+ &desc_off)) > 0; offset = next) {
+ if (nhdr.n_namesz == sizeof(SDT_NOTE_NAME) &&
+ !memcmp(data->d_buf + name_off, SDT_NOTE_NAME,
+ sizeof(SDT_NOTE_NAME))) {
+ val = populate_sdt_note(&elf, ((data->d_buf) + desc_off),
+ nhdr.n_descsz, nhdr.n_type,
+ &tmp);
+ if (!val)
+ list_add_tail(&tmp->note_list, sdt_notes);
+ if (val == -ENOMEM) {
+ ret = -ENOMEM;
+ goto out_ret;
+ }
+ }
+ }
+ if (list_empty(sdt_notes))
+ ret = -ENOENT;
+
+out_ret:
+ return ret;
+}
+
+static int sdt_err(int val, const char *target)
+{
+ switch (-val) {
+ case 0:
+ break;
+ case ENOENT:
+ /* Absence of SDT markers isn't an error */
+ val = 0;
+ printf("%s : No SDT markers found!\n", target);
+ break;
+ case EBADF:
+ pr_err("%s : Bad file name\n", target);
+ break;
+ default:
+ pr_err("%s\n", strerror(val));
+ }
+ return val;
+}
+
+int get_sdt_note_list(struct list_head *head, const char *target)
+{
+ Elf *elf;
+ int fd, ret;
+
+ fd = open(target, O_RDONLY);
+ if (fd < 0) {
+ pr_err("%s : %s\n", target, strerror(errno));
+ return -errno;
+ }
+
+ symbol__elf_init();
+ elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
+ if (!elf) {
+ ret = -EBADF;
+ pr_debug("%s : %s\n", target, elf_errmsg(elf_errno()));
+ goto out_close;
+ }
+ ret = construct_sdt_notes_list(elf, head);
+ elf_end(elf);
+
+out_close:
+ close(fd);
+ return sdt_err(ret, target);
+}
+
+void cleanup_sdt_note_list(struct list_head *sdt_notes)
+{
+ struct sdt_note *tmp, *pos;
+
+ if (list_empty(sdt_notes))
+ return;
+ list_for_each_entry_safe(pos, tmp, sdt_notes, note_list) {
+ list_del(&pos->note_list);
+ free(pos->name);
+ free(pos->provider);
+ free(pos);
+ }
+}
+
void symbol__elf_init(void)
{
elf_version(EV_CURRENT);
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 07de8fe..1e50300 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -198,6 +198,17 @@ struct symsrc {
#endif
};

+struct sdt_note {
+ char *name;
+ char *provider;
+ bool bit32; /* 32 or 64 bit flag */
+ union {
+ Elf64_Addr a64[3];
+ Elf32_Addr a32[3];
+ } addr;
+ struct list_head note_list;
+};
+
void symsrc__destroy(struct symsrc *ss);
int symsrc__init(struct symsrc *ss, struct dso *dso, const char *name,
enum dso_binary_type type);
@@ -273,4 +284,12 @@ void kcore_extract__delete(struct kcore_extract *kce);
int kcore_copy(const char *from_dir, const char *to_dir);
int compare_proc_modules(const char *from, const char *to);

+/* Specific to SDT notes */
+int get_sdt_note_list(struct list_head *head, const char *target);
+void cleanup_sdt_note_list(struct list_head *sdt_notes);
+
+#define SDT_NOTE_TYPE 3
+#define SDT_NOTE_SCN ".note.stapsdt"
+#define SDT_NOTE_NAME "stapsdt"
+
#endif /* __PERF_SYMBOL */

2013-10-23 05:05:30

by Hemant Kumar

Subject: [PATCH v4 2/3] Support for perf to probe into SDT markers:

This patch allows perf to probe the SDT markers/notes present in
libraries and executables. We find the location associated with each
marker and handle prelinking. Since the .note.stapsdt section is not
allocated at runtime, prelinking is handled with the help of the
.stapsdt.base section, which is allocated at runtime. The file offset of
this section is compared with the base address recorded in the note's
description; if they differ, the difference is added to the note's location.
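The adjustment amounts to simple address arithmetic. Here is a minimal sketch, assuming 64-bit addresses. adjust_sdt_addr() is a hypothetical helper; in the patch, the same computation lives in adjust_note_addr().

```c
#include <assert.h>
#include <stdint.h>

/*
 * note_pc and note_base come from the SDT note descriptor; base_scn_off
 * is where .stapsdt.base was actually found (the patch uses the section
 * header's sh_offset). Unsigned arithmetic wraps, so the result is right
 * whether the section moved up or down.
 */
static uint64_t adjust_sdt_addr(uint64_t note_pc, uint64_t note_base,
				uint64_t base_scn_off)
{
	/* The patch skips the adjustment when the base section is absent. */
	assert(base_scn_off != 0);
	return note_pc + base_scn_off - note_base;
}
```

For example, a note recorded with location 0x400530 and base 0x400600, where .stapsdt.base is found at offset 0x600, yields a probe address of 0x530.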

The existing '-a/--add' option is used to add events for SDT markers,
and multiple events can be added at once with the same '-a' option.
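The "[EVENT=]%PROVIDER:MARKER" spec accepted by '-a' can be parsed along the lines below. This is a simplified, standalone sketch loosely modelled on the check this patch adds to parse_perf_probe_point(), not that function itself. The struct sdt_spec type and the parse_sdt_spec() helper are hypothetical, and the code relies on POSIX strndup()/strdup(). As in the patch, the event name defaults to the marker name when EVENT is omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; perf keeps these in its own structures. */
struct sdt_spec {
	char *event;    /* EVENT, or the marker name when omitted */
	char *provider; /* also used as the event group */
	char *marker;
};

/*
 * Parse "[EVENT=]%PROVIDER:MARKER". Returns 0 on success (the caller
 * frees the three strings) or -1 on a syntax or allocation error.
 */
static int parse_sdt_spec(const char *arg, struct sdt_spec *out)
{
	const char *eq, *pct, *colon;

	assert(arg != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	eq = strchr(arg, '=');
	pct = eq ? eq + 1 : arg;
	if (*pct != '%')
		return -1;

	/* Require a non-empty provider and a non-empty marker name. */
	colon = strchr(pct + 1, ':');
	if (!colon || colon == pct + 1 || colon[1] == '\0')
		return -1;

	out->provider = strndup(pct + 1, (size_t)(colon - (pct + 1)));
	out->marker = strdup(colon + 1);
	out->event = eq ? strndup(arg, (size_t)(eq - arg)) : strdup(colon + 1);
	if (!out->provider || !out->marker || !out->event ||
	    out->event[0] == '\0') {
		free(out->provider);
		free(out->marker);
		free(out->event);
		memset(out, 0, sizeof(*out));
		return -1;
	}
	return 0;
}
```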

Usage:
perf probe -x /lib64/libc.so.6 -a 'my_event=%libc:setjmp'

Output:
Added new event:
libc:my_event (on 0x35981)

You can now use it in all perf tools, such as:

perf record -e libc:my_event -aR sleep 1


Signed-off-by: Hemant Kumar Shaw <[email protected]>
---
tools/perf/builtin-probe.c | 4 ++
tools/perf/util/probe-event.c | 102 +++++++++++++++++++++++++++++++++++++----
tools/perf/util/probe-event.h | 2 +
tools/perf/util/symbol-elf.c | 84 ++++++++++++++++++++++++++++++++++
tools/perf/util/symbol.h | 5 ++
5 files changed, 187 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
index 2450613..8f0dd48 100644
--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@@ -494,6 +494,10 @@ int cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused)
}

if (params.nevents) {
+ if (params.events->sdt && !params.target && !params.exec) {
+ pr_err("SDT markers can be probed only with --exec.\n");
+ usage_with_options(probe_usage, options);
+ }
ret = add_perf_probe_events(params.events, params.nevents,
params.max_probe_points,
params.target,
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 19182f7..508c7a2 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -816,6 +816,40 @@ static int parse_perf_probe_point(char *arg, struct perf_probe_event *pev)
pev->group = NULL;
arg = tmp;
}
+ /* Check for SDT marker */
+ if (*arg == '%') {
+ ptr = strchr(++arg, ':');
+ if (!ptr) {
+ semantic_error("Marker name must follow the provider "
+ "name\n");
+ return -EINVAL;
+ }
+ *ptr++ = '\0';
+ tmp = strdup(arg);
+ if (!tmp)
+ return -ENOMEM;
+
+ pev->point.note = (struct sdt_note *)
+ zalloc(sizeof(struct sdt_note));
+ if (!pev->point.note) {
+ free(tmp);
+ return -ENOMEM;
+ }
+ pev->point.note->provider = tmp;
+
+ tmp = strdup(ptr);
+ if (!tmp) {
+ free(pev->point.note->provider);
+ free(pev->point.note);
+ return -ENOMEM;
+ }
+ pev->point.note->name = tmp;
+ pev->group = pev->point.note->provider;
+ if (!pev->event)
+ pev->event = pev->point.note->name;
+ pev->sdt = true;
+ return 0;
+ }

ptr = strpbrk(arg, ";:+@%");
if (ptr) {
@@ -1238,6 +1272,7 @@ static char *synthesize_perf_probe_point(struct perf_probe_point *pp)
char *buf, *tmp;
char offs[32] = "", line[32] = "", file[32] = "";
int ret, len;
+ unsigned long long addr;

buf = zalloc(MAX_CMDLEN);
if (buf == NULL) {
@@ -1266,12 +1301,16 @@ static char *synthesize_perf_probe_point(struct perf_probe_point *pp)
goto error;
}

- if (pp->function)
+ if (pp->function) {
ret = e_snprintf(buf, MAX_CMDLEN, "%s%s%s%s%s", pp->function,
offs, pp->retprobe ? "%return" : "", line,
file);
- else
+ } else if (pp->note) {
+ addr = get_sdt_note_addr(pp->note);
+ ret = e_snprintf(buf, MAX_CMDLEN, "0x%llx", addr);
+ } else {
ret = e_snprintf(buf, MAX_CMDLEN, "%s%s", file, line);
+ }
if (ret <= 0)
goto error;

@@ -1909,6 +1948,26 @@ static int __add_probe_trace_events(struct perf_probe_event *pev,
return ret;
}

+static int try_to_find_sdt_notes(struct perf_probe_event *pev,
+ const char *target)
+{
+ struct sdt_note *note = pev->point.note;
+ int ret;
+
+ ret = search_sdt_note(note, target);
+ if (!ret) {
+ if (note->bit32 && !note->addr.a32[0])
+ goto out_err;
+ else if (!note->bit32 && !note->addr.a64[0])
+ goto out_err;
+ }
+ return ret;
+
+out_err:
+ pr_debug("%s : SDT note location not set!", target);
+ return -ENOENT;
+}
+
static int convert_to_probe_trace_events(struct perf_probe_event *pev,
struct probe_trace_event **tevs,
int max_tevs, const char *target)
@@ -1916,11 +1975,23 @@ static int convert_to_probe_trace_events(struct perf_probe_event *pev,
struct symbol *sym;
int ret = 0, i;
struct probe_trace_event *tev;
+ char *buf;
+ unsigned long long addr;

- /* Convert perf_probe_event with debuginfo */
- ret = try_to_find_probe_trace_events(pev, tevs, max_tevs, target);
- if (ret != 0)
- return ret; /* Found in debuginfo or got an error */
+ if (pev->sdt) {
+ ret = -EBADF;
+ if (pev->uprobes)
+ ret = try_to_find_sdt_notes(pev, target);
+ if (ret)
+ return ret;
+ } else {
+ /* Convert perf_probe_event with debuginfo */
+ ret = try_to_find_probe_trace_events(pev, tevs, max_tevs,
+ target);
+ /* Found in debuginfo or got an error */
+ if (ret != 0)
+ return ret;
+ }

/* Allocate trace event buffer */
tev = *tevs = zalloc(sizeof(struct probe_trace_event));
@@ -1928,10 +1999,21 @@ static int convert_to_probe_trace_events(struct perf_probe_event *pev,
return -ENOMEM;

/* Copy parameters */
- tev->point.symbol = strdup(pev->point.function);
- if (tev->point.symbol == NULL) {
- ret = -ENOMEM;
- goto error;
+ if (pev->sdt) {
+ buf = (char *)zalloc(sizeof(char) * MAX_CMDLEN);
+ if (!buf) {
+ ret = -ENOMEM;
+ goto error;
+ }
+ addr = get_sdt_note_addr(pev->point.note);
+ sprintf(buf, "0x%llx", addr);
+ tev->point.symbol = buf;
+ } else {
+ tev->point.symbol = strdup(pev->point.function);
+ if (tev->point.symbol == NULL) {
+ ret = -ENOMEM;
+ goto error;
+ }
}

if (target) {
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 32de5a3..beb1b4a 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -47,6 +47,7 @@ struct perf_probe_point {
bool retprobe; /* Return probe flag */
char *lazy_line; /* Lazy matching pattern */
unsigned long offset; /* Offset from function entry */
+ struct sdt_note *note;
};

/* Perf probe probing argument field chain */
@@ -72,6 +73,7 @@ struct perf_probe_event {
struct perf_probe_point point; /* Probe point */
int nargs; /* Number of arguments */
bool uprobes;
+ bool sdt;
struct perf_probe_arg *args; /* Arguments */
};

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index a065b04..881636a 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1838,6 +1838,90 @@ void cleanup_sdt_note_list(struct list_head *sdt_notes)
}
}

+static void adjust_note_addr(struct sdt_note *tmp, struct sdt_note *key,
+ Elf *elf)
+{
+ GElf_Ehdr ehdr;
+ GElf_Addr base_off = 0;
+ GElf_Shdr shdr;
+
+ if (!gelf_getehdr(elf, &ehdr)) {
+ pr_debug("%s : cannot get elf header.\n", __func__);
+ return;
+ }
+
+ /*
+ * Find out the .stapsdt.base section.
+ * This scn will help us to handle prelinking (if present).
+ * Compare the retrieved file offset of the base section with the
+ * base address in the description of the SDT note. If its different,
+ * then accordingly, adjust the note location.
+ */
+ if (elf_section_by_name(elf, &ehdr, &shdr, SDT_BASE_SCN, NULL))
+ base_off = shdr.sh_offset;
+ if (base_off) {
+ if (tmp->bit32)
+ key->addr.a32[0] = tmp->addr.a32[0] + base_off -
+ tmp->addr.a32[1];
+ else
+ key->addr.a64[0] = tmp->addr.a64[0] + base_off -
+ tmp->addr.a64[1];
+ }
+ key->bit32 = tmp->bit32;
+}
+
+int search_sdt_note(struct sdt_note *key, const char *target)
+{
+ Elf *elf;
+ int fd, ret;
+ bool found = false;
+ struct sdt_note *pos = NULL;
+ LIST_HEAD(sdt_notes);
+
+ fd = open(target, O_RDONLY);
+ if (fd < 0) {
+ pr_err("%s : %s\n", target, strerror(errno));
+ return -errno;
+ }
+
+ symbol__elf_init();
+ elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
+ if (!elf) {
+ ret = -EBADF;
+ pr_debug("Can't read the elf of %s\n", target);
+ goto out_close;
+ }
+
+ ret = construct_sdt_notes_list(elf, &sdt_notes);
+ if (ret)
+ goto out_end;
+
+ /* Iterate through the notes and retrieve the required note */
+ list_for_each_entry(pos, &sdt_notes, note_list) {
+ if (!strcmp(key->name, pos->name) &&
+ !strcmp(key->provider, pos->provider)) {
+ adjust_note_addr(pos, key, elf);
+ found = true;
+ break;
+ }
+ }
+ if (!found) {
+ printf("%%%s:%s not found in %s!\n", key->provider, key->name,
+ target);
+ return -ENOENT;
+ }
+
+out_end:
+ elf_end(elf);
+out_close:
+ close(fd);
+ ret = sdt_err(ret, target);
+ if (!ret && list_empty(&sdt_notes))
+ ret = -ENOENT;
+ cleanup_sdt_note_list(&sdt_notes);
+ return ret;
+}
+
void symbol__elf_init(void)
{
elf_version(EV_CURRENT);
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 1e50300..3c32dd9 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -287,9 +287,14 @@ int compare_proc_modules(const char *from, const char *to);
/* Specific to SDT notes */
int get_sdt_note_list(struct list_head *head, const char *target);
void cleanup_sdt_note_list(struct list_head *sdt_notes);
+int search_sdt_note(struct sdt_note *key, const char *target);

#define SDT_NOTE_TYPE 3
#define SDT_NOTE_SCN ".note.stapsdt"
#define SDT_NOTE_NAME "stapsdt"
+#define SDT_BASE_SCN ".stapsdt.base"
+#define get_sdt_note_addr(pnote) \
+ ((pnote)->bit32 ? (pnote)->addr.a32[0] : \
+ (pnote)->addr.a64[0])

#endif /* __PERF_SYMBOL */

2013-10-23 05:05:58

by Hemant Kumar

Subject: [PATCH v4 3/3] Documentation regarding perf/sdt

This patch adds documentation for perf support to SDT notes/markers.

Signed-off-by: Hemant Kumar Shaw <[email protected]>
---
tools/perf/Documentation/perf-probe.txt | 17 +++
tools/perf/Documentation/sdt-probes.txt | 184 +++++++++++++++++++++++++++++++
2 files changed, 199 insertions(+), 2 deletions(-)
create mode 100644 tools/perf/Documentation/sdt-probes.txt

diff --git a/tools/perf/Documentation/perf-probe.txt b/tools/perf/Documentation/perf-probe.txt
index b715cb7..f0169d9 100644
--- a/tools/perf/Documentation/perf-probe.txt
+++ b/tools/perf/Documentation/perf-probe.txt
@@ -99,10 +99,15 @@ OPTIONS
--max-probes::
Set the maximum number of probe points for an event. Default is 128.

+-M::
+--markers::
+ View the SDT markers present in a user space application/library.
+
-x::
--exec=PATH::
Specify path to the executable or shared library file for user
- space tracing. Can also be used with --funcs option.
+ space tracing. Can also be used with the --funcs option, and must be used
+ with the --markers/-M option.

In absence of -m/-x options, perf probe checks if the first argument after
the options is an absolute path name. If its an absolute path, perf probe
@@ -121,11 +126,15 @@ Probe points are defined by following syntax.
3) Define event based on source file with lazy pattern
[EVENT=]SRC;PTN [ARG ...]

+ 4) Define event based on SDT marker
+ [EVENT=]%PROVIDER:MARKER
+

-'EVENT' specifies the name of new event, if omitted, it will be set the name of the probed function. Currently, event group name is set as 'probe'.
+'EVENT' specifies the name of the new event; if omitted, it is set to the name of the probed function (or, for an SDT marker, the marker name). Currently, the event group name is set to 'probe', except for SDT markers, where it is set to the provider name.
'FUNC' specifies a probed function name, and it may have one of the following options; '+OFFS' is the offset from function entry address in bytes, ':RLN' is the relative-line number from function entry line, and '%return' means that it probes function return. And ';PTN' means lazy matching pattern (see LAZY MATCHING). Note that ';PTN' must be the end of the probe point definition. In addition, '@SRC' specifies a source file which has that function.
It is also possible to specify a probe point by the source line number or lazy matching by using 'SRC:ALN' or 'SRC;PTN' syntax, where 'SRC' is the source file path, ':ALN' is the line number and ';PTN' is the lazy matching pattern.
'ARG' specifies the arguments of this probe point, (see PROBE ARGUMENT).
+'%PROVIDER:MARKER' specifies an SDT marker present in the ELF by its provider name and marker name.

PROBE ARGUMENT
--------------
@@ -200,6 +209,10 @@ Add probes at malloc() function on libc

./perf probe -x /lib/libc.so.6 malloc or ./perf probe /lib/libc.so.6 malloc

+Add probes at longjmp SDT marker on libc
+
+ ./perf probe -x /lib64/libc.so.6 %libc:longjmp
+
SEE ALSO
--------
linkperf:perf-trace[1], linkperf:perf-record[1]
diff --git a/tools/perf/Documentation/sdt-probes.txt b/tools/perf/Documentation/sdt-probes.txt
new file mode 100644
index 0000000..d5556b7
--- /dev/null
+++ b/tools/perf/Documentation/sdt-probes.txt
@@ -0,0 +1,184 @@
+Perf probing on SDT markers:
+
+Goal:
+Probe dtrace style markers(SDT) present in user space applications.
+
+Scope:
+Put probe points at SDT markers in user space applications and libraries
+and also probe them using perf.
+
+Why support SDT markers?
+Many applications use SDT markers today, for example:
+PostgreSQL, MySQL, Mozilla, Perl, Python, Java, Ruby, libvirt, QEMU and glib.
+
+Developers place these markers at important points in their code. The
+markers have negligible overhead when not enabled; once enabled, we can
+probe at these places and retrieve useful information such as argument
+values.
+
+How to add SDT markers to user applications:
+The header sys/sdt.h must be present.
+The sys/sdt.h used here is version 3.
+If it is not present, install the systemtap-sdt-devel package.
+
+A very simple example:
+
+$ cat user_app.c
+
+#include <sys/sdt.h>
+
+int main(void) {
+ /* ... */
+ /*
+ * user_app is the provider name
+ * test_mark is the marker name
+ */
+ STAP_PROBE(user_app, test_mark);
+ /* ... */
+}
+
+$ gcc user_app.c
+$ perf probe -M -x ./a.out
+%user_app:test_mark
+
+A different example showing the same:
+- Create a file with a .d extension and declare the probes in it, using the
+provider name and the marker names.
+
+$ cat probes.d
+provider user_app {
+ probe foo_start();
+ probe fun_start();
+};
+
+- Now create the probes.h and probes.o files:
+$ dtrace -C -h -s probes.d -o probes.h
+$ dtrace -C -G -s probes.d -o probes.o
+
+- A program using the markers:
+
+$ cat user_app.c
+
+#include <stdio.h>
+#include "probes.h"
+
+void foo(void)
+{
+ USER_APP_FOO_START();
+ printf("This is foo\n");
+}
+
+void fun(void)
+{
+ USER_APP_FUN_START();
+ printf("Inside fun\n");
+}
+int main(void)
+{
+ printf("In main\n");
+ foo();
+ fun();
+ return 0;
+}
+
+- Compile it and pass the probes.o file to the linker:
+$ gcc user_app.c probes.o -o user_app
+
+- Now use perf to list the markers in the app:
+# perf probe --markers -x ./user_app
+
+%user_app:foo_start
+%user_app:fun_start
+
+- Then use perf probe to add a probe point:
+
+# perf probe -x ./user_app 'my_event=%user_app:foo_start'
+
+Added new event :
+user_app:my_event (on 0x530)
+
+You can now use it on all perf tools such as :
+
+ perf record -e user_app:my_event -aR sleep 1
+
+# perf record -e user_app:my_event -aR ./user_app
+In main
+This is foo
+Inside fun
+[ perf record: Woken up 1 times to write data ]
+[ perf record: Captured and wrote 0.235 MB perf.data (~10279 samples) ]
+
+- Then use perf tools to analyze it.
+# perf report --stdio
+
+# ========
+# captured on: Tue Sep 3 16:19:55 2013
+# hostname : hemant-fedora
+# os release : 3.11.0-rc3+
+# perf version : 3.9.4-200.fc18.x86_64
+# arch : x86_64
+# nrcpus online : 2
+# nrcpus avail : 2
+# cpudesc : QEMU Virtual CPU version 1.2.2
+# cpuid : GenuineIntel,6,2,3
+# total memory : 2051912 kB
+# cmdline : /usr/bin/perf record -e probe_user:foo_start -aR ./user_app
+# event : name = probe_user:foo_start, type = 2, config = 0x38e, config1
+= 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0,
+excl_guest = 1, precise_ip = 0
+# HEADER_CPU_TOPOLOGY info available, use -I to display
+# HEADER_NUMA_TOPOLOGY info available, use -I to display
+# pmu mappings: software = 1, tracepoint = 2, breakpoint = 5
+# ========
+#
+# Samples: 1 of event 'probe_user:foo_start'
+# Event count (approx.): 1
+#
+# Overhead Command Shared Object Symbol
+# ........ ........ ............. .......
+#
+ 100.00% user_app user_app [.] foo
+
+
+#
+# (For a higher level overview, try: perf report --sort comm,dso)
+#
+
+
+We can see the existing markers in libc (if present):
+$ perf probe --markers -x /lib64/libc.so.6
+
+%libc:setjmp
+%libc:longjmp
+%libc:longjmp_target
+%libc:lll_futex_wake
+%libc:lll_lock_wait_private
+%libc:longjmp
+%libc:longjmp_target
+%libc:lll_futex_wake
+
+- And then use perf to probe into any marker:
+
+# perf probe -x /lib64/libc.so.6 %libc:setjmp
+Added new event:
+ libc:setjmp (on 0x35981)
+
+You can now use it in all perf tools, such as:
+
+ perf record -e libc:setjmp -aR sleep 1
+
+
+This link shows an example of marker probing with SystemTap:
+https://sourceware.org/systemtap/wiki/AddingUserSpaceProbingToApps
+
+And, this link shows more info on SDT markers:
+http://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation
+
+- Markers in binaries:
+These SDT markers are stored in the ELF in a note section named
+".note.stapsdt".
+Each note records the marker's name, its provider, type, location, base
+address, semaphore address and arguments.
+These values can be retrieved using the name and descriptor offsets
+(name_off, desc_off) that accompany each note's Nhdr header. When a marker
+is not enabled, its location in the code is just a nop.
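The note layout described above can be decoded with a few lines of C. In sys/sdt.h the notes use the owner name "stapsdt" and note type 3, and each descriptor holds three addresses (marker location, .stapsdt.base address, semaphore address) followed by three NUL-terminated strings (provider, name, argument format). This is a sketch under stated assumptions: a 64-bit target (8-byte addresses) whose byte order matches the host, with the descriptor bytes already located, for example via libelf's gelf_getnote(). The struct and function names are hypothetical, not perf's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded form of one .note.stapsdt descriptor (64-bit target). */
struct sdt_note_desc {
	uint64_t pc;        /* address of the marker's nop */
	uint64_t base;      /* link-time address of .stapsdt.base */
	uint64_t semaphore; /* semaphore address, 0 if the marker has none */
	const char *provider;
	const char *name;
	const char *args;   /* argument format string, may be empty */
};

/*
 * Decode a descriptor of len bytes: three 8-byte addresses followed by
 * three NUL-terminated strings (provider, name, arguments).
 * Returns 0 on success, -1 if the descriptor is truncated.
 */
static int parse_stapsdt_desc(const unsigned char *desc, size_t len,
			      struct sdt_note_desc *out)
{
	const char *strs[3];
	size_t off = 3 * sizeof(uint64_t);
	int i;

	assert(desc != NULL && out != NULL);
	if (len < off)
		return -1;

	/* Assumes the target's byte order matches the host's. */
	memcpy(&out->pc, desc, sizeof(uint64_t));
	memcpy(&out->base, desc + 8, sizeof(uint64_t));
	memcpy(&out->semaphore, desc + 16, sizeof(uint64_t));

	/* Each string must end with a NUL inside the descriptor. */
	for (i = 0; i < 3; i++) {
		const unsigned char *nul;

		if (off >= len)
			return -1;
		nul = memchr(desc + off, '\0', len - off);
		if (!nul)
			return -1;
		strs[i] = (const char *)(desc + off);
		off = (size_t)(nul - desc) + 1;
	}
	out->provider = strs[0];
	out->name = strs[1];
	out->args = strs[2];
	return 0;
}
```

The pointers in the result refer into the caller's buffer, so the descriptor data must stay alive while the decoded fields are used.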

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

(2013/10/23 14:05), Hemant Kumar wrote:
> This allows perf to probe into the sdt markers/notes present in
> the libraries and executables. We try to find the associated location
> and handle prelinking (since, stapsdt notes section is not allocated
> during runtime). Prelinking is handled with the help of base
> section which is allocated during runtime. This address can be compared
> with the address retrieved from the notes' description. If it's different,
> we can take this difference and then add to the note's location.
>
> We can use existing '-a/--add' option to add events for sdt markers.
> Also, we can add multiple events at once using the same '-a' option.
>
> Usage:
> perf probe -x /lib64/libc.so.6 -a 'my_event=%libc:setjmp'
>
> Output:
> Added new event:
> libc:my_event (on 0x35981)
>
> You can now use it in all perf tools, such as:
>
> perf record -e libc:my_event -aR sleep 1
>
>
> Signed-off-by: Hemant Kumar Shaw <[email protected]>
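The prelink correction described in the patch above comes down to one subtraction and one addition: prelink updates the address of the allocated .stapsdt.base section but not the contents of the non-allocated note, so the shift of that section is also the shift of the marker. A sketch, assuming the caller has already read the note's recorded pc and base address and the on-disk sh_addr of .stapsdt.base; sdt_adjust_pc is a hypothetical name, not the patch's adjust_note_addr().

```c
#include <stdint.h>

/*
 * note_pc, note_base: addresses recorded in the .note.stapsdt descriptor.
 * base_sec_addr:      sh_addr of the .stapsdt.base section in the file
 *                     on disk (updated by prelink, unlike the note).
 * Returns the marker address corrected for any relocation of the binary.
 */
static uint64_t sdt_adjust_pc(uint64_t note_pc, uint64_t note_base,
			      uint64_t base_sec_addr)
{
	/*
	 * Unsigned wraparound keeps this correct whether the binary was
	 * moved to a higher or a lower address.
	 */
	if (base_sec_addr != note_base)
		note_pc += base_sec_addr - note_base;
	return note_pc;
}
```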

Almost! please check below comments :)


> static int convert_to_probe_trace_events(struct perf_probe_event *pev,
> struct probe_trace_event **tevs,
> int max_tevs, const char *target)
> @@ -1916,11 +1975,23 @@ static int convert_to_probe_trace_events(struct perf_probe_event *pev,
> struct symbol *sym;
> int ret = 0, i;
> struct probe_trace_event *tev;
> + char *buf;
> + unsigned long long addr;
>
> - /* Convert perf_probe_event with debuginfo */
> - ret = try_to_find_probe_trace_events(pev, tevs, max_tevs, target);
> - if (ret != 0)
> - return ret; /* Found in debuginfo or got an error */
> + if (pev->sdt) {
> + ret = -EBADF;
> + if (pev->uprobes)
> + ret = try_to_find_sdt_notes(pev, target);
> + if (ret)
> + return ret;
> + } else {
> + /* Convert perf_probe_event with debuginfo */
> + ret = try_to_find_probe_trace_events(pev, tevs, max_tevs,
> + target);
> + /* Found in debuginfo or got an error */
> + if (ret != 0)
> + return ret;

These "ret != 0" checkers can be merged.
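The suggested merge can be illustrated with a self-contained sketch: both lookups signal "stop here" with a non-zero return (an error, or for debuginfo a positive count of events found), so one check after the if/else covers both branches. The struct, the stub functions and their controllable results (stub_sdt_ret, stub_debuginfo_ret) are hypothetical stand-ins, not perf code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for perf's struct perf_probe_event. */
struct probe_event {
	bool sdt;     /* event names an SDT marker (%provider:name) */
	bool uprobes; /* target is a user-space binary (-x) */
};

/* Hypothetical stub results, settable by tests. */
int stub_sdt_ret;       /* 0 on success, <0 on error */
int stub_debuginfo_ret; /* >0 found, 0 not found, <0 on error */

static int find_sdt_notes(const struct probe_event *pev)
{
	(void)pev;
	return stub_sdt_ret;
}

static int find_in_debuginfo(const struct probe_event *pev)
{
	(void)pev;
	return stub_debuginfo_ret;
}

static int convert_event(const struct probe_event *pev)
{
	int ret;

	if (pev->sdt)
		/* SDT markers can only be probed in user-space binaries. */
		ret = pev->uprobes ? find_sdt_notes(pev) : -EBADF;
	else
		ret = find_in_debuginfo(pev);

	/* One check: an error, or events already found in debuginfo. */
	if (ret != 0)
		return ret;

	/* ... continue: build the event from the SDT address or symbols ... */
	return 0;
}
```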

[...]
> +int search_sdt_note(struct sdt_note *key, const char *target)
> +{
> + Elf *elf;
> + int fd, ret;
> + bool found = false;
> + struct sdt_note *pos = NULL;
> + LIST_HEAD(sdt_notes);
> +
> + fd = open(target, O_RDONLY);
> + if (fd < 0) {
> + pr_err("%s : %s\n", target, strerror(errno));
> + return -errno;
> + }
> +
> + symbol__elf_init();
> + elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
> + if (!elf) {
> + ret = -EBADF;
> + pr_debug("Can't read the elf of %s\n", target);
> + goto out_close;
> + }
> +
> + ret = construct_sdt_notes_list(elf, &sdt_notes);
> + if (ret)
> + goto out_end;
> +
> + /* Iterate through the notes and retrieve the required note */
> + list_for_each_entry(pos, &sdt_notes, note_list) {
> + if (!strcmp(key->name, pos->name) &&
> + !strcmp(key->provider, pos->provider)) {
> + adjust_note_addr(pos, key, elf);
> + found = true;
> + break;
> + }
> + }
> + if (!found) {
> + printf("%%%s:%s not found in %s!\n", key->provider, key->name,
> + target);
> + return -ENOENT;

Here, you skipped the cleanup process. Maybe ret = -ENOENT is enough here.

> + }
> +
> +out_end:
> + elf_end(elf);
> +out_close:
> + close(fd);
> + ret = sdt_err(ret, target);

It seems the sdt_err is only for the return value of construct_sdt_notes_list(),
thus it is better to integrate it.

> + if (!ret && list_empty(&sdt_notes))
> + ret = -ENOENT;

I think this can be removed, because it will always be false (caught by the previous !found).

> + cleanup_sdt_note_list(&sdt_notes);
> + return ret;
> +}

Thank you,



--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-10-24 10:25:44

by Hemant Kumar

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

Hi,

On 10/24/2013 11:15 AM, Masami Hiramatsu wrote:
> (2013/10/23 14:05), Hemant Kumar wrote:
>> This allows perf to probe into the sdt markers/notes present in
>> the libraries and executables. We try to find the associated location
>> and handle prelinking (since, stapsdt notes section is not allocated
>> during runtime). Prelinking is handled with the help of base
>> section which is allocated during runtime. This address can be compared
>> with the address retrieved from the notes' description. If it's different,
>> we can take this difference and then add to the note's location.
>>
>> We can use existing '-a/--add' option to add events for sdt markers.
>> Also, we can add multiple events at once using the same '-a' option.
>>
>> Usage:
>> perf probe -x /lib64/libc.so.6 -a 'my_event=%libc:setjmp'
>>
>> Output:
>> Added new event:
>> libc:my_event (on 0x35981)
>>
>> You can now use it in all perf tools, such as:
>>
>> perf record -e libc:my_event -aR sleep 1
>>
>>
>> Signed-off-by: Hemant Kumar Shaw <[email protected]>
> Almost! please check below comments :)

:) Thanks again for reviewing the patches.

>
>> static int convert_to_probe_trace_events(struct perf_probe_event *pev,
>> struct probe_trace_event **tevs,
>> int max_tevs, const char *target)
>> @@ -1916,11 +1975,23 @@ static int convert_to_probe_trace_events(struct perf_probe_event *pev,
>> struct symbol *sym;
>> int ret = 0, i;
>> struct probe_trace_event *tev;
>> + char *buf;
>> + unsigned long long addr;
>>
>> - /* Convert perf_probe_event with debuginfo */
>> - ret = try_to_find_probe_trace_events(pev, tevs, max_tevs, target);
>> - if (ret != 0)
>> - return ret; /* Found in debuginfo or got an error */
>> + if (pev->sdt) {
>> + ret = -EBADF;
>> + if (pev->uprobes)
>> + ret = try_to_find_sdt_notes(pev, target);
>> + if (ret)
>> + return ret;
>> + } else {
>> + /* Convert perf_probe_event with debuginfo */
>> + ret = try_to_find_probe_trace_events(pev, tevs, max_tevs,
>> + target);
>> + /* Found in debuginfo or got an error */
>> + if (ret != 0)
>> + return ret;
> These "ret != 0" checkers can be merged.
>
> [...]

Indeed, they can be merged.

>> +int search_sdt_note(struct sdt_note *key, const char *target)
>> +{
>> + Elf *elf;
>> + int fd, ret;
>> + bool found = false;
>> + struct sdt_note *pos = NULL;
>> + LIST_HEAD(sdt_notes);
>> +
>> + fd = open(target, O_RDONLY);
>> + if (fd < 0) {
>> + pr_err("%s : %s\n", target, strerror(errno));
>> + return -errno;
>> + }
>> +
>> + symbol__elf_init();
>> + elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
>> + if (!elf) {
>> + ret = -EBADF;
>> + pr_debug("Can't read the elf of %s\n", target);
>> + goto out_close;
>> + }
>> +
>> + ret = construct_sdt_notes_list(elf, &sdt_notes);
>> + if (ret)
>> + goto out_end;
>> +
>> + /* Iterate through the notes and retrieve the required note */
>> + list_for_each_entry(pos, &sdt_notes, note_list) {
>> + if (!strcmp(key->name, pos->name) &&
>> + !strcmp(key->provider, pos->provider)) {
>> + adjust_note_addr(pos, key, elf);
>> + found = true;
>> + break;
>> + }
>> + }
>> + if (!found) {
>> + printf("%%%s:%s not found in %s!\n", key->provider, key->name,
>> + target);
>> + return -ENOENT;
> Here, you skipped the cleanup process. Maybe ret = -ENOENT is enough here.

Yeah, I have missed this. But I think ret = -ENOENT is not enough. We
shouldn't go to sdt_err() for this case. We can return -ENOENT and print
the error.

>> + }
>> +
>> +out_end:
>> + elf_end(elf);
>> +out_close:
>> + close(fd);
>> + ret = sdt_err(ret, target);
> It seems the sdt_err is only for the return value of construct_sdt_notes_list(),
> thus it is better to integrate it.

Yeah it'll be better to integrate it.

>
>> + if (!ret && list_empty(&sdt_notes))
>> + ret = -ENOENT;
> I think this can be removed, because it will always be false (caught by the previous !found).

I wrote this check for the 'no markers present' case. Anyway, the result is
the same, so I guess we can eliminate this (since it's covered by the
'found' check).

>
>> + cleanup_sdt_note_list(&sdt_notes);
>> + return ret;
>> +}
>

I think it'll be better to move the 'found' check, if (!found) {},
after cleanup_sdt_note_list().

--
Thanks
Hemant Kumar
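The flow agreed on above (cleanup on every path once the file is open, then the 'found' check, with -ENOENT kept out of sdt_err()) can be sketched in isolation. This is a minimal, self-contained sketch: the libelf calls and the note list are replaced by a hypothetical in-memory array, and search_note is not perf's actual function.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for one parsed SDT note. */
struct note {
	const char *provider;
	const char *name;
};

static int search_note(const char *target, const struct note *notes,
		       size_t n, const char *provider, const char *name)
{
	bool found = false;
	size_t i;
	int fd;

	fd = open(target, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* elf_begin() and building the note list would happen here. */
	for (i = 0; i < n; i++) {
		if (!strcmp(notes[i].provider, provider) &&
		    !strcmp(notes[i].name, name)) {
			found = true;
			break;
		}
	}

	/* Cleanup runs on every path once the file has been opened. */
	close(fd);

	/* "Not found" is reported after cleanup, outside the error mapping. */
	if (!found) {
		fprintf(stderr, "%%%s:%s not found in %s!\n", provider, name,
			target);
		return -ENOENT;
	}
	return 0;
}
```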

2013-10-25 12:38:19

by Pekka Enberg

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

Hello Hemant,

On Wed, Oct 23, 2013 at 7:05 AM, Hemant Kumar <[email protected]> wrote:
> This allows perf to probe into the sdt markers/notes present in
> the libraries and executables. We try to find the associated location
> and handle prelinking (since, stapsdt notes section is not allocated
> during runtime). Prelinking is handled with the help of base
> section which is allocated during runtime. This address can be compared
> with the address retrieved from the notes' description. If it's different,
> we can take this difference and then add to the note's location.
>
> We can use existing '-a/--add' option to add events for sdt markers.
> Also, we can add multiple events at once using the same '-a' option.
>
> Usage:
> perf probe -x /lib64/libc.so.6 -a 'my_event=%libc:setjmp'
>
> Output:
> Added new event:
> libc:my_event (on 0x35981)
>
> You can now use it in all perf tools, such as:
>
> perf record -e libc:my_event -aR sleep 1

Is there a technical reason why 'perf list' could not show all the
available SDT markers on a system and that the 'marker to event'
mapping cannot happen automatically?

So instead of doing all the command line magic above I'd do:

perf list

libc:setjmp [SDT marker]

and I could just do

perf record -e libc:setjmp -aR sleep 1

?

Pekka

2013-10-25 12:59:38

by Srikar Dronamraju

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

Hi Pekka,

> >
> > You can now use it in all perf tools, such as:
> >
> > perf record -e libc:my_event -aR sleep 1
>
> Is there a technical reason why 'perf list' could not show all the
> available SDT markers on a system and that the 'marker to event'
> mapping cannot happen automatically?
>

Technically feasible. But then we would have to parse each of the
libraries and executables to list them. Right? I am not sure if such a
delay is acceptable.

Also if a binary exists in a path that is not covered in the default
search, a user might believe that his binary may not have markers.
I know the above reason is more of a user folly than a tooling issue.

> So instead of doing all the command line magic above I'd do:
>
> perf list
>
> libc:setjmp [SDT marker]
>
> and I could just do
>
> perf record -e libc:setjmp -aR sleep 1
>
> ?

--
Thanks and Regards
Srikar Dronamraju

2013-10-25 14:21:01

by Pekka Enberg

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

> Technically feasible. But then we would have to parse each of the
> libraries and executables to list them. Right? I am not sure if such a
> delay is acceptable.

You could do it at 'perf list' time or even build time and cache it. And add lazy discovery to 'perf record' and friends.

> Also if a binary exists in a path that is not covered in the default
> search, a user might believe that his binary may not have markers.
> I know the above reason is more of a user folly than a tooling issue.

Lazy discovery at 'perf record'-time from executable and DSOs should make that transparent to the user, no? I'm pretty sure it will be fast enough with a content-addressed cache.

- Pekka-

2013-10-25 15:20:21

by David Ahern

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

On 10/25/13 8:20 AM, Pekka Enberg wrote:
>> Technically feasible. But then we would have to parse each of the
>> libraries and executables to list them. Right? I am not sure if such a
>> delay is acceptable.
>
> You could do it at 'perf list' time or even build time and cache it. And add lazy discovery to 'perf record' and friends.

Instead of searching all the known files or building a cache, how about
just having an option like: perf list <DSO>. perf-record could still do
the probe magic behind the scenes.

David

2013-10-26 09:50:28

by Ingo Molnar

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:


* Srikar Dronamraju <[email protected]> wrote:

> Hi Pekka,
>
> > >
> > > You can now use it in all perf tools, such as:
> > >
> > > perf record -e libc:my_event -aR sleep 1
> >
> > Is there a technical reason why 'perf list' could not show all the
> > available SDT markers on a system and that the 'market to event'
> > mapping cannot happen automatically?
> >
>
> Technically feasible. But then we would have to parse each of the
> libraries and executables to list them. Right? I am not sure if
> such a delay is acceptable.

I'd say let's try Pekka's suggestion and make it more palatable if
there are complaints about the delay. (SSD systems are becoming
dominant and there the search should be reasonably fast.)

We could also make 'perf list' more sophisticated, if invoked
naively as 'perf list' then maybe it should first display the
various event categories, with a (rough) count:

$ perf list
34 hardware events # use 'perf list --hw' to list them
40 hw-cache events # use 'perf list --cache' to list them
20 software events # use 'perf list --sw' to list them
2 raw events # use 'perf list --raw' to list them
120 tracepoints # use 'perf list --tp' to list them
>10 SDT tracepoints # use 'perf list --sdt' to list them

# use 'perf list -a' to list all events
# use 'perf list ./binary' to list events in a given binary

I.e. bring a bit more structure into it.

> Also if a binary exists in a path that is not covered in the
> default search, a user might believe that his binary may not have
> markers. I know the above reason is more of a user folly than a
> tooling issue.

I think in 99% of the use cases people will either use pre-built
markers that come with their distro, or will be intimately aware of
the markers because they are in the very app they are developing.

So I wouldn't worry about 'user has a weird binary' case too much.

I agree with Pekka that making them easily discoverable and visible
as a coherent whole is really important.

Thanks,

Ingo

2013-10-26 11:16:42

by Frank Ch. Eigler

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

Pekka Enberg <[email protected]> writes:

> Is there a technical reason why 'perf list' could not show all the
> available SDT markers on a system and that the 'mark to event'
> mapping cannot happen automatically? [...]

A quick experiment with:

find `echo $PATH | tr : ' '` -type f -perm -555 |
xargs readelf -n 2>/dev/null |
grep STAP 2>/dev/null

suggests reasonable performance for my F19 workstation (a second or
two over ~6000 executables), once all the ELF content is in the block
cache. According to a stap eventcount.stp run, that required about
50000 syscall.read events.

Note that a $PATH search excludes shared libraries, which can also
carry <sys/sdt.h> markers. Adding /usr/lib* in more than doubles the
work, then there's /usr/libexec etc.

- FChE
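The per-file check in the readelf/grep pipeline above can be approximated without spawning readelf, by looking for a section named .note.stapsdt. A minimal sketch that uses only the definitions in glibc's <elf.h>; it assumes 64-bit ELF files in host byte order and no extended section numbering, and has_stapsdt_section is a hypothetical name. A real scanner would also need to handle 32-bit binaries.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns 1 if path is a 64-bit ELF file with a ".note.stapsdt" section,
 * 0 if it has none (or is not a 64-bit ELF), -1 if it cannot be opened.
 */
static int has_stapsdt_section(const char *path)
{
	Elf64_Ehdr eh;
	Elf64_Shdr *sh = NULL, *strsec;
	char *names = NULL;
	int ret = 0;
	size_t i;
	FILE *f = fopen(path, "rb");

	if (!f)
		return -1;
	if (fread(&eh, sizeof(eh), 1, f) != 1 ||
	    memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
	    eh.e_ident[EI_CLASS] != ELFCLASS64 ||
	    eh.e_shentsize != sizeof(Elf64_Shdr) ||
	    eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
		goto out;

	/* Read the whole section header table. */
	sh = calloc(eh.e_shnum, sizeof(*sh));
	if (!sh || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
	    fread(sh, sizeof(*sh), eh.e_shnum, f) != eh.e_shnum)
		goto out;

	/* Read the section-name string table and NUL-terminate it. */
	strsec = &sh[eh.e_shstrndx];
	names = malloc(strsec->sh_size + 1);
	if (!names || fseek(f, (long)strsec->sh_offset, SEEK_SET) != 0 ||
	    fread(names, 1, strsec->sh_size, f) != strsec->sh_size)
		goto out;
	names[strsec->sh_size] = '\0';

	for (i = 0; i < eh.e_shnum; i++) {
		if (sh[i].sh_name < strsec->sh_size &&
		    !strcmp(names + sh[i].sh_name, ".note.stapsdt")) {
			ret = 1;
			break;
		}
	}
out:
	free(names);
	free(sh);
	fclose(f);
	return ret;
}
```

A directory walk (for example with POSIX nftw()) over $PATH and /usr/lib* would call this once per regular file.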

2013-10-28 08:40:43

by Pekka Enberg

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

On 10/26/2013 02:16 PM, Frank Ch. Eigler wrote:
> Pekka Enberg <[email protected]> writes:
>
>> Is there a technical reason why 'perf list' could not show all the
>> available SDT markers on a system and that the 'mark to event'
>> mapping cannot happen automatically? [...]
> A quick experiment with:
>
> find `echo $PATH | tr : ' '` -type f -perm -555 |
> xargs readelf -n 2>/dev/null |
> grep STAP 2>/dev/null
>
> suggests reasonable performance for my F19 workstation (a second or
> two over ~6000 executables), once all the ELF content is in the block
> cache. According to a stap eventcount.stp run, that required about
> 50000 syscall.read events.
>
> Note that a $PATH search excludes shared libraries, which can also
> carry <sys/sdt.h> markers. Adding /usr/lib* in more than doubles the
> work, then there's /usr/libexec etc.

Thanks for providing numbers to the discussion. AFAICT, we
might even be able to just scan everything for 'perf list' by
default.

Pekka

2013-10-28 08:48:48

by Pekka Enberg

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

Hi David,

On 10/25/2013 06:20 PM, David Ahern wrote:
> On 10/25/13 8:20 AM, Pekka Enberg wrote:
>>> Technically feasible. But then we would have to parse each of the
>>> libraries and executables to list them. Right? I am not sure if such a
>>> delay is acceptable.
>>
>> You could do it at 'perf list' time or even build time and cache it.
>> And add lazy discovery to 'perf record' and friends.
>
> Instead of searching all the known files or building a cache, how about
> just having an option like: perf list <DSO>. perf-record could still
> do the probe magic behind the scenes.

We probably should also support that. But I don't see why
'perf list' could not tell me about SDT markers in libraries that
are already installed on my system.

The problem I have with all the command line magic is that
while the tracing mechanisms are awesome, they're nearly
impossible to discover even by a power user such as myself
and you almost certainly forget the exact syntax over time.
It's not as if you're tracing all the time.

I wish people remembered how awesome and simple 'perf stat'
and 'perf record' with 'perf report' were compared to oprofile
when the first versions came out. I think many of the nice
perf features are suffering because we're not paying enough
attention to how to make them accessible to users.

The proposed SDT marker feature is a good example of that.
I mean, how on earth would I know about the userspace
probes unless I read LKML and know that such a feature
exists? And why would I want to provide mappings for SDT
markers and perf events if I want to trace 'libc:setjmp'?

So I really hope this SDT effort and the ktap effort at least
make some effort in unifying all the nice functionality that's
simple to use and easy to discover. I really, really would like,
at the end of the day, to just 'perf trace' like I 'perf stat' or
'perf record'.

Pekka

2013-10-28 08:56:09

by Pekka Enberg

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

On 10/26/2013 12:50 PM, Ingo Molnar wrote:
> I think in 99% of the use cases people will either use pre-built
> markers that come with their distro, or will be intimately aware of
> the markers because they are in the very app they are developing.
>
> So I wouldn't worry about 'user has a weird binary' case too much.
>
> I agree with Pekka that making them easily discoverable and visible
> as a coherent whole is really important.

I wouldn't worry about the weird binary case either.

Even a build-time whitelist would help. Just put libc and libjvm
there and you're already covering a lot of interesting cases.
And if you then add a printout:

Use 'perf list --scan' to find more tracepoints on your
system.

you're now effectively covering 100% of the cases.

The trick of making the UI not suck is not to force the user
to think about the different mechanisms like SDT markers,
uprobes, or ktap scripts but to make them as transparent
as possible, provide useful defaults, and actively guide the
user towards learning about more command line options
for the complex cases.

Pekka

2013-10-28 10:34:08

by Ingo Molnar

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:


* Pekka Enberg <[email protected]> wrote:

> On 10/26/2013 02:16 PM, Frank Ch. Eigler wrote:
> >Pekka Enberg <[email protected]> writes:
> >
> >>Is there a technical reason why 'perf list' could not show all the
> >>available SDT markers on a system and that the 'mark to event'
> >>mapping cannot happen automatically? [...]
> >A quick experiment with:
> >
> > find `echo $PATH | tr : ' '` -type f -perm -555 |
> > xargs readelf -n 2>/dev/null |
> > grep STAP 2>/dev/null
> >
> >suggests reasonable performance for my F19 workstation (a second or
> >two over ~6000 executables), once all the ELF content is in the block
> >cache. According to a stap eventcount.stp run, that required about
> >50000 syscall.read events.
> >
> >Note that a $PATH search excludes shared libraries, which can also
> >carry <sys/sdt.h> markers. Adding /usr/lib* in more than doubles the
> >work, then there's /usr/libexec etc.
>
> Thanks for providing numbers to the discussion. AFAICT, we might
> even be able to just scan everything for 'perf list' by default.

That should definitely be better in the long run than any whitelist
(or no list at all).

Thanks,

Ingo

Subject: Re: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

(2013/10/26 18:50), Ingo Molnar wrote:
>
> * Srikar Dronamraju <[email protected]> wrote:
>
>> Hi Pekka,
>>
>>>>
>>>> You can now use it in all perf tools, such as:
>>>>
>>>> perf record -e libc:my_event -aR sleep 1
>>>
>>> Is there a technical reason why 'perf list' could not show all the
>>> available SDT markers on a system and that the 'market to event'
>>> mapping cannot happen automatically?
>>>
>>
>> Technically feasible. But then we would have to parse each of the
>> libraries and executables to list them. Right? I am not sure if
>> such a delay is acceptable.
>
> I'd say let's try Pekka's suggestion and make it more palatable if
> there are complaints about the delay. (SSD systems are becoming
> dominant and there the search should be reasonably fast.)
>
> We could also make 'perf list' more sophisticated, if invoked
> naively as 'perf list' then maybe it should first display the
> various event categories, with a (rough) count:
>
> $ perf list
> 34 hardware events # use 'perf list --hw' to list them
> 40 hw-cache events # use 'perf list --cache' to list them
> 20 software events # use 'perf list --sw' to list them
> 2 raw events # use 'perf list --raw' to list them
> 120 tracepoints # use 'perf list --tp' to list them
> >10 SDT tracepoints # use 'perf list --sdt' to list them
>
> # use 'perf list -a' to list all events
> # use 'perf list ./binary' to list events in a given binary
>
> I.e. bring a bit more structure into it.

Ah, that's nice to me too ;)

Thank you,

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

(2013/10/25 21:38), Pekka Enberg wrote:
> Hello Hemant,
>
> On Wed, Oct 23, 2013 at 7:05 AM, Hemant Kumar <[email protected]> wrote:
>> This allows perf to probe into the sdt markers/notes present in
>> the libraries and executables. We try to find the associated location
>> and handle prelinking (since, stapsdt notes section is not allocated
>> during runtime). Prelinking is handled with the help of base
>> section which is allocated during runtime. This address can be compared
>> with the address retrieved from the notes' description. If it's different,
>> we can take this difference and then add to the note's location.
>>
>> We can use existing '-a/--add' option to add events for sdt markers.
>> Also, we can add multiple events at once using the same '-a' option.
>>
>> Usage:
>> perf probe -x /lib64/libc.so.6 -a 'my_event=%libc:setjmp'
>>
>> Output:
>> Added new event:
>> libc:my_event (on 0x35981)
>>
>> You can now use it in all perf tools, such as:
>>
>> perf record -e libc:my_event -aR sleep 1
>
> Is there a technical reason why 'perf list' could not show all the
> available SDT markers on a system and that the 'marker to event'
> mapping cannot happen automatically?

By the way, what happens if multiple binaries have the same SDT marker?
Yeah, perf list shows just one and ignores others. However, if we
probe one, and run binary which use the other one, user will never
see the marker.

So, it still needs a concrete binary path to list or, we should
support a syntax that specifies the actual binary, as below.

perf probe 'my_event=%libc:setjmp@/lib64/libc.so.6'

And perf list may show the marker as in same syntax (for copy&paste).

# perf list --sdt
%libc:setjmp@/lib64/libc.so.6
...

Note that we need '%' to separate namespace :(, since user can define
any marker(provider) name in their binary...

Thank you,

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
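The proposed "%provider:name@binary" form (optionally prefixed with "event=") is easy to split mechanically. A minimal sketch: struct sdt_spec and parse_sdt_spec are hypothetical names, the fixed buffer sizes are arbitrary, and the splitting rules (the first '%' starts the marker, the first ':' after it ends the provider, the first '@' after that starts the path) are assumptions about how such a syntax might be parsed, not perf's implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed form of "[event=]%provider:name[@binary]". */
struct sdt_spec {
	char event[64];    /* empty if there is no "event=" prefix */
	char provider[64];
	char name[64];
	char binary[256];  /* empty if there is no "@binary" suffix */
};

/* Copy [from, to) into dst; fail on empty or oversized fields. */
static int copy_field(char *dst, size_t cap, const char *from, const char *to)
{
	size_t n = (size_t)(to - from);

	if (n == 0 || n >= cap)
		return -1;
	memcpy(dst, from, n);
	dst[n] = '\0';
	return 0;
}

/* Returns 0 on success, -1 if s does not match the expected form. */
static int parse_sdt_spec(const char *s, struct sdt_spec *out)
{
	const char *pct, *colon, *at, *name_end;

	memset(out, 0, sizeof(*out));
	pct = strchr(s, '%');
	if (!pct)
		return -1;

	/* Anything before the '%' must be a non-empty "event=". */
	if (pct != s && (pct[-1] != '=' ||
			 copy_field(out->event, sizeof(out->event), s, pct - 1)))
		return -1;

	colon = strchr(pct + 1, ':');
	if (!colon)
		return -1;
	at = strchr(colon + 1, '@');
	name_end = at ? at : colon + strlen(colon);

	if (copy_field(out->provider, sizeof(out->provider), pct + 1, colon) ||
	    copy_field(out->name, sizeof(out->name), colon + 1, name_end))
		return -1;
	if (at && copy_field(out->binary, sizeof(out->binary), at + 1,
			     at + strlen(at)))
		return -1;
	return 0;
}
```

For example, "my_event=%libc:setjmp@/lib64/libc.so.6" yields event "my_event", provider "libc", name "setjmp" and binary "/lib64/libc.so.6", while "%libc:setjmp" leaves event and binary empty.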

2013-10-28 12:42:35

by Pekka Enberg

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

On 10/28/13 1:23 PM, Masami Hiramatsu wrote:
> By the way, what happens if multiple binaries have the same SDT marker?
> Yeah, perf list shows just one and ignores others. However, if we
> probe one, and run binary which use the other one, user will never
> see the marker.
>
> So, it still needs a concrete binary path to list or, we should
> support a syntax that specifies the actual binary, as below.
>
> perf probe 'my_event=%libc:setjmp@/lib64/libc.so.6'
>
> And perf list may show the marker as in same syntax (for copy&paste).
>
> # perf list --sdt
> %libc:setjmp@/lib64/libc.so.6
> ...
>
> Note that we need '%' to separate namespace :(, since user can define
> any marker(provider) name in their binary...

Sure, you need to support that sort of 'fully qualified name' for
duplicate symbols but the default 'libc:setjmp' should still point
to system libc.

This is an example where tracing libc that's relevant to most of
the users should take priority over the 'duplicate marker in
obscure executable' corner case.

Pekka

2013-10-28 14:21:45

by Pekka Enberg

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

On 10/28/2013 04:11 PM, Srikar Dronamraju wrote:
>
> But what if a system has both 32 bit libc and 64 bit libc?
> Won't we end up with 2 libc:setjmp?
> Should we give some more intelligence into perf to choose the 64 bit
> libc over 32 bit one?

You can just trace both of them by default, no?

Pekka

2013-10-28 14:26:44

by Srikar Dronamraju

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

* Pekka Enberg <[email protected]> [2013-10-28 14:42:13]:

> >So, it still needs a concrete binary path to list or, we should
> >support a syntax that specifies the actual binary, as below.
> >
> > perf probe 'my_event=%libc:setjmp@/lib64/libc.so.6'
> >
> >And perf list may show the marker as in same syntax (for copy&paste).
> >
> ># perf list --sdt
> > %libc:setjmp@/lib64/libc.so.6
> > ...
> >
> >Note that we need '%' to separate namespace :(, since user can define
> >any marker(provider) name in their binary...
>
> Sure, you need to support that sort of 'fully qualified name' for
> duplicate symbols but the default 'libc:setjmp' should still point
> to system libc.

But what if a system has both 32 bit libc and 64 bit libc?
Won't we end up with 2 libc:setjmp?
Should we give some more intelligence into perf to choose the 64 bit
libc over 32 bit one?

--
Thanks and Regards
Srikar Dronamraju

2013-10-28 16:59:59

by David Ahern

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

On 10/28/13 2:48 AM, Pekka Enberg wrote:
> So I really hope this SDT effort and the ktap effort at least
> make some effort in unifying all the nice functionality that's
> simple to use and easy to discover. I really, really would like,
> at the end of the day, to just 'perf trace' like I 'perf stat' or
> 'perf record'.

Agree. I see users' eyes glaze over with each command line option, and
we have added aliases to embed some of the details as well as having
sensible defaults.

I often use perf-list to lookup an exact event name, and I do not want
to see it taking many seconds to minutes to run (not everyone is running
on an SSD). I also run perf on many different OS versions with an NFS
home directory, and do not want to see a cache explosion (I have buildid
disabled for this reason).

David

2013-10-28 17:32:06

by Srikar Dronamraju

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

> >
> >But what if a system has both 32 bit libc and 64 bit libc?
> >Won't we end up with 2 libc:setjmp?
> >Should we give some more intelligence into perf to choose the 64 bit
> >libc over 32 bit one?
>
> You can just trace both of them by default, no?
>

There has to be a one to one association with the event name and its
mapping. Every event name will finally map to a unique inode and an
offset.

One option would be for perf to look at these markers and have a
different event name for similar markers in different executables.

--
Thanks and Regards
Srikar Dronamraju

2013-10-28 17:48:27

by Pekka Enberg

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

On 10/28/13 7:31 PM, Srikar Dronamraju wrote:
>>> But what if a system has both 32 bit libc and 64 bit libc?
>>> Won't we end up with 2 libc:setjmp?
>>> Should we give some more intelligence into perf to choose the 64 bit
>>> libc over 32 bit one?
>> You can just trace both of them by default, no?
>>
> There has to be a one to one association with the event name and its
> mapping. Every event name will finally map to a unique inode and an
> offset.
>
> One option would be for perf to look at these markers and have a
> different event name for similar markers in different executables.

I think we are talking past each other here.

Yes, I understand that you need a fully qualified name
for a SDT marker but there's absolutely no reason to force
feed that to the user of 'perf trace'.

For the 32-bit and 64-bit libc case, why cannot 'perf list'
by default print out something like:

$ perf list

libc:setjmp [SDT marker group]

and provide a '--fully-qualified' command line option that:

$ perf list --fully-qualified

libc:setjmp => libc32:setjmp, libc64:setjmp [SDT marker group]
libc32:setjmp => libc:setjmp@/lib/libc.so.6 [SDT marker]
libc64:setjmp => libc:setjmp@/lib64/libc.so.6 [SDT marker]

and then teach 'perf trace' to deal with SDT marker groups
where you trace two events, not one?
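A group such as `libc:setjmp` could be expanded into its fully qualified members roughly as follows. This is only a sketch under assumed names (`struct sdt_marker`, `expand_group()`, `print_group()`). It shows the lookup a tool would do before enabling one probe per member.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical fully qualified SDT marker: provider:name@binary. */
struct sdt_marker {
	const char *provider;
	const char *name;
	const char *binary;
};

/* Collect every marker whose "provider:name" equals 'group' (e.g.
 * "libc:setjmp"). Tracing the group then means enabling one probe per
 * match, e.g. in both the 32-bit and the 64-bit libc. */
static size_t expand_group(const struct sdt_marker *all, size_t n,
			   const char *group,
			   const struct sdt_marker **out, size_t max_out)
{
	const char *colon = strchr(group, ':');
	size_t found = 0;

	if (!colon)
		return 0;
	size_t plen = (size_t)(colon - group);

	for (size_t i = 0; i < n && found < max_out; i++) {
		if (strlen(all[i].provider) == plen &&
		    strncmp(all[i].provider, group, plen) == 0 &&
		    strcmp(all[i].name, colon + 1) == 0)
			out[found++] = &all[i];
	}
	return found;
}

/* Print the members in the "provider:name@binary" style shown above. */
static void print_group(const struct sdt_marker **m, size_t n)
{
	for (size_t i = 0; i < n; i++)
		printf("%s:%s@%s\n", m[i]->provider, m[i]->name, m[i]->binary);
}
```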

And again, there's no reason to treat system libraries like
libc the same way as some random binary in $HOME. You
can use the fully qualified name in 'perf list' for things
that are not in /lib or some perf-specific whitelist.

Pekka

2013-10-28 18:45:28

by Pekka Enberg

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

On 10/28/13 6:59 PM, David Ahern wrote:
> I often use perf-list to look up an exact event name, and I do not want
> to see it taking many seconds to minutes to run (not everyone is
> running on an SSD). I also run perf on many different OS versions with
> an NFS home directory, and do not want to see a cache explosion (I
> have buildid disabled for this reason).

I am talking about reasonable defaults - the 'default' part implies that
people can change the behavior. So we absolutely should also have
something like this for power users such as yourself:

perf config sdt.scan false

That said, the 'reasonable' part suggests that 'perf list' must not take
seconds or minutes (!) for every run. I'd start with implementing a
naive scan and seeing where it takes us. It's not like it's rocket
science to ignore network mounts or revert to a whitelist of paths if
necessary.

As for cache explosion, I don't see what the problem is.

If you build a cache of DSOs and executables that have SDT markers (with a
SHA1 hash), the cache size is bounded by the number of SDT-annotated files. You
probably can then unconditionally scan the cached filenames for SDT
markers for 'perf list'. And once you see a SHA1 mismatch, you either
rescan automatically or explain to the user that:

SDT marker cache needs to be updated. Please run 'perf list --scan'.
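The staleness check could look like the C sketch below. It is hypothetical: `struct sdt_file_entry` and the helpers are made up. It also swaps in 64-bit FNV-1a for SHA1 purely to keep the example short and dependency-free. FNV-1a is not collision resistant, so a real cache would want SHA1 or the binary's build-id.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical cache entry: a file known to contain SDT markers plus a
 * content hash recorded when it was scanned. */
struct sdt_file_entry {
	const char *path;
	uint64_t hash;
};

/* Hash a file's contents with 64-bit FNV-1a (a stand-in for SHA1).
 * Returns 0 on success, -1 if the file cannot be read. */
static int file_hash(const char *path, uint64_t *out)
{
	FILE *f = fopen(path, "rb");
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV offset basis */
	int c, err;

	if (!f)
		return -1;
	while ((c = fgetc(f)) != EOF) {
		h ^= (uint64_t)(unsigned char)c;
		h *= 0x100000001b3ULL;		/* FNV prime */
	}
	err = ferror(f);
	fclose(f);
	if (err)
		return -1;
	*out = h;
	return 0;
}

/* 1 if the file changed or vanished since it was cached, else 0.
 * The caller would then rescan, or ask the user to run a rescan. */
static int entry_is_stale(const struct sdt_file_entry *e)
{
	uint64_t now;

	if (file_hash(e->path, &now) != 0)
		return 1;
	return now != e->hash;
}
```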

Transparently supporting SDT markers as events for 'perf trace -e' and
others is slightly more tricky because you probably don't want to scan
the files for every 'perf trace' invocation. However, you can probably
get really far with a 1024-entry SDT marker cache that's separate from
the 'executables and DSOs with SDT markers' cache. So whenever the user
does something like

perf trace -e libc:setjmp sleep 1

The 'libc:setjmp' ends up in the 1024-entry cache (or whatever makes
most sense) that points directly to SDT marker so we can hook into it
quickly. Using simple LRU eviction policy, you end up pushing out the
uninteresting SDT markers and keeping the ones that are used all the
time.
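Here is a minimal C sketch of such a fixed-size LRU cache, assuming each entry maps an event name to a binary path and a marker offset. The names (`struct sdt_lru`, `sdt_lru_get()`, `sdt_lru_put()`) are hypothetical. Lookup is a linear scan, which keeps the code simple and is cheap at 1024 entries. A hash table plus a linked list would make both operations O(1).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SDT_LRU_SIZE 1024	/* the 1024-entry figure proposed above */

/* Hypothetical slot: event name -> resolved marker location. */
struct sdt_lru_slot {
	char event[64];		/* e.g. "libc:setjmp"; "" marks a free slot */
	char path[256];		/* binary containing the marker */
	uint64_t offset;	/* marker offset inside that binary */
	uint64_t last_used;	/* logical clock value of the last access */
};

/* A zero-initialized struct sdt_lru is an empty cache. */
struct sdt_lru {
	struct sdt_lru_slot slot[SDT_LRU_SIZE];
	uint64_t clock;
};

/* Look up an event; on a hit, mark it most recently used. */
static struct sdt_lru_slot *sdt_lru_get(struct sdt_lru *c, const char *event)
{
	for (size_t i = 0; i < SDT_LRU_SIZE; i++) {
		struct sdt_lru_slot *s = &c->slot[i];

		if (s->event[0] && strcmp(s->event, event) == 0) {
			s->last_used = ++c->clock;
			return s;
		}
	}
	return NULL;
}

/* Insert or update an entry. When the cache is full, the least recently
 * used slot is overwritten, so rarely used markers age out. */
static void sdt_lru_put(struct sdt_lru *c, const char *event,
			const char *path, uint64_t offset)
{
	struct sdt_lru_slot *victim = sdt_lru_get(c, event);

	if (!victim) {
		victim = &c->slot[0];
		for (size_t i = 0; i < SDT_LRU_SIZE; i++) {
			struct sdt_lru_slot *s = &c->slot[i];

			if (!s->event[0]) {	/* free slot: take it */
				victim = s;
				break;
			}
			if (s->last_used < victim->last_used)
				victim = s;
		}
		snprintf(victim->event, sizeof(victim->event), "%s", event);
	}
	snprintf(victim->path, sizeof(victim->path), "%s", path);
	victim->offset = offset;
	victim->last_used = ++c->clock;
}
```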

Pekka

Subject: Re: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

(2013/10/29 2:31), Srikar Dronamraju wrote:
>>>
>>> But what if a system has both a 32-bit libc and a 64-bit libc?
>>> Won't we end up with two libc:setjmp events?
>>> Should we give perf some more intelligence to choose the 64-bit
>>> libc over the 32-bit one?
>>
>> You can just trace both of them by default, no?
>>
>
> There has to be a one-to-one association between the event name and its
> mapping. Every event name will ultimately map to a unique inode and an
> offset.
>
> One option would be for perf to look at these markers and give similar
> markers in different executables different event names.

Another idea is to introduce a hidden event group that
automatically merges those similar markers and shows them
as one event alias name. :)

Thank you,

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

(2013/10/29 2:48), Pekka Enberg wrote:
> On 10/28/13 7:31 PM, Srikar Dronamraju wrote:
>>>> But what if a system has both a 32-bit libc and a 64-bit libc?
>>>> Won't we end up with two libc:setjmp events?
>>>> Should we give perf some more intelligence to choose the 64-bit
>>>> libc over the 32-bit one?
>>> You can just trace both of them by default, no?
>>>
>> There has to be a one-to-one association between the event name and its
>> mapping. Every event name will ultimately map to a unique inode and an
>> offset.
>>
>> One option would be for perf to look at these markers and give similar
>> markers in different executables different event names.
>
> I think we are talking past each other here.
>
> Yes, I understand that you need a fully qualified name
> for an SDT marker, but there's absolutely no reason to
> force-feed that to the user of 'perf trace'.
>
> For the 32-bit and 64-bit libc case, why can't 'perf list'
> by default print out something like:
>
> $ perf list
>
> libc:setjmp [SDT marker group]
>
> and provide a '--fully-qualified' command line option that prints:
>
> $ perf list --fully-qualified
>
> libc:setjmp => libc32:setjmp, libc64:setjmp [SDT marker group]
> libc32:setjmp => libc:setjmp@/lib/libc.so.6 [SDT marker]
> libc64:setjmp => libc:setjmp@/lib64/libc.so.6 [SDT marker]
>
> and then teach 'perf trace' to deal with SDT marker groups
> where you trace two events, not one?

Ah, that's a good idea. :)
It is also needed for other probe events, because
inlined functions sometimes have multiple instances.
I'd like to fold them into one event group.

Thank you!
--
Masami HIRAMATSU

2013-10-29 05:31:14

by Namhyung Kim

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

Hi Masami,

On Tue, 29 Oct 2013 12:19:37 +0900, Masami Hiramatsu wrote:
> (2013/10/29 2:48), Pekka Enberg wrote:
>> For the 32-bit and 64-bit libc case, why can't 'perf list'
>> by default print out something like:
>>
>> $ perf list
>>
>> libc:setjmp [SDT marker group]
>>
>> and provide a '--fully-qualified' command line option that prints:
>>
>> $ perf list --fully-qualified
>>
>> libc:setjmp => libc32:setjmp, libc64:setjmp [SDT marker group]
>> libc32:setjmp => libc:setjmp@/lib/libc.so.6 [SDT marker]
>> libc64:setjmp => libc:setjmp@/lib64/libc.so.6 [SDT marker]
>>
>> and then teach 'perf trace' to deal with SDT marker groups
>> where you trace two events, not one?
>
> Ah, that's a good idea. :)
> It is also needed for other probe events, because
> inlined functions sometimes have multiple instances.
> I'd like to fold them into one event group.

Yes, I'd love to see it as well. :)

Thanks,
Namhyung

2013-10-29 05:50:11

by Namhyung Kim

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

Hi Ingo,

On Sat, 26 Oct 2013 11:50:23 +0200, Ingo Molnar wrote:
> * Srikar Dronamraju <[email protected]> wrote:
>
>> Hi Pekka,
>>
>> > >
>> > > You can now use it in all perf tools, such as:
>> > >
>> > > perf record -e libc:my_event -aR sleep 1
>> >
>> > Is there a technical reason why 'perf list' could not show all the
>> > available SDT markers on a system and that the 'mark to event'
>> > mapping cannot happen automatically?
>> >
>>
>> Technically feasible. But then we would have to parse each of the
>> libraries and executables to list them. Right? I am not sure if
>> such a delay is acceptable.
>
> I'd say let's try Pekka's suggestion and make it more palatable if
> there are complaints about the delay. (SSD systems are becoming
> dominant and there the search should be reasonably fast.)
>
> We could also make 'perf list' more sophisticated: if invoked
> naively as 'perf list', maybe it should first display the
> various event categories, with a (rough) count:
>
> $ perf list
> 34 hardware events # use 'perf list --hw' to list them
> 40 hw-cache events # use 'perf list --cache' to list them
> 20 software events # use 'perf list --sw' to list them
> 2 raw events # use 'perf list --raw' to list them
> 120 tracepoints # use 'perf list --tp' to list them
> >10 SDT tracepoints # use 'perf list --sdt' to list them
>
> # use 'perf list -a' to list all events
> # use 'perf list ./binary' to list events in a given binary
>
> I.e. bring a bit more structure into it.

I like this. :)

Note that 'perf list' already supports this kind of filtering:

$ perf list hw cache sw tracepoint pmu

or

$ perf list sched:*

It'd be great if this globbing also supports SDTs.
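Extending that globbing to SDT event names could be as simple as running the same patterns through POSIX `fnmatch()`. In the sketch below, `list_matching()` and the event names are made up for illustration. Only `fnmatch()` is a real API, and this thread does not say how perf implements its own filtering.

```c
#define _POSIX_C_SOURCE 200809L
#include <fnmatch.h>
#include <stdio.h>

/* Print every event name matching a shell-style glob such as "sched:*"
 * or "libc:*", and return how many matched. */
static size_t list_matching(const char *const *events, size_t n,
			    const char *glob)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++) {
		if (fnmatch(glob, events[i], 0) == 0) {
			puts(events[i]);
			hits++;
		}
	}
	return hits;
}
```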

For the 'perf list ./binary' case, it could detect the libraries in the
dependency list and scan them as well.

>
>> Also, if a binary lives in a path that is not covered by the
>> default search, a user might believe that the binary has no
>> markers. I know the above reason is more user folly than a
>> tooling issue.
>
> I think in 99% of the usecases people will either use pre-built
> markers that come with their distro, or will be intimately aware of
> the markers because they are in the very app they are developing.
>
> So I wouldn't worry about 'user has a weird binary' case too much.
>
> I agree with Pekka that making them easily discoverable and visible
> as a coherent whole is really important.

Agreed. We do need to improve the user experience of the perf tools!

Thanks,
Namhyung

2013-10-29 09:56:10

by Hemant Kumar

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

On 10/29/2013 12:15 AM, Pekka Enberg wrote:
[...]
> If you build a cache of DSOs and executables that have SDT markers (with a
> SHA1 hash), the cache size is bounded by the number of SDT-annotated files. You
> probably can then unconditionally scan the cached filenames for SDT
> markers for 'perf list'. And once you see a SHA1 mismatch, you either
> rescan automatically or explain to the user that:
>
> SDT marker cache needs to be updated. Please run 'perf list --scan'.
>
> Transparently supporting SDT markers as events for 'perf trace -e' and
> others is slightly more tricky because you probably don't want to scan
> the files for every 'perf trace' invocation. However, you can probably
> get really far with a 1024-entry SDT marker cache that's separate from
> the 'executables and DSOs with SDT markers' cache. So whenever the user
> does something like
>
> perf trace -e libc:setjmp sleep 1
>
> The 'libc:setjmp' ends up in the 1024-entry cache (or whatever makes
> most sense) that points directly to SDT marker so we can hook into it
> quickly. Using simple LRU eviction policy, you end up pushing out the
> uninteresting SDT markers and keeping the ones that are used all the
> time.


So, as I understand it, we need to implement it this way (please
correct me if I am wrong!):

Upon the first invocation of "perf list" / "perf list --sdt" by the user,
the executables and DSOs (in PATH and /usr/lib*) should be searched for SDT
markers. All these markers, along with their one-to-one mapping to the
files, can be stored in a "cache" where each entry looks like
[ sdt_marker : provider : FQN : buildid : location ],
where "sdt_marker" and "provider" are the marker and provider names
present in the SDT notes' description, FQN is the absolute path of the
binary, and "location" is the location of the SDT marker inside the
binary.
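One possible serialization of that cache entry, sketched in C. The field order follows the proposal. The colon-separated line format, the field sizes and the helper names are assumptions made for illustration, and the format assumes paths contain no ':'.

```c
#include <stdio.h>

/* Hypothetical cache entry: sdt_marker : provider : FQN : buildid : location */
struct sdt_cache_entry {
	char marker[64];
	char provider[64];
	char fqn[256];		/* absolute path of the binary */
	char buildid[41];	/* 40 hex digits for a SHA1 build-id */
	unsigned long location;	/* marker location inside the binary */
};

/* Write one entry as a single line. Returns 0, or -1 if 'buf' is too small. */
static int sdt_entry_format(const struct sdt_cache_entry *e,
			    char *buf, size_t len)
{
	int n = snprintf(buf, len, "%s:%s:%s:%s:%#lx", e->marker,
			 e->provider, e->fqn, e->buildid, e->location);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Parse a line produced by sdt_entry_format(). Returns 0 or -1. */
static int sdt_entry_parse(const char *line, struct sdt_cache_entry *e)
{
	int n = sscanf(line, "%63[^:]:%63[^:]:%255[^:]:%40[^:]:%lx",
		       e->marker, e->provider, e->fqn, e->buildid,
		       &e->location);

	return n == 5 ? 0 : -1;
}
```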

Subsequent invocations of "perf list" / "perf list --sdt" shall read this
cache and display the info. If we need to update the list, we can use
"perf list --scan".

So now, if we use "perf record -e prov:mark -aR sleep 10", it should go
through the list and find the matching markers, and if there are multiple
matches, we probe the markers in all of the matching entries. Whenever a
match is found, the FQN can be used to find the binary and match the
buildid (we need to confirm that the binary didn't change since the last
"perf list" / "perf list --scan"), and then to confirm the presence and
location of the marker. Then we go on with the probing and recording.

There shouldn't be any need for "perf probe" in between.
That surely makes the user's task a lot easier!

However, some issues are likely to come up when implementing it
this way:
1. Where should this cache be? Keeping it in the tracing directory inside
debugfs seems more feasible. And should this cache be shareable?
2. perf record is a performance-intensive process; can we allow the
delay caused by this search here?
etc.

--
Thanks
Hemant Kumar

2013-10-29 14:05:44

by Pekka Enberg

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

On 10/29/2013 11:55 AM, Hemant Kumar wrote:
> 1. Where should this cache be? Keeping it in the tracing directory inside
> debugfs seems more feasible. And should this cache be shareable?

You can't share all of the cache because otherwise you'll expose details
on binaries that not everyone has access to.

It might make sense to split the cache into two parts: system markers
and user markers and share the former.
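Here is a sketch of how entries might be routed into the shared (system) half or the private (user) half of such a split cache, assuming the decision is made by path prefix. The prefix list and `is_system_binary()` are hypothetical. Checking the file's permissions would be a more robust criterion than its location.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of locations whose markers go to the shared cache. */
static const char *const system_prefixes[] = {
	"/bin/", "/sbin/", "/lib/", "/lib64/",
	"/usr/bin/", "/usr/sbin/", "/usr/lib/", "/usr/lib64/", "/usr/libexec/",
};

/* true: record the binary's markers in the shared system cache.
 * false: keep them in the invoking user's private cache. */
static bool is_system_binary(const char *path)
{
	size_t n = sizeof(system_prefixes) / sizeof(system_prefixes[0]);

	for (size_t i = 0; i < n; i++) {
		if (strncmp(path, system_prefixes[i],
			    strlen(system_prefixes[i])) == 0)
			return true;
	}
	return false;
}
```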

> 2. perf record is a performance-intensive process; can we allow the
> delay caused by this search here?

I think scanning is OK if the user specified an SDT marker, but not
otherwise.

Perhaps you can use a Bloom filter to quickly check whether the user passed
an SDT marker or not.
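Here is a minimal Bloom filter sketch in C, assuming it is built over the event names held in the SDT marker cache. The sizes (8192 bits, 3 hash positions) and the seeded FNV-1a hashing are arbitrary illustrative choices, not anything proposed in the thread. A negative answer is always correct, so ordinary events can skip the marker cache. A positive answer may be a false positive and still needs a real lookup.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOOM_BITS   8192u	/* filter size in bits (arbitrary) */
#define BLOOM_HASHES 3		/* bit positions set per key (arbitrary) */

struct bloom {
	uint8_t bits[BLOOM_BITS / 8];
};

/* 64-bit FNV-1a; a different seed gives a different hash function. */
static uint64_t fnv1a(const char *s, uint64_t seed)
{
	uint64_t h = 0xcbf29ce484222325ULL ^ seed;

	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 0x100000001b3ULL;
	}
	return h;
}

static uint64_t bloom_bit(const char *key, uint64_t k)
{
	return fnv1a(key, k * 0x9e3779b97f4a7c15ULL) % BLOOM_BITS;
}

/* Record a known SDT event name, e.g. "libc:setjmp". */
static void bloom_add(struct bloom *b, const char *key)
{
	for (uint64_t k = 0; k < BLOOM_HASHES; k++) {
		uint64_t bit = bloom_bit(key, k);

		b->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
	}
}

/* false: definitely not a cached SDT marker. true: maybe; look it up. */
static bool bloom_maybe_contains(const struct bloom *b, const char *key)
{
	for (uint64_t k = 0; k < BLOOM_HASHES; k++) {
		uint64_t bit = bloom_bit(key, k);

		if (!(b->bits[bit / 8] & (1u << (bit % 8))))
			return false;
	}
	return true;
}
```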

Pekka

2013-10-29 14:52:42

by Mark Wielaard

Subject: Re: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

On Tue, 2013-10-29 at 12:19 +0900, Masami Hiramatsu wrote:
> (2013/10/29 2:48), Pekka Enberg wrote:
> > For the 32-bit and 64-bit libc case, why can't 'perf list'
> > by default print out something like:
> >
> > $ perf list
> >
> > libc:setjmp [SDT marker group]
> >
> > and provide a '--fully-qualified' command line option that prints:
> >
> > $ perf list --fully-qualified
> >
> > libc:setjmp => libc32:setjmp, libc64:setjmp [SDT marker group]
> > libc32:setjmp => libc:setjmp@/lib/libc.so.6 [SDT marker]
> > libc64:setjmp => libc:setjmp@/lib64/libc.so.6 [SDT marker]
> >
> > and then teach 'perf trace' to deal with SDT marker groups
> > where you trace two events, not one?
>
> Ah, that's a good idea. :)
> It is also needed for other probe events, because
> inlined functions sometimes have multiple instances.
> I'd like to fold them into one event group.

A nice use case to think about when designing this interface might be
the Java HotSpot JVM (libjvm.so). It has SDT markers with the same name
that might occur at multiple addresses depending on code path taken or
compiler optimization. And there are multiple libjvm.so variants
depending on whether the user uses the client or server VM. And users
often have multiple major versions installed (both 1.6 and 1.7 are
currently being shipped by some distros and can be installed in
parallel).

Normally a user that wants to monitor say the hotspot:gc__begin SDT
probe wants to see that probe in whatever code path it happens and in
whatever libjvm.so happens to be running (client or server and 1.6 or
1.7 version). But might still want to be able to specify a specific
variant.
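Mark's requirement (every variant by default, one specific variant on request) maps naturally onto an optional binary qualifier. The sketch below assumes the `provider:marker[@binary]` notation from Pekka's '--fully-qualified' example. `struct sdt_spec` and its helpers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed event spec: "provider:marker[@binary]". */
struct sdt_spec {
	char provider[64];
	char marker[64];
	char binary[256];	/* empty: match the marker in any binary */
};

/* Copy a non-empty substring of length 'len' that fits in 'cap'. */
static bool copy_part(char *dst, size_t cap, const char *src, size_t len)
{
	if (len == 0 || len >= cap)
		return false;
	memcpy(dst, src, len);
	dst[len] = '\0';
	return true;
}

static bool sdt_spec_parse(const char *s, struct sdt_spec *out)
{
	const char *colon = strchr(s, ':');

	if (!colon)
		return false;
	const char *at = strchr(colon + 1, '@');
	size_t mlen = at ? (size_t)(at - colon - 1) : strlen(colon + 1);

	out->binary[0] = '\0';
	if (!copy_part(out->provider, sizeof(out->provider), s,
		       (size_t)(colon - s)) ||
	    !copy_part(out->marker, sizeof(out->marker), colon + 1, mlen))
		return false;
	if (at && !copy_part(out->binary, sizeof(out->binary), at + 1,
			     strlen(at + 1)))
		return false;
	return true;
}

/* Does a marker found at (provider, marker, binary) satisfy the spec? */
static bool sdt_spec_matches(const struct sdt_spec *sp, const char *provider,
			     const char *marker, const char *binary)
{
	return strcmp(sp->provider, provider) == 0 &&
	       strcmp(sp->marker, marker) == 0 &&
	       (sp->binary[0] == '\0' || strcmp(sp->binary, binary) == 0);
}
```

With this scheme, `hotspot:gc__begin` would match the marker in every installed libjvm.so. Appending `@` and a path would narrow it to a single variant.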

Cheers,

Mark

2013-10-29 19:41:35

by Hemant Kumar

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

On 10/29/2013 07:35 PM, Pekka Enberg wrote:
> On 10/29/2013 11:55 AM, Hemant Kumar wrote:
>> 1. Where should this cache be? Keeping it in the tracing directory inside
>> debugfs seems more feasible. And should this cache be shareable?
>
> You can't share all of the cache because otherwise you'll expose
> details on binaries that not everyone has access to.

Correct, that was one of the reasons to be worried. Since normal
users are not allowed to enter debugfs, we can keep the cache
inside the debugfs tracing subdirectory.

>
> It might make sense to split the cache into two parts: system markers
> and user markers and share the former.
>

Ok...

>> 2. perf record is a performance-intensive process; can we allow the
>> delay caused by this search here?
>
> I think scanning is OK if the user specified an SDT marker, but not
> otherwise.
>
> Perhaps you can use a Bloom filter to quickly check whether the user passed
> an SDT marker or not.
>

True, Bloom filters may help here with a quick check.

--
Thanks
Hemant Kumar

2013-10-29 19:54:44

by Pekka Enberg

Subject: Re: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

On Tue, Oct 29, 2013 at 4:51 PM, Mark Wielaard <[email protected]> wrote:
> A nice use case to think about when designing this interface might be
> the Java HotSpot JVM (libjvm.so). It has SDT markers with the same name
> that might occur at multiple addresses depending on code path taken or
> compiler optimization. And there are multiple libjvm.so variants
> depending on whether the user uses the client or server VM. And users
> often have multiple major versions installed (both 1.6 and 1.7 are
> currently being shipped by some distros and can be installed in
> parallel).
>
> Normally a user that wants to monitor say the hotspot:gc__begin SDT
> probe wants to see that probe in whatever code path it happens and in
> whatever libjvm.so happens to be running (client or server and 1.6 or
> 1.7 version). But might still want to be able to specify a specific
> variant.

Agreed, it's an excellent use case.

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

(2013/10/26 20:16), Frank Ch. Eigler wrote:
> Pekka Enberg <[email protected]> writes:
>
>> Is there a technical reason why 'perf list' could not show all the
>> available SDT markers on a system and that the 'mark to event'
>> mapping cannot happen automatically? [...]
>
> A quick experiment with:
>
> find `echo $PATH | tr : ' '` -type f -perm -555 |
> xargs readelf -n 2>/dev/null |
> grep STAP 2>/dev/null
>
> suggests reasonable performance for my F19 workstation (a second or
> two over ~6000 executables), once all the ELF content is in the block
> cache. According to a stap eventcount.stp run, that required about
> 50000 syscall.read events.
>
> Note that a $PATH search excludes shared libraries, which can also
> carry <sys/sdt.h> markers. Adding /usr/lib* in more than doubles the
> work, then there's /usr/libexec etc.
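For comparison, here is a crude C stand-in for the `readelf -n | grep STAP` step. It only looks for the byte string `stapsdt` (the note owner name that <sys/sdt.h> emits) and does not parse ELF, so it can report false positives. `has_stapsdt_bytes()` is a hypothetical helper. A real scanner would walk the ELF note sections the way readelf does.

```c
#include <stdio.h>
#include <string.h>

/* Crude check: does the file contain the bytes "stapsdt"? Returns 1 if
 * found, 0 if not, -1 on I/O error. Reads in 64 KiB chunks and carries
 * the tail of each chunk over so a match split across chunks is found. */
static int has_stapsdt_bytes(const char *path)
{
	static const char needle[] = "stapsdt";
	const size_t nlen = sizeof(needle) - 1;
	char buf[65536];
	size_t keep = 0;	/* bytes carried over from the last chunk */
	FILE *f = fopen(path, "rb");
	int err;

	if (!f)
		return -1;
	for (;;) {
		size_t got = fread(buf + keep, 1, sizeof(buf) - keep, f);
		size_t len = keep + got;

		for (size_t i = 0; i + nlen <= len; i++) {
			if (memcmp(buf + i, needle, nlen) == 0) {
				fclose(f);
				return 1;
			}
		}
		if (got == 0)
			break;		/* EOF or read error */
		keep = len < nlen - 1 ? len : nlen - 1;
		memmove(buf, buf + len - keep, keep);
	}
	err = ferror(f);
	fclose(f);
	return err ? -1 : 0;
}
```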

To find all system libraries, we can use ldconfig.

$ ldconfig --print-cache

shows which dynamic libraries will be loaded. On my own laptop (running
Ubuntu 13.04) it shows ~1000 libs.
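Here is a sketch of turning that output into a list of files to scan. `ldcache_line_path()` assumes the usual `name (flags) => /path` line layout. `list_ldcache_paths()` also assumes that ldconfig lives at /sbin/ldconfig, because /sbin is often not in a regular user's PATH. Both helpers are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Extract the library path from one line of 'ldconfig --print-cache'
 * output, e.g.
 *   "\tlibc.so.6 (libc6,x86-64) => /lib/x86_64-linux-gnu/libc.so.6\n".
 * Returns 0 on success, or -1 for lines without " => " (such as the
 * "N libs found in cache" header) or paths that do not fit in 'out'. */
static int ldcache_line_path(const char *line, char *out, size_t outlen)
{
	const char *arrow = strstr(line, " => ");

	if (!arrow)
		return -1;
	const char *p = arrow + 4;
	size_t n = strcspn(p, "\r\n");

	if (n == 0 || n >= outlen)
		return -1;
	memcpy(out, p, n);
	out[n] = '\0';
	return 0;
}

/* Run ldconfig and print every cached library path: the candidate set
 * to scan for SDT notes. Returns the number of paths, or -1. */
int list_ldcache_paths(void)
{
	FILE *p = popen("/sbin/ldconfig --print-cache", "r");
	char line[4096], path[4096];
	int count = 0;

	if (!p)
		return -1;
	while (fgets(line, sizeof(line), p)) {
		if (ldcache_line_path(line, path, sizeof(path)) == 0) {
			puts(path);
			count++;
		}
	}
	pclose(p);
	return count;
}
```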

Thank you,

--
Masami HIRAMATSU

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

(2013/10/29 3:45), Pekka Enberg wrote:
> On 10/28/13 6:59 PM, David Ahern wrote:
> > I often use perf-list to look up an exact event name, and I do not want
> > to see it taking many seconds to minutes to run (not everyone is
> > running on an SSD). I also run perf on many different OS versions with
> > an NFS home directory, and do not want to see a cache explosion (I
> > have buildid disabled for this reason).
>
> I am talking about reasonable defaults - the 'default' part implies that
> people can change the behavior. So we absolutely should also have
> something like this for power users such as yourself:
>
> perf config sdt.scan false

Ah, I like using perf config to store the default/customized values ;)

> That said, the 'reasonable' part suggests that 'perf list' must not take
> seconds or minutes (!) for every run. I'd start with implementing a
> naive scan and seeing where it takes us. It's not like it's rocket
> science to ignore network mounts or revert to a whitelist of paths if
> necessary.

I think it is reasonable to scan only $PATH and ld.so.cache (the result
of ldconfig --print-cache) by default. :)
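The $PATH half of that default is straightforward. Here is a sketch of walking its directories; the ld.so.cache half would come from the ldconfig output. `for_each_path_dir()` is a hypothetical helper. It follows the POSIX rule that an empty $PATH entry means the current directory.

```c
#include <stddef.h>
#include <string.h>

/* Call 'visit' once per directory listed in a $PATH-style string and
 * return how many were visited. Following POSIX, an empty entry (leading,
 * trailing or doubled ':') means the current directory. */
static int for_each_path_dir(const char *pathvar,
			     void (*visit)(const char *dir, void *arg),
			     void *arg)
{
	char dir[4096];
	const char *start = pathvar;
	int count = 0;

	for (;;) {
		const char *end = strchr(start, ':');
		size_t len = end ? (size_t)(end - start) : strlen(start);

		if (len < sizeof(dir)) {	/* silently skip oversized entries */
			if (len == 0) {
				strcpy(dir, ".");
			} else {
				memcpy(dir, start, len);
				dir[len] = '\0';
			}
			visit(dir, arg);
			count++;
		}
		if (!end)
			break;
		start = end + 1;
	}
	return count;
}
```

A caller would pass `getenv("PATH")` (after checking it for NULL) and scan the executables in each visited directory.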

Thank you,



--
Masami HIRAMATSU

2013-10-30 11:52:04

by Pekka Enberg

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

On 10/30/13 12:05 PM, Masami Hiramatsu wrote:
> To find all system libraries, we can use ldconfig.
>
> $ ldconfig --print-cache
>
> shows which dynamic libraries will be loaded. On my own laptop (running
> Ubuntu 13.04) it shows ~1000 libs.

Good point. That definitely narrows down the scanned set.

Pekka

2013-10-30 13:30:25

by Hemant Kumar

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

On 10/30/2013 03:35 PM, Masami Hiramatsu wrote:
> (2013/10/26 20:16), Frank Ch. Eigler wrote:
>> Note that a $PATH search excludes shared libraries, which can also
>> carry <sys/sdt.h> markers. Adding /usr/lib* in more than doubles the
>> work, then there's /usr/libexec etc.
> To find all system libraries, we can use ldconfig.
>
> $ ldconfig --print-cache
>
> shows which dynamic libraries will be loaded. On my own laptop (running
> Ubuntu 13.04) it shows ~1000 libs.
>
> Thank you,
>

Ah! that seems nice... will save a lot of time!

--
Thanks
Hemant Kumar

2013-10-31 09:59:10

by Ingo Molnar

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:


* Pekka Enberg <[email protected]> wrote:

> On 10/30/13 12:05 PM, Masami Hiramatsu wrote:
> >To find all system libraries, we can use ldconfig.
> >
> >$ ldconfig --print-cache
> >
> >shows which dynamic libraries will be loaded. On my own laptop (running
> >Ubuntu 13.04) it shows ~1000 libs.
>
> Good point. That definitely narrows down the scanned set.
>
> Pekka

There's also 'strings /etc/prelink.cache' that should give a good
list of binaries and libraries that matter.

Thanks,

Ingo

2013-10-31 10:55:37

by Mark Wielaard

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

On Wed, 2013-10-30 at 13:51 +0200, Pekka Enberg wrote:
> On 10/30/13 12:05 PM, Masami Hiramatsu wrote:
> > To find all system libraries, we can use ldconfig.
> >
> > $ ldconfig --print-cache
> >
> > shows which dynamic libraries will be loaded. On my own laptop (running
> > Ubuntu 13.04) it shows ~1000 libs.
>
> Good point. That definitely narrows down the scanned set.

It is fast. But that would miss the various libjvm.so variants for
example. Or other programs, like libreoffice, which have SDT probes in
their internal shared libraries that aren't in the default ldconfig
paths.

Cheers,

Mark

2013-10-31 10:58:08

by Ingo Molnar

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:


* Mark Wielaard <[email protected]> wrote:

> On Wed, 2013-10-30 at 13:51 +0200, Pekka Enberg wrote:
> > On 10/30/13 12:05 PM, Masami Hiramatsu wrote:
> > > To find all system libraries, we can use ldconfig.
> > >
> > > $ ldconfig --print-cache
> > >
> > > shows which dynamic libraries will be loaded. On my own laptop (running
> > > Ubuntu 13.04) it shows ~1000 libs.
> >
> > Good point. That definitely narrows down the scanned set.
>
> It is fast. But that would miss the various libjvm.so variants for
> example. Or other programs, like libreoffice, which have SDT
> probes in their internal shared libraries that aren't in the
> default ldconfig paths.

I suppose those Java libraries ought to show up in
/etc/prelink.cache though, right?

Thanks,

Ingo

2013-10-31 13:12:47

by Peter Zijlstra

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

On Thu, Oct 31, 2013 at 11:57:59AM +0100, Ingo Molnar wrote:
> I suppose those Java libraries ought to show up in
> /etc/prelink.cache though, right?

I just checked; none of my systems have prelinking enabled by default,
so it's not something you can rely on.

ISTR prelinking defeated ASLR and thus was very short-lived.

2013-10-31 13:24:18

by Mark Wielaard

Subject: Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

On Thu, 2013-10-31 at 11:57 +0100, Ingo Molnar wrote:
> * Mark Wielaard <[email protected]> wrote:
> > On Wed, 2013-10-30 at 13:51 +0200, Pekka Enberg wrote:
> > > On 10/30/13 12:05 PM, Masami Hiramatsu wrote:
> > > > To find all system libraries, we can use ldconfig.
> > > >
> > > > $ ldconfig --print-cache
> > > >
> > > > shows which dynamic libraries will be loaded. On my own laptop (running
> > > > Ubuntu 13.04) it shows ~1000 libs.
> > >
> > > Good point. That definitely narrows down the scanned set.
> >
> > It is fast. But that would miss the various libjvm.so variants for
> > example. Or other programs, like libreoffice, which have SDT
> > probes in their internal shared libraries that aren't in the
> > default ldconfig paths.
>
> I suppose those Java libraries ought to show up in
> /etc/prelink.cache though, right?

Good point. Yes, all the executables and libraries I was missing in
ldconfig --print-cache do show up with prelink -p.

Except libjvm.so itself... prelink is apparently convinced that it is
never used. Hmm. It seems all the wrapper "java" executables only
dlopen() it, so it is never directly linked, and prelink doesn't cache it.

But apart from that special case, prelink -p is a good substitute.

Cheers,

Mark