2020-09-13 21:04:33

by Jiri Olsa

Subject: [RFC 00/26] perf: Add mmap3 support

hi,
while playing with perf daemon support I realized I need
the build id data in mmap events, so we don't need to care
about removed/updated binaries during long perf runs.

This RFC patchset adds a new mmap3 event that copies the mmap2
event and adds the build id to it. It makes mmap3 the default
mmap event for synthesizing kernel/modules/tasks and adds
some tooling enhancements to enable the workflow below.

Note that the build id retrieval code is stolen from the bpf
code, where it's been used (together with file offsets)
to replace IPs in user space stack traces. It's now moved
under the lib directory.


On the recording server:

- on the recording server we can run record with the -B option to
skip the build id scan:

# perf record -B
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.462 MB perf.data ]

# find ~/.debug
find: ‘/root/.debug’: No such file or directory

# perf report
...
97.93% swapper [kernel.kallsyms] [k] native_safe_halt
0.18% sshd [kernel.kallsyms] [k] avtab_search_node
0.14% swapper [kernel.kallsyms] [k] __do_softirq
0.05% swapper [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.03% swapper [kernel.kallsyms] [k] finish_task_switch

- display used/hit build ids:

# perf buildid-list | head -5
439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46 [kernel.kallsyms]
23b87f5b0560481043257e82be670bc97786a171 /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/net/ipv4/netfilter/ip_tables.ko.xz
d2b3be372bcdd4ebc15e479d2ff803657de0fd1e /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/drivers/block/virtio_blk.ko.xz
1466a71bcd0ff5c975ee79b72752137c0143d225 /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/fs/xfs/xfs.ko.xz
ad60d10b38c93bd8738d5aa594e240f01bb328cd /usr/lib/systemd/systemd

- store build id binaries into build id cache:

# perf buildid-list --store | head -5
OK 439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46 [kernel.kallsyms]
OK 23b87f5b0560481043257e82be670bc97786a171 /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/net/ipv4/netfilter/ip_tables.ko.xz
OK d2b3be372bcdd4ebc15e479d2ff803657de0fd1e /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/drivers/block/virtio_blk.ko.xz
OK 1466a71bcd0ff5c975ee79b72752137c0143d225 /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/fs/xfs/xfs.ko.xz
OK ad60d10b38c93bd8738d5aa594e240f01bb328cd /usr/lib/systemd/systemd

# find ~/.debug | head -5
/root/.debug
/root/.debug/[kernel.kallsyms]
/root/.debug/[kernel.kallsyms]/439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46
/root/.debug/[kernel.kallsyms]/439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46/kallsyms
/root/.debug/[kernel.kallsyms]/439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46/probes

- run the debuginfod daemon to provide binaries to the other server (below):

# debuginfod -F /


On another server:

- copy perf.data from the recording server and run:

$ find ~/.debug/
find: ‘/home/jolsa/.debug/’: No such file or directory

$ perf buildid-list | head -5
No kallsyms or vmlinux with build-id 439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46 was found
439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46 [kernel.kallsyms]
23b87f5b0560481043257e82be670bc97786a171 /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/net/ipv4/netfilter/ip_tables.ko.xz
d2b3be372bcdd4ebc15e479d2ff803657de0fd1e /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/drivers/block/virtio_blk.ko.xz
1466a71bcd0ff5c975ee79b72752137c0143d225 /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/fs/xfs/xfs.ko.xz
ad60d10b38c93bd8738d5aa594e240f01bb328cd /usr/lib/systemd/systemd

- report does not resolve any symbols (the kernel build id does not match):

$ perf report --stdio
...
97.93% swapper [kernel.kallsyms] [k] 0xffffffffa8b859be
0.14% swapper [kernel.kallsyms] [k] 0xffffffffa8e00074
0.11% sshd [kernel.kallsyms] [k] 0xffffffffa855b283
0.05% swapper [kernel.kallsyms] [k] 0xffffffffa8b85d31
0.03% swapper [kernel.kallsyms] [k] 0xffffffffa810a220

- store does not work either, because the existing binaries have different build ids:

$ perf report --store | head -5
No kallsyms or vmlinux with build-id 439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46 was found
FAIL 439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46 [kernel.kallsyms]
FAIL 23b87f5b0560481043257e82be670bc97786a171 /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/net/ipv4/netfilter/ip_tables.ko.xz
FAIL d2b3be372bcdd4ebc15e479d2ff803657de0fd1e /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/drivers/block/virtio_blk.ko.xz
FAIL 1466a71bcd0ff5c975ee79b72752137c0143d225 /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/fs/xfs/xfs.ko.xz
FAIL ad60d10b38c93bd8738d5aa594e240f01bb328cd /usr/lib/systemd/systemd

- instruct the debuginfod client to download them (module retrieval does not work yet for some reason):

$ DEBUGINFOD_URLS=http://192.168.122.174:8002 perf report --store | head -5
No kallsyms or vmlinux with build-id 439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46 was found
OK 439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46 [kernel.kallsyms]
FAIL 23b87f5b0560481043257e82be670bc97786a171 /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/net/ipv4/netfilter/ip_tables.ko.xz
FAIL d2b3be372bcdd4ebc15e479d2ff803657de0fd1e /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/drivers/block/virtio_blk.ko.xz
FAIL 1466a71bcd0ff5c975ee79b72752137c0143d225 /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/fs/xfs/xfs.ko.xz
OK ad60d10b38c93bd8738d5aa594e240f01bb328cd /usr/lib/systemd/systemd

- and report works:

$ perf report --stdio
...
97.93% swapper [kernel.kallsyms] [k] native_safe_halt
0.18% sshd [kernel.kallsyms] [k] avtab_search_node
0.14% swapper [kernel.kallsyms] [k] __do_softirq
0.05% swapper [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.03% swapper [kernel.kallsyms] [k] finish_task_switch

- because we have the data in the build id cache:

$ find ~/.debug | head -10
.../.debug
.../.debug/home
.../.debug/home/jolsa
.../.debug/home/jolsa/.cache
.../.debug/home/jolsa/.cache/debuginfod_client
.../.debug/home/jolsa/.cache/debuginfod_client/439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46
.../.debug/home/jolsa/.cache/debuginfod_client/439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46/executable
.../.debug/home/jolsa/.cache/debuginfod_client/439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46/executable/439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46
.../.debug/home/jolsa/.cache/debuginfod_client/439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46/executable/439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46/elf
.../.debug/home/jolsa/.cache/debuginfod_client/439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46/executable/439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46/probes


The code still needs some polishing, but I'd like to hear some
opinions on the usage workflow, so it could get adjusted early
on ;-)

For example: should we make -B the default now? What about users
who expect the build id cache to be populated? And perhaps we could
add a .perfconfig option for the debuginfod server.

Also available in:
git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
perf/mmap3

thanks,
jirka


Cc: Alexei Starovoitov <[email protected]>
---
Jiri Olsa (26):
bpf: Move stack_map_get_build_id into lib
perf: Introduce mmap3 version of mmap event
tools headers uapi: Sync tools/include/uapi/linux/perf_event.h
perf tools: Add filename__decompress function
perf tools: Add build_id__is_defined function
perf tools: Add support to read build id from compressed elf
perf tools: Add check for existing link in buildid dir
perf tools: Use struct extra_kernel_map in machine__process_kernel_mmap_event
perf tools: Try load vmlinux from buildid database
perf tools: Enable mmap3 map event when supported
perf tools: Add mmap3 support
perf tools: Set build id for kernel dso objects
perf tools: Plug in mmap3 event
perf tools: Add mmap3 events to --show-mmap-events option
perf tools: Synthesize proc tasks with mmap3
perf tools: Synthesize modules with mmap3
perf tools: Synthesize kernel with mmap3
perf tests: Add mmap3 support for perf record test
perf tools: Add buildid-list support for mmap3
perf tools: Add build_id_cache__add function
perf tools: Add machine__for_each_dso function
perf tools: Use machine__for_each_dso in perf_session__cache_build_ids
perf tools: Add __perf_session__cache_build_ids function
perf tools: Add buildid-list --store option
perf tools: Move debuginfo download code into get_debuginfo
perf tools: Add report --store option

include/linux/buildid.h | 11 +++++++
include/uapi/linux/perf_event.h | 27 +++++++++++++++-
kernel/bpf/stackmap.c | 143 +++--------------------------------------------------------------------------------
kernel/events/core.c | 38 +++++++++++++++++-----
lib/Makefile | 3 +-
lib/buildid.c | 136 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
tools/include/uapi/linux/perf_event.h | 27 +++++++++++++++-
tools/lib/perf/include/perf/event.h | 18 +++++++++++
tools/perf/Documentation/perf-buildid-list.txt | 12 +++++++
tools/perf/Documentation/perf-report.txt | 3 ++
tools/perf/builtin-annotate.c | 1 +
tools/perf/builtin-buildid-list.c | 179 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
tools/perf/builtin-c2c.c | 1 +
tools/perf/builtin-diff.c | 1 +
tools/perf/builtin-inject.c | 38 ++++++++++++++++++++++
tools/perf/builtin-kmem.c | 1 +
tools/perf/builtin-mem.c | 1 +
tools/perf/builtin-record.c | 14 +++++++++
tools/perf/builtin-report.c | 19 +++++++++++
tools/perf/builtin-script.c | 34 ++++++++++++++++++++
tools/perf/builtin-trace.c | 1 +
tools/perf/tests/perf-record.c | 7 ++++-
tools/perf/util/build-id.c | 179 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------------
tools/perf/util/build-id.h | 8 +++++
tools/perf/util/data-convert-bt.c | 1 +
tools/perf/util/dso.c | 31 +++++++++++-------
tools/perf/util/dso.h | 2 ++
tools/perf/util/event.c | 32 +++++++++++++++++++
tools/perf/util/event.h | 5 +++
tools/perf/util/evsel.c | 9 +++++-
tools/perf/util/evsel.h | 1 +
tools/perf/util/machine.c | 155 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------
tools/perf/util/machine.h | 6 ++++
tools/perf/util/map.c | 8 +++--
tools/perf/util/map.h | 2 +-
tools/perf/util/mmap.c | 2 +-
tools/perf/util/perf_event_attr_fprintf.c | 1 +
tools/perf/util/probe-event.c | 6 ++--
tools/perf/util/session.c | 28 +++++++++++++++++
tools/perf/util/symbol-elf.c | 37 ++++++++++++++++++++--
tools/perf/util/symbol.c | 14 +++++++++
tools/perf/util/synthetic-events.c | 132 ++++++++++++++++++++++++++++++++++++++++++++++------------------------------
tools/perf/util/tool.h | 1 +
43 files changed, 1059 insertions(+), 316 deletions(-)
create mode 100644 include/linux/buildid.h
create mode 100644 lib/buildid.c


2020-09-13 21:04:45

by Jiri Olsa

Subject: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

Add a new version of the mmap event. The MMAP3 record is an
augmented version of MMAP2; it adds the build id value to
identify the exact binary object behind the memory map:

struct {
struct perf_event_header header;

u32 pid, tid;
u64 addr;
u64 len;
u64 pgoff;
u32 maj;
u32 min;
u64 ino;
u64 ino_generation;
u32 prot, flags;
u32 reserved;
u8 buildid[20];
char filename[];
struct sample_id sample_id;
};

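Add a 4-byte reserved field so that the reserved and buildid fields
together take 24 bytes (a multiple of 8). This keeps the following
filename, and therefore the sample_id data, properly 8-byte aligned.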

The mmap3 event is enabled by a new mmap3 bit in struct
perf_event_attr. When set for an event, it enables build id
retrieval and makes the event use the MMAP3 record format.

Keep track of the number of mmap3 events and call build_id_parse
in perf_event_mmap_event only if there is at least one.

Having the build id attached directly to the mmap event lets
tools like perf skip the final search through perf data for
the binaries needed at report time. It also prevents a
possible race where the binary is removed or replaced
during profiling.

Signed-off-by: Jiri Olsa <[email protected]>
---
include/uapi/linux/perf_event.h | 27 ++++++++++++++++++++++-
kernel/events/core.c | 38 +++++++++++++++++++++++++++------
2 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 077e7ee69e3d..facfc3c673ed 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -384,7 +384,8 @@ struct perf_event_attr {
aux_output : 1, /* generate AUX records instead of events */
cgroup : 1, /* include cgroup events */
text_poke : 1, /* include text poke events */
- __reserved_1 : 30;
+ mmap3 : 1, /* include mmap3 events (with build id) */
+ __reserved_1 : 29;

union {
__u32 wakeup_events; /* wakeup every n events */
@@ -1060,6 +1061,30 @@ enum perf_event_type {
*/
PERF_RECORD_TEXT_POKE = 20,

+ /*
+ * The MMAP3 records are an augmented version of MMAP2, they add
+ * build id value to identify the exact binary behind map
+ *
+ * struct {
+ * struct perf_event_header header;
+ *
+ * u32 pid, tid;
+ * u64 addr;
+ * u64 len;
+ * u64 pgoff;
+ * u32 maj;
+ * u32 min;
+ * u64 ino;
+ * u64 ino_generation;
+ * u32 prot, flags;
+ * u32 reserved;
+ * u8 buildid[20];
+ * char filename[];
+ * struct sample_id sample_id;
+ * };
+ */
+ PERF_RECORD_MMAP3 = 21,
+
PERF_RECORD_MAX, /* non-ABI */
};

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7ed5248f0445..719894492dac 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -51,6 +51,7 @@
#include <linux/proc_ns.h>
#include <linux/mount.h>
#include <linux/min_heap.h>
+#include <linux/buildid.h>

#include "internal.h"

@@ -386,6 +387,7 @@ static DEFINE_PER_CPU(int, perf_sched_cb_usages);
static DEFINE_PER_CPU(struct pmu_event_list, pmu_sb_events);

static atomic_t nr_mmap_events __read_mostly;
+static atomic_t nr_mmap3_events __read_mostly;
static atomic_t nr_comm_events __read_mostly;
static atomic_t nr_namespaces_events __read_mostly;
static atomic_t nr_task_events __read_mostly;
@@ -4588,7 +4590,7 @@ static bool is_sb_event(struct perf_event *event)
return false;

if (attr->mmap || attr->mmap_data || attr->mmap2 ||
- attr->comm || attr->comm_exec ||
+ attr->mmap3 || attr->comm || attr->comm_exec ||
attr->task || attr->ksymbol ||
attr->context_switch || attr->text_poke ||
attr->bpf_event)
@@ -4644,6 +4646,8 @@ static void unaccount_event(struct perf_event *event)
dec = true;
if (event->attr.mmap || event->attr.mmap_data)
atomic_dec(&nr_mmap_events);
+ if (event->attr.mmap3)
+ atomic_dec(&nr_mmap3_events);
if (event->attr.comm)
atomic_dec(&nr_comm_events);
if (event->attr.namespaces)
@@ -7465,7 +7469,7 @@ static void perf_pmu_output_stop(struct perf_event *event)
/*
* task tracking -- fork/exit
*
- * enabled by: attr.comm | attr.mmap | attr.mmap2 | attr.mmap_data | attr.task
+ * enabled by: attr.comm | attr.mmap | attr.mmap2 | attr.mmap3 | attr.mmap_data | attr.task
*/

struct perf_task_event {
@@ -7486,8 +7490,8 @@ struct perf_task_event {
static int perf_event_task_match(struct perf_event *event)
{
return event->attr.comm || event->attr.mmap ||
- event->attr.mmap2 || event->attr.mmap_data ||
- event->attr.task;
+ event->attr.mmap2 || event->attr.mmap3 ||
+ event->attr.mmap_data || event->attr.task;
}

static void perf_event_task_output(struct perf_event *event,
@@ -7913,6 +7917,7 @@ struct perf_mmap_event {
u64 ino;
u64 ino_generation;
u32 prot, flags;
+ u8 buildid[BUILD_ID_SIZE];

struct {
struct perf_event_header header;
@@ -7933,7 +7938,7 @@ static int perf_event_mmap_match(struct perf_event *event,
int executable = vma->vm_flags & VM_EXEC;

return (!executable && event->attr.mmap_data) ||
- (executable && (event->attr.mmap || event->attr.mmap2));
+ (executable && (event->attr.mmap || event->attr.mmap2 || event->attr.mmap3));
}

static void perf_event_mmap_output(struct perf_event *event,
@@ -7949,7 +7954,7 @@ static void perf_event_mmap_output(struct perf_event *event,
if (!perf_event_mmap_match(event, data))
return;

- if (event->attr.mmap2) {
+ if (event->attr.mmap2 || event->attr.mmap3) {
mmap_event->event_id.header.type = PERF_RECORD_MMAP2;
mmap_event->event_id.header.size += sizeof(mmap_event->maj);
mmap_event->event_id.header.size += sizeof(mmap_event->min);
@@ -7959,6 +7964,12 @@ static void perf_event_mmap_output(struct perf_event *event,
mmap_event->event_id.header.size += sizeof(mmap_event->flags);
}

+ if (event->attr.mmap3) {
+ mmap_event->event_id.header.type = PERF_RECORD_MMAP3;
+ mmap_event->event_id.header.size += sizeof(u32);
+ mmap_event->event_id.header.size += sizeof(mmap_event->buildid);
+ }
+
perf_event_header__init_id(&mmap_event->event_id.header, &sample, event);
ret = perf_output_begin(&handle, event,
mmap_event->event_id.header.size);
@@ -7970,7 +7981,7 @@ static void perf_event_mmap_output(struct perf_event *event,

perf_output_put(&handle, mmap_event->event_id);

- if (event->attr.mmap2) {
+ if (event->attr.mmap2 || event->attr.mmap3) {
perf_output_put(&handle, mmap_event->maj);
perf_output_put(&handle, mmap_event->min);
perf_output_put(&handle, mmap_event->ino);
@@ -7979,6 +7990,13 @@ static void perf_event_mmap_output(struct perf_event *event,
perf_output_put(&handle, mmap_event->flags);
}

+ if (event->attr.mmap3) {
+ u32 reserved = 0;
+
+ perf_output_put(&handle, reserved);
+ __output_copy(&handle, mmap_event->buildid, BUILD_ID_SIZE);
+ }
+
__output_copy(&handle, mmap_event->file_name,
mmap_event->file_size);

@@ -8098,6 +8116,9 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
mmap_event->prot = prot;
mmap_event->flags = flags;

+ if (atomic_read(&nr_mmap3_events))
+ build_id_parse(vma, mmap_event->buildid);
+
if (!(vma->vm_flags & VM_EXEC))
mmap_event->event_id.header.misc |= PERF_RECORD_MISC_MMAP_DATA;

@@ -8241,6 +8262,7 @@ void perf_event_mmap(struct vm_area_struct *vma)
/* .ino_generation (attr_mmap2 only) */
/* .prot (attr_mmap2 only) */
/* .flags (attr_mmap2 only) */
+ /* .buildid (attr_mmap3 only) */
};

perf_addr_filters_adjust(vma);
@@ -11040,6 +11062,8 @@ static void account_event(struct perf_event *event)
inc = true;
if (event->attr.mmap || event->attr.mmap_data)
atomic_inc(&nr_mmap_events);
+ if (event->attr.mmap3)
+ atomic_inc(&nr_mmap3_events);
if (event->attr.comm)
atomic_inc(&nr_comm_events);
if (event->attr.namespaces)
--
2.26.2

2020-09-13 21:05:01

by Jiri Olsa

Subject: [PATCH 03/26] tools headers uapi: Sync tools/include/uapi/linux/perf_event.h

Sync uapi header with kernel version for mmap3 support.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/include/uapi/linux/perf_event.h | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 3e5dcdd48a49..84a0cbdab1ef 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -384,7 +384,8 @@ struct perf_event_attr {
aux_output : 1, /* generate AUX records instead of events */
cgroup : 1, /* include cgroup events */
text_poke : 1, /* include text poke events */
- __reserved_1 : 30;
+ mmap3 : 1, /* include mmap3 events (with build id) */
+ __reserved_1 : 29;

union {
__u32 wakeup_events; /* wakeup every n events */
@@ -1060,6 +1061,30 @@ enum perf_event_type {
*/
PERF_RECORD_TEXT_POKE = 20,

+ /*
+ * The MMAP3 records are an augmented version of MMAP2, they add
+ * build id value to identify the exact binary behind map
+ *
+ * struct {
+ * struct perf_event_header header;
+ *
+ * u32 pid, tid;
+ * u64 addr;
+ * u64 len;
+ * u64 pgoff;
+ * u32 maj;
+ * u32 min;
+ * u64 ino;
+ * u64 ino_generation;
+ * u32 prot, flags;
+ * u32 reserved;
+ * u8 buildid[20];
+ * char filename[];
+ * struct sample_id sample_id;
+ * };
+ */
+ PERF_RECORD_MMAP3 = 21,
+
PERF_RECORD_MAX, /* non-ABI */
};

--
2.26.2

2020-09-13 21:05:20

by Jiri Olsa

Subject: [PATCH 04/26] perf tools: Add filename__decompress function

Factor filename__decompress out of the decompress_kmodule function.
It can decompress files in the compression formats supported by perf
(xz and gz), provided that support is compiled in.

It will be used in the following changes to get the build id out of
compressed ELF objects.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/dso.c | 31 +++++++++++++++++++------------
tools/perf/util/dso.h | 2 ++
2 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 5a3b4755f0b3..0faa96ca7a04 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -279,18 +279,12 @@ bool dso__needs_decompress(struct dso *dso)
dso->symtab_type == DSO_BINARY_TYPE__GUEST_KMODULE_COMP;
}

-static int decompress_kmodule(struct dso *dso, const char *name,
- char *pathname, size_t len)
+int filename__decompress(const char *name, char *pathname,
+ size_t len, int comp, int *err)
{
char tmpbuf[] = KMOD_DECOMP_NAME;
int fd = -1;

- if (!dso__needs_decompress(dso))
- return -1;
-
- if (dso->comp == COMP_ID__NONE)
- return -1;
-
/*
* We have proper compression id for DSO and yet the file
* behind the 'name' can still be plain uncompressed object.
@@ -304,17 +298,17 @@ static int decompress_kmodule(struct dso *dso, const char *name,
* To keep this transparent, we detect this and return the file
* descriptor to the uncompressed file.
*/
- if (!compressions[dso->comp].is_compressed(name))
+ if (!compressions[comp].is_compressed(name))
return open(name, O_RDONLY);

fd = mkstemp(tmpbuf);
if (fd < 0) {
- dso->load_errno = errno;
+ *err = errno;
return -1;
}

- if (compressions[dso->comp].decompress(name, fd)) {
- dso->load_errno = DSO_LOAD_ERRNO__DECOMPRESSION_FAILURE;
+ if (compressions[comp].decompress(name, fd)) {
+ *err = DSO_LOAD_ERRNO__DECOMPRESSION_FAILURE;
close(fd);
fd = -1;
}
@@ -328,6 +322,19 @@ static int decompress_kmodule(struct dso *dso, const char *name,
return fd;
}

+static int decompress_kmodule(struct dso *dso, const char *name,
+ char *pathname, size_t len)
+{
+ if (!dso__needs_decompress(dso))
+ return -1;
+
+ if (dso->comp == COMP_ID__NONE)
+ return -1;
+
+ return filename__decompress(name, pathname, len, dso->comp,
+ &dso->load_errno);
+}
+
int dso__decompress_kmodule_fd(struct dso *dso, const char *name)
{
return decompress_kmodule(dso, name, NULL, 0);
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 8ad17f395a19..f1efd2e11547 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -274,6 +274,8 @@ bool dso__needs_decompress(struct dso *dso);
int dso__decompress_kmodule_fd(struct dso *dso, const char *name);
int dso__decompress_kmodule_path(struct dso *dso, const char *name,
char *pathname, size_t len);
+int filename__decompress(const char *name, char *pathname,
+ size_t len, int comp, int *err);

#define KMOD_DECOMP_NAME "/tmp/perf-kmod-XXXXXX"
#define KMOD_DECOMP_LEN sizeof(KMOD_DECOMP_NAME)
--
2.26.2

2020-09-13 21:05:30

by Jiri Olsa

Subject: [PATCH 01/26] bpf: Move stack_map_get_build_id into lib

Move stack_map_get_build_id into lib, with its
prototype in the linux/buildid.h header:

int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id);

The function stores the build id of the ELF object mapped by the
given struct vm_area_struct in build_id and returns 0 on success or
a negative error code. There is no functional change to the
stack_map_get_build_id code.

Cc: Alexei Starovoitov <[email protected]>
Cc: Song Liu <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
---
include/linux/buildid.h | 11 ++++
kernel/bpf/stackmap.c | 143 ++--------------------------------------
lib/Makefile | 3 +-
lib/buildid.c | 136 ++++++++++++++++++++++++++++++++++++++
4 files changed, 153 insertions(+), 140 deletions(-)
create mode 100644 include/linux/buildid.h
create mode 100644 lib/buildid.c

diff --git a/include/linux/buildid.h b/include/linux/buildid.h
new file mode 100644
index 000000000000..3be5b49719f1
--- /dev/null
+++ b/include/linux/buildid.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_BUILDID_H
+#define _LINUX_BUILDID_H
+
+#include <linux/mm_types.h>
+
+#define BUILD_ID_SIZE 20
+
+int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id);
+
+#endif
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index cfed0ac44d38..acd75ae8abff 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -7,10 +7,9 @@
#include <linux/kernel.h>
#include <linux/stacktrace.h>
#include <linux/perf_event.h>
-#include <linux/elf.h>
-#include <linux/pagemap.h>
#include <linux/irq_work.h>
#include <linux/btf_ids.h>
+#include <linux/buildid.h>
#include "percpu_freelist.h"

#define STACK_CREATE_FLAG_MASK \
@@ -153,140 +152,6 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
return ERR_PTR(err);
}

-#define BPF_BUILD_ID 3
-/*
- * Parse build id from the note segment. This logic can be shared between
- * 32-bit and 64-bit system, because Elf32_Nhdr and Elf64_Nhdr are
- * identical.
- */
-static inline int stack_map_parse_build_id(void *page_addr,
- unsigned char *build_id,
- void *note_start,
- Elf32_Word note_size)
-{
- Elf32_Word note_offs = 0, new_offs;
-
- /* check for overflow */
- if (note_start < page_addr || note_start + note_size < note_start)
- return -EINVAL;
-
- /* only supports note that fits in the first page */
- if (note_start + note_size > page_addr + PAGE_SIZE)
- return -EINVAL;
-
- while (note_offs + sizeof(Elf32_Nhdr) < note_size) {
- Elf32_Nhdr *nhdr = (Elf32_Nhdr *)(note_start + note_offs);
-
- if (nhdr->n_type == BPF_BUILD_ID &&
- nhdr->n_namesz == sizeof("GNU") &&
- nhdr->n_descsz > 0 &&
- nhdr->n_descsz <= BPF_BUILD_ID_SIZE) {
- memcpy(build_id,
- note_start + note_offs +
- ALIGN(sizeof("GNU"), 4) + sizeof(Elf32_Nhdr),
- nhdr->n_descsz);
- memset(build_id + nhdr->n_descsz, 0,
- BPF_BUILD_ID_SIZE - nhdr->n_descsz);
- return 0;
- }
- new_offs = note_offs + sizeof(Elf32_Nhdr) +
- ALIGN(nhdr->n_namesz, 4) + ALIGN(nhdr->n_descsz, 4);
- if (new_offs <= note_offs) /* overflow */
- break;
- note_offs = new_offs;
- }
- return -EINVAL;
-}
-
-/* Parse build ID from 32-bit ELF */
-static int stack_map_get_build_id_32(void *page_addr,
- unsigned char *build_id)
-{
- Elf32_Ehdr *ehdr = (Elf32_Ehdr *)page_addr;
- Elf32_Phdr *phdr;
- int i;
-
- /* only supports phdr that fits in one page */
- if (ehdr->e_phnum >
- (PAGE_SIZE - sizeof(Elf32_Ehdr)) / sizeof(Elf32_Phdr))
- return -EINVAL;
-
- phdr = (Elf32_Phdr *)(page_addr + sizeof(Elf32_Ehdr));
-
- for (i = 0; i < ehdr->e_phnum; ++i) {
- if (phdr[i].p_type == PT_NOTE &&
- !stack_map_parse_build_id(page_addr, build_id,
- page_addr + phdr[i].p_offset,
- phdr[i].p_filesz))
- return 0;
- }
- return -EINVAL;
-}
-
-/* Parse build ID from 64-bit ELF */
-static int stack_map_get_build_id_64(void *page_addr,
- unsigned char *build_id)
-{
- Elf64_Ehdr *ehdr = (Elf64_Ehdr *)page_addr;
- Elf64_Phdr *phdr;
- int i;
-
- /* only supports phdr that fits in one page */
- if (ehdr->e_phnum >
- (PAGE_SIZE - sizeof(Elf64_Ehdr)) / sizeof(Elf64_Phdr))
- return -EINVAL;
-
- phdr = (Elf64_Phdr *)(page_addr + sizeof(Elf64_Ehdr));
-
- for (i = 0; i < ehdr->e_phnum; ++i) {
- if (phdr[i].p_type == PT_NOTE &&
- !stack_map_parse_build_id(page_addr, build_id,
- page_addr + phdr[i].p_offset,
- phdr[i].p_filesz))
- return 0;
- }
- return -EINVAL;
-}
-
-/* Parse build ID of ELF file mapped to vma */
-static int stack_map_get_build_id(struct vm_area_struct *vma,
- unsigned char *build_id)
-{
- Elf32_Ehdr *ehdr;
- struct page *page;
- void *page_addr;
- int ret;
-
- /* only works for page backed storage */
- if (!vma->vm_file)
- return -EINVAL;
-
- page = find_get_page(vma->vm_file->f_mapping, 0);
- if (!page)
- return -EFAULT; /* page not mapped */
-
- ret = -EINVAL;
- page_addr = kmap_atomic(page);
- ehdr = (Elf32_Ehdr *)page_addr;
-
- /* compare magic x7f "ELF" */
- if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
- goto out;
-
- /* only support executable file and shared object file */
- if (ehdr->e_type != ET_EXEC && ehdr->e_type != ET_DYN)
- goto out;
-
- if (ehdr->e_ident[EI_CLASS] == ELFCLASS32)
- ret = stack_map_get_build_id_32(page_addr, build_id);
- else if (ehdr->e_ident[EI_CLASS] == ELFCLASS64)
- ret = stack_map_get_build_id_64(page_addr, build_id);
-out:
- kunmap_atomic(page_addr);
- put_page(page);
- return ret;
-}
-
static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
u64 *ips, u32 trace_nr, bool user)
{
@@ -327,18 +192,18 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
for (i = 0; i < trace_nr; i++) {
id_offs[i].status = BPF_STACK_BUILD_ID_IP;
id_offs[i].ip = ips[i];
- memset(id_offs[i].build_id, 0, BPF_BUILD_ID_SIZE);
+ memset(id_offs[i].build_id, 0, BUILD_ID_SIZE);
}
return;
}

for (i = 0; i < trace_nr; i++) {
vma = find_vma(current->mm, ips[i]);
- if (!vma || stack_map_get_build_id(vma, id_offs[i].build_id)) {
+ if (!vma || build_id_parse(vma, id_offs[i].build_id)) {
/* per entry fall back to ips */
id_offs[i].status = BPF_STACK_BUILD_ID_IP;
id_offs[i].ip = ips[i];
- memset(id_offs[i].build_id, 0, BPF_BUILD_ID_SIZE);
+ memset(id_offs[i].build_id, 0, BUILD_ID_SIZE);
continue;
}
id_offs[i].offset = (vma->vm_pgoff << PAGE_SHIFT) + ips[i]
diff --git a/lib/Makefile b/lib/Makefile
index a4a4c6864f51..da7ae66ee4ab 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -36,7 +36,8 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
flex_proportions.o ratelimit.o show_mem.o \
is_single_threaded.o plist.o decompress.o kobject_uevent.o \
earlycpio.o seq_buf.o siphash.o dec_and_lock.o \
- nmi_backtrace.o nodemask.o win_minmax.o memcat_p.o
+ nmi_backtrace.o nodemask.o win_minmax.o memcat_p.o \
+ buildid.o

lib-$(CONFIG_PRINTK) += dump_stack.o
lib-$(CONFIG_SMP) += cpumask.o
diff --git a/lib/buildid.c b/lib/buildid.c
new file mode 100644
index 000000000000..e8d5feb7ef20
--- /dev/null
+++ b/lib/buildid.c
@@ -0,0 +1,136 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/buildid.h>
+#include <linux/elf.h>
+#include <linux/pagemap.h>
+
+#define BUILD_ID 3
+/*
+ * Parse build id from the note segment. This logic can be shared between
+ * 32-bit and 64-bit system, because Elf32_Nhdr and Elf64_Nhdr are
+ * identical.
+ */
+static inline int parse_build_id(void *page_addr,
+ unsigned char *build_id,
+ void *note_start,
+ Elf32_Word note_size)
+{
+ Elf32_Word note_offs = 0, new_offs;
+
+ /* check for overflow */
+ if (note_start < page_addr || note_start + note_size < note_start)
+ return -EINVAL;
+
+ /* only supports note that fits in the first page */
+ if (note_start + note_size > page_addr + PAGE_SIZE)
+ return -EINVAL;
+
+ while (note_offs + sizeof(Elf32_Nhdr) < note_size) {
+ Elf32_Nhdr *nhdr = (Elf32_Nhdr *)(note_start + note_offs);
+
+ if (nhdr->n_type == BUILD_ID &&
+ nhdr->n_namesz == sizeof("GNU") &&
+ nhdr->n_descsz > 0 &&
+ nhdr->n_descsz <= BUILD_ID_SIZE) {
+ memcpy(build_id,
+ note_start + note_offs +
+ ALIGN(sizeof("GNU"), 4) + sizeof(Elf32_Nhdr),
+ nhdr->n_descsz);
+ memset(build_id + nhdr->n_descsz, 0,
+ BUILD_ID_SIZE - nhdr->n_descsz);
+ return 0;
+ }
+ new_offs = note_offs + sizeof(Elf32_Nhdr) +
+ ALIGN(nhdr->n_namesz, 4) + ALIGN(nhdr->n_descsz, 4);
+ if (new_offs <= note_offs) /* overflow */
+ break;
+ note_offs = new_offs;
+ }
+ return -EINVAL;
+}
+
+/* Parse build ID from 32-bit ELF */
+static int get_build_id_32(void *page_addr, unsigned char *build_id)
+{
+ Elf32_Ehdr *ehdr = (Elf32_Ehdr *)page_addr;
+ Elf32_Phdr *phdr;
+ int i;
+
+ /* only supports phdr that fits in one page */
+ if (ehdr->e_phnum >
+ (PAGE_SIZE - sizeof(Elf32_Ehdr)) / sizeof(Elf32_Phdr))
+ return -EINVAL;
+
+ phdr = (Elf32_Phdr *)(page_addr + sizeof(Elf32_Ehdr));
+
+ for (i = 0; i < ehdr->e_phnum; ++i) {
+ if (phdr[i].p_type == PT_NOTE &&
+ !parse_build_id(page_addr, build_id,
+ page_addr + phdr[i].p_offset,
+ phdr[i].p_filesz))
+ return 0;
+ }
+ return -EINVAL;
+}
+
+/* Parse build ID from 64-bit ELF */
+static int get_build_id_64(void *page_addr, unsigned char *build_id)
+{
+ Elf64_Ehdr *ehdr = (Elf64_Ehdr *)page_addr;
+ Elf64_Phdr *phdr;
+ int i;
+
+ /* only supports phdr that fits in one page */
+ if (ehdr->e_phnum >
+ (PAGE_SIZE - sizeof(Elf64_Ehdr)) / sizeof(Elf64_Phdr))
+ return -EINVAL;
+
+ phdr = (Elf64_Phdr *)(page_addr + sizeof(Elf64_Ehdr));
+
+ for (i = 0; i < ehdr->e_phnum; ++i) {
+ if (phdr[i].p_type == PT_NOTE &&
+ !parse_build_id(page_addr, build_id,
+ page_addr + phdr[i].p_offset,
+ phdr[i].p_filesz))
+ return 0;
+ }
+ return -EINVAL;
+}
+
+/* Parse build ID of ELF file mapped to vma */
+int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id)
+{
+ Elf32_Ehdr *ehdr;
+ struct page *page;
+ void *page_addr;
+ int ret;
+
+ /* only works for page backed storage */
+ if (!vma->vm_file)
+ return -EINVAL;
+
+ page = find_get_page(vma->vm_file->f_mapping, 0);
+ if (!page)
+ return -EFAULT; /* page not mapped */
+
+ ret = -EINVAL;
+ page_addr = kmap_atomic(page);
+ ehdr = (Elf32_Ehdr *)page_addr;
+
+ /* compare magic x7f "ELF" */
+ if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
+ goto out;
+
+ /* only support executable file and shared object file */
+ if (ehdr->e_type != ET_EXEC && ehdr->e_type != ET_DYN)
+ goto out;
+
+ if (ehdr->e_ident[EI_CLASS] == ELFCLASS32)
+ ret = get_build_id_32(page_addr, build_id);
+ else if (ehdr->e_ident[EI_CLASS] == ELFCLASS64)
+ ret = get_build_id_64(page_addr, build_id);
+out:
+ kunmap_atomic(page_addr);
+ put_page(page);
+ return ret;
+}
--
2.26.2
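
The note walk in parse_build_id() can be tried out in user space. The block below is a minimal sketch, not the kernel code. It assumes glibc's <elf.h> (Elf32_Nhdr, NT_GNU_BUILD_ID), and it uses a hypothetical SKETCH_BUILD_ID_SIZE of 20 in place of BUILD_ID_SIZE. Unlike the patch, it also compares the "GNU" name bytes and rejects a note that runs past the end of the buffer.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BUILD_ID_SIZE 20 /* assumed to match BUILD_ID_SIZE */

/* ELF note name and desc fields are padded to 4-byte boundaries. */
static size_t align4(size_t x)
{
	return (x + 3) & ~(size_t)3;
}

/*
 * Walk a PT_NOTE segment held in memory and copy out the GNU build id,
 * zero-padding it to SKETCH_BUILD_ID_SIZE bytes.
 * Returns 0 on success, -1 if no usable NT_GNU_BUILD_ID note is found.
 */
static int find_gnu_build_id(const unsigned char *notes, size_t size,
			     unsigned char *build_id)
{
	size_t off = 0;

	while (off + sizeof(Elf32_Nhdr) <= size) {
		Elf32_Nhdr nhdr;
		size_t name_off, desc_off, next;

		memcpy(&nhdr, notes + off, sizeof(nhdr)); /* avoid unaligned reads */
		name_off = off + sizeof(nhdr);
		desc_off = name_off + align4(nhdr.n_namesz);
		next = desc_off + align4(nhdr.n_descsz);
		if (next > size || next <= off) /* truncated or overflowing note */
			return -1;

		if (nhdr.n_type == NT_GNU_BUILD_ID &&
		    nhdr.n_namesz == sizeof("GNU") &&
		    memcmp(notes + name_off, "GNU", sizeof("GNU")) == 0 &&
		    nhdr.n_descsz > 0 && nhdr.n_descsz <= SKETCH_BUILD_ID_SIZE) {
			memcpy(build_id, notes + desc_off, nhdr.n_descsz);
			memset(build_id + nhdr.n_descsz, 0,
			       SKETCH_BUILD_ID_SIZE - nhdr.n_descsz);
			return 0;
		}
		off = next;
	}
	return -1;
}
```

Copying the header with memcpy() keeps the sketch safe on buffers that are not 4-byte aligned; the kernel code can cast directly because it works on a page-aligned mapping.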

2020-09-13 21:05:36

by Jiri Olsa

Subject: [PATCH 05/26] perf tools: Add build_id__is_defined function

Add build_id__is_defined helper to check that a build id
is defined and is not the all-zero build id.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/build-id.c | 11 +++++++++++
tools/perf/util/build-id.h | 1 +
2 files changed, 12 insertions(+)

diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index 31207b6e2066..bdee4e08e60d 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -902,3 +902,14 @@ bool perf_session__read_build_ids(struct perf_session *session, bool with_hits)

return ret;
}
+
+bool build_id__is_defined(const u8 *build_id)
+{
+ static u8 zero[BUILD_ID_SIZE];
+ int err = 0;
+
+ if (build_id)
+ err = memcmp(build_id, &zero, BUILD_ID_SIZE);
+
+ return err ? true : false;
+}
diff --git a/tools/perf/util/build-id.h b/tools/perf/util/build-id.h
index aad419bb165c..1ceede45c231 100644
--- a/tools/perf/util/build-id.h
+++ b/tools/perf/util/build-id.h
@@ -14,6 +14,7 @@ extern struct perf_tool build_id__mark_dso_hit_ops;
struct dso;
struct feat_fd;

+bool build_id__is_defined(const u8 *build_id);
int build_id__sprintf(const u8 *build_id, int len, char *bf);
int sysfs__sprintf_build_id(const char *root_dir, char *sbuild_id);
int filename__sprintf_build_id(const char *pathname, char *sbuild_id);
--
2.26.2
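
The semantics of build_id__is_defined() can be restated as a small standalone function. This is a sketch with hypothetical names (sketch_build_id_is_defined, SKETCH_BUILD_ID_SIZE); it scans for a non-zero byte instead of comparing against a static zero buffer, which gives the same result.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BUILD_ID_SIZE 20 /* assumed to match BUILD_ID_SIZE */

/* True when build_id is non-NULL and has at least one non-zero byte. */
static bool sketch_build_id_is_defined(const unsigned char *build_id)
{
	size_t i;

	if (!build_id)
		return false;
	for (i = 0; i < SKETCH_BUILD_ID_SIZE; i++)
		if (build_id[i])
			return true;
	return false;
}
```

Both "no build id" cases in this series map to false: the mmap and mmap2 paths pass NULL, and a synthesized event whose build id could not be read keeps the zeroed buildid field from zalloc().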

2020-09-13 21:06:05

by Jiri Olsa

Subject: [PATCH 07/26] perf tools: Add check for existing link in buildid dir

When adding a new build id link, we fail if the link is
already there. Add a check for an existing link: if it points
to a different target, warn and replace it with the new one.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/build-id.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index bdee4e08e60d..ecdc167aa1a0 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -751,8 +751,26 @@ int build_id_cache__add_s(const char *sbuild_id, const char *name,
tmp = dir_name + strlen(buildid_dir) - 5;
memcpy(tmp, "../..", 5);

- if (symlink(tmp, linkname) == 0)
+ if (symlink(tmp, linkname) == 0) {
err = 0;
+ } else if (errno == EEXIST) {
+ char path[PATH_MAX] = { 0 };
+
+ if (readlink(linkname, path, sizeof(path) - 1) == -1) {
+ pr_err("Can't read link: %s\n", linkname);
+ goto out_free;
+ }
+ if (strcmp(tmp, path)) {
+ pr_err("Inconsistent .debug record, updating [%s]\n",
+ linkname);
+
+ unlink(linkname);
+
+ if (symlink(tmp, linkname))
+ goto out_free;
+ }
+ err = 0;
+ }

/* Update SDT cache : error is just warned */
if (realname &&
--
2.26.2
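
The create, compare, and replace sequence for the .debug link can be reproduced with plain POSIX calls. Below is a minimal sketch with a hypothetical helper name (sketch_update_link). It assumes a POSIX system and shows the one detail that is easy to miss: readlink() does not NUL-terminate its output, so the buffer needs room for a terminator before strcmp() is used.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <unistd.h>

/*
 * Create linkname -> target. If linkname already exists and points
 * elsewhere, replace it. Returns 0 on success, -1 on error.
 */
static int sketch_update_link(const char *target, const char *linkname)
{
	char path[PATH_MAX];
	ssize_t len;

	if (symlink(target, linkname) == 0)
		return 0;
	if (errno != EEXIST)
		return -1;

	/* readlink() does not NUL-terminate, so leave room and add it. */
	len = readlink(linkname, path, sizeof(path) - 1);
	if (len < 0)
		return -1;
	path[len] = '\0';

	if (strcmp(path, target) == 0)
		return 0; /* already points at the right place */

	if (unlink(linkname) != 0)
		return -1;
	return symlink(target, linkname) == 0 ? 0 : -1;
}
```

The unlink() followed by symlink() pair is not atomic; a concurrent reader can briefly see no link at all, which matches the behaviour of the patch.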

2020-09-13 21:06:09

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 12/26] perf tools: Set build id for kernel dso objects

Set the build id for kernel dso objects when it is parsed
from an mmap3 event.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/machine.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 17d6fd19ef79..863d949ef967 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1590,7 +1590,8 @@ static int machine__process_extra_kernel_map(struct machine *machine,
}

static int machine__process_kernel_mmap_event(struct machine *machine,
- struct extra_kernel_map *xm)
+ struct extra_kernel_map *xm,
+ __u8 *buildid)
{
struct map *map;
enum dso_space_type dso_space;
@@ -1615,6 +1616,9 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
goto out_problem;

map->end = map->start + xm->end - xm->start;
+
+ if (build_id__is_defined(buildid))
+ dso__set_build_id(map->dso, buildid);
} else if (is_kernel_mmap) {
const char *symbol_name = (xm->name + strlen(machine->mmap_name));
/*
@@ -1672,6 +1676,9 @@ static int machine__process_kernel_mmap_event(struct machine *machine,

machine__update_kernel_mmap(machine, xm->start, xm->end);

+ if (build_id__is_defined(buildid))
+ dso__set_build_id(kernel, buildid);
+
/*
* Avoid using a zero address (kptr_restrict) for the ref reloc
* symbol. Effectively having zero here means that at record
@@ -1724,7 +1731,7 @@ int machine__process_mmap3_event(struct machine *machine,
};

strlcpy(xm.name, event->mmap3.filename, KMAP_NAME_LEN);
- ret = machine__process_kernel_mmap_event(machine, &xm);
+ ret = machine__process_kernel_mmap_event(machine, &xm, event->mmap3.buildid);
if (ret < 0)
goto out_problem;
return 0;
@@ -1791,7 +1798,7 @@ int machine__process_mmap2_event(struct machine *machine,
};

strlcpy(xm.name, event->mmap2.filename, KMAP_NAME_LEN);
- ret = machine__process_kernel_mmap_event(machine, &xm);
+ ret = machine__process_kernel_mmap_event(machine, &xm, NULL);
if (ret < 0)
goto out_problem;
return 0;
@@ -1848,7 +1855,7 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
};

strlcpy(xm.name, event->mmap.filename, KMAP_NAME_LEN);
- ret = machine__process_kernel_mmap_event(machine, &xm);
+ ret = machine__process_kernel_mmap_event(machine, &xm, NULL);
if (ret < 0)
goto out_problem;
return 0;
--
2.26.2

2020-09-13 21:06:25

by Jiri Olsa

Subject: [PATCH 13/26] perf tools: Plug in mmap3 event

Add mmap3 event processing to all perf tools that process
events and let them call the perf_event__process_mmap3
function.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/builtin-annotate.c | 1 +
tools/perf/builtin-c2c.c | 1 +
tools/perf/builtin-diff.c | 1 +
tools/perf/builtin-inject.c | 38 +++++++++++++++++++++++++++++++
tools/perf/builtin-kmem.c | 1 +
tools/perf/builtin-mem.c | 1 +
tools/perf/builtin-record.c | 14 ++++++++++++
tools/perf/builtin-report.c | 2 ++
tools/perf/builtin-script.c | 1 +
tools/perf/builtin-trace.c | 1 +
tools/perf/util/build-id.c | 1 +
tools/perf/util/data-convert-bt.c | 1 +
12 files changed, 63 insertions(+)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 4940d10074c3..f68e86bfeb3b 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -474,6 +474,7 @@ int cmd_annotate(int argc, const char **argv)
.sample = process_sample_event,
.mmap = perf_event__process_mmap,
.mmap2 = perf_event__process_mmap2,
+ .mmap3 = perf_event__process_mmap3,
.comm = perf_event__process_comm,
.exit = perf_event__process_exit,
.fork = perf_event__process_fork,
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 5938b100eaf4..a7d1061fde98 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -365,6 +365,7 @@ static struct perf_c2c c2c = {
.sample = process_sample_event,
.mmap = perf_event__process_mmap,
.mmap2 = perf_event__process_mmap2,
+ .mmap3 = perf_event__process_mmap3,
.comm = perf_event__process_comm,
.exit = perf_event__process_exit,
.fork = perf_event__process_fork,
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index f8c9bdd8269a..f8c77fe8f7a4 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -450,6 +450,7 @@ static struct perf_diff pdiff = {
.sample = diff__process_sample_event,
.mmap = perf_event__process_mmap,
.mmap2 = perf_event__process_mmap2,
+ .mmap3 = perf_event__process_mmap3,
.comm = perf_event__process_comm,
.exit = perf_event__process_exit,
.fork = perf_event__process_fork,
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 6d2f410d773a..ef31603d126e 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -316,6 +316,19 @@ static int perf_event__repipe_mmap2(struct perf_tool *tool,
return err;
}

+static int perf_event__repipe_mmap3(struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine)
+{
+ int err;
+
+ err = perf_event__process_mmap3(tool, event, sample, machine);
+ perf_event__repipe(tool, event, sample, machine);
+
+ return err;
+}
+
#ifdef HAVE_JITDUMP
static int perf_event__jit_repipe_mmap2(struct perf_tool *tool,
union perf_event *event,
@@ -339,6 +352,29 @@ static int perf_event__jit_repipe_mmap2(struct perf_tool *tool,
}
return perf_event__repipe_mmap2(tool, event, sample, machine);
}
+
+static int perf_event__jit_repipe_mmap3(struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine)
+{
+ struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
+ u64 n = 0;
+ int ret;
+
+ /*
+ * if jit marker, then inject jit mmaps and generate ELF images
+ */
+ ret = jit_process(inject->session, &inject->output, machine,
+ event->mmap3.filename, sample->pid, &n);
+ if (ret < 0)
+ return ret;
+ if (ret) {
+ inject->bytes_written += n;
+ return 0;
+ }
+ return perf_event__repipe_mmap3(tool, event, sample, machine);
+}
#endif

static int perf_event__repipe_fork(struct perf_tool *tool,
@@ -609,6 +645,7 @@ static int __cmd_inject(struct perf_inject *inject)
inject->itrace_synth_opts.set) {
inject->tool.mmap = perf_event__repipe_mmap;
inject->tool.mmap2 = perf_event__repipe_mmap2;
+ inject->tool.mmap3 = perf_event__repipe_mmap3;
inject->tool.fork = perf_event__repipe_fork;
inject->tool.tracing_data = perf_event__repipe_tracing_data;
}
@@ -818,6 +855,7 @@ int cmd_inject(int argc, const char **argv)
#ifdef HAVE_JITDUMP
if (inject.jit_mode) {
inject.tool.mmap2 = perf_event__jit_repipe_mmap2;
+ inject.tool.mmap3 = perf_event__jit_repipe_mmap3;
inject.tool.mmap = perf_event__jit_repipe_mmap;
inject.tool.ordered_events = true;
inject.tool.ordering_requires_timestamps = true;
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index a50dae2c4ae9..59f7fe42cb09 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -969,6 +969,7 @@ static struct perf_tool perf_kmem = {
.comm = perf_event__process_comm,
.mmap = perf_event__process_mmap,
.mmap2 = perf_event__process_mmap2,
+ .mmap3 = perf_event__process_mmap3,
.namespaces = perf_event__process_namespaces,
.ordered_events = true,
};
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 3523279af6af..7be8b4d6f2c9 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -383,6 +383,7 @@ int cmd_mem(int argc, const char **argv)
.sample = process_sample_event,
.mmap = perf_event__process_mmap,
.mmap2 = perf_event__process_mmap2,
+ .mmap3 = perf_event__process_mmap3,
.comm = perf_event__process_comm,
.lost = perf_event__process_lost,
.fork = perf_event__process_fork,
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index adf311d15d3d..5ce293fac103 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -2366,6 +2366,19 @@ static int build_id__process_mmap2(struct perf_tool *tool, union perf_event *eve
return perf_event__process_mmap2(tool, event, sample, machine);
}

+static int build_id__process_mmap3(struct perf_tool *tool, union perf_event *event,
+ struct perf_sample *sample, struct machine *machine)
+{
+ /*
+ * We already have the kernel maps, put in place via perf_session__create_kernel_maps()
+ * no need to add them twice.
+ */
+ if (!(event->header.misc & PERF_RECORD_MISC_USER))
+ return 0;
+
+ return perf_event__process_mmap3(tool, event, sample, machine);
+}
+
/*
* XXX Ideally would be local to cmd_record() and passed to a record__new
* because we need to have access to it in record__exit, that is called
@@ -2400,6 +2413,7 @@ static struct record record = {
.namespaces = perf_event__process_namespaces,
.mmap = build_id__process_mmap,
.mmap2 = build_id__process_mmap2,
+ .mmap3 = build_id__process_mmap3,
.ordered_events = true,
},
};
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 3c74c9c0f3c3..3dd37513eb94 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -731,6 +731,7 @@ static void tasks_setup(struct report *rep)
if (rep->mmaps_mode) {
rep->tool.mmap = perf_event__process_mmap;
rep->tool.mmap2 = perf_event__process_mmap2;
+ rep->tool.mmap3 = perf_event__process_mmap3;
}
rep->tool.comm = perf_event__process_comm;
rep->tool.exit = perf_event__process_exit;
@@ -1120,6 +1121,7 @@ int cmd_report(int argc, const char **argv)
.sample = process_sample_event,
.mmap = perf_event__process_mmap,
.mmap2 = perf_event__process_mmap2,
+ .mmap3 = perf_event__process_mmap3,
.comm = perf_event__process_comm,
.namespaces = perf_event__process_namespaces,
.cgroup = perf_event__process_cgroup,
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 484ce6067d23..d839983cfb88 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3443,6 +3443,7 @@ int cmd_script(int argc, const char **argv)
.sample = process_sample_event,
.mmap = perf_event__process_mmap,
.mmap2 = perf_event__process_mmap2,
+ .mmap3 = perf_event__process_mmap3,
.comm = perf_event__process_comm,
.namespaces = perf_event__process_namespaces,
.cgroup = perf_event__process_cgroup,
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index bea461b6f937..8d00220c842b 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -4192,6 +4192,7 @@ static int trace__replay(struct trace *trace)
trace->tool.sample = trace__process_sample;
trace->tool.mmap = perf_event__process_mmap;
trace->tool.mmap2 = perf_event__process_mmap2;
+ trace->tool.mmap3 = perf_event__process_mmap3;
trace->tool.comm = perf_event__process_comm;
trace->tool.exit = perf_event__process_exit;
trace->tool.fork = perf_event__process_fork;
diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index 6165f9d1d941..b281c97894e0 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -88,6 +88,7 @@ struct perf_tool build_id__mark_dso_hit_ops = {
.sample = build_id__mark_dso_hit,
.mmap = perf_event__process_mmap,
.mmap2 = perf_event__process_mmap2,
+ .mmap3 = perf_event__process_mmap3,
.fork = perf_event__process_fork,
.exit = perf_event__exit_del_thread,
.attr = perf_event__process_attr,
diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 27c5fef9ad54..dbc3eba658a5 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1606,6 +1606,7 @@ int bt_convert__perf2ctf(const char *input, const char *path,
.sample = process_sample_event,
.mmap = perf_event__process_mmap,
.mmap2 = perf_event__process_mmap2,
+ .mmap3 = perf_event__process_mmap3,
.comm = perf_event__process_comm,
.exit = perf_event__process_exit,
.fork = perf_event__process_fork,
--
2.26.2
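
To see why every tool needs an explicit .mmap3 entry, consider the callback-table pattern. The session code fills unset callbacks with a stub that does nothing (see the perf_tool__fill_defaults hunk in patch 11), so a tool that lacked .mmap3 would presumably never update its maps from mmap3 events. The block below is a reduced, hypothetical model of that pattern; none of the sketch_* names exist in perf.

```c
#include <stddef.h>

/* Hypothetical record types standing in for PERF_RECORD_MMAP/MMAP2/MMAP3. */
enum sketch_type { SKETCH_MMAP, SKETCH_MMAP2, SKETCH_MMAP3 };

struct sketch_event { enum sketch_type type; };

typedef int (*sketch_op)(const struct sketch_event *ev);

struct sketch_tool {
	sketch_op mmap, mmap2, mmap3;
};

/* Default handler: accept and ignore events the tool did not ask for. */
static int sketch_stub(const struct sketch_event *ev)
{
	(void)ev;
	return 0;
}

/* Every callback left NULL gets the stub, as in fill-defaults. */
static void sketch_fill_defaults(struct sketch_tool *tool)
{
	if (!tool->mmap)
		tool->mmap = sketch_stub;
	if (!tool->mmap2)
		tool->mmap2 = sketch_stub;
	if (!tool->mmap3)
		tool->mmap3 = sketch_stub;
}

/* Dispatch by record type, like the switch in machines__deliver_event. */
static int sketch_deliver(const struct sketch_tool *tool,
			  const struct sketch_event *ev)
{
	switch (ev->type) {
	case SKETCH_MMAP:  return tool->mmap(ev);
	case SKETCH_MMAP2: return tool->mmap2(ev);
	case SKETCH_MMAP3: return tool->mmap3(ev);
	}
	return -1;
}

/* Example handler a tool might register for mmap3 events only. */
static int mmap3_seen;
static int sketch_count_mmap3(const struct sketch_event *ev)
{
	(void)ev;
	mmap3_seen++;
	return 1;
}
```

With `struct sketch_tool t = { .mmap3 = sketch_count_mmap3 };` followed by sketch_fill_defaults(&t), mmap2 events silently reach the stub while mmap3 events reach the counter.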

2020-09-13 21:06:32

by Jiri Olsa

Subject: [PATCH 11/26] perf tools: Add mmap3 support

Add support to process mmap3 events: add the perf_record_mmap3
event struct and the perf_event__process_mmap3 function, which
processes the event and stores its build id data directly in
the map's dso object.

Add all the standard callbacks for the new event type and the
mmap3 swap function. The mmap3 trace dump additionally contains
the build id data.

The mmap3 report -D dump looks like:

0 0 0x418 [0x98]: PERF_RECORD_MMAP3 -1/0: <44f35083700d2fc423d4c3f7238b31f5c6500444> \
[0xffffffff81000000(0xe00d17) @ 0xffffffff81000000 00:00 0 0]: ---p [kernel.kallsyms]_text

with build id displayed within <> braces.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/lib/perf/include/perf/event.h | 18 +++++++
tools/perf/util/event.c | 32 +++++++++++++
tools/perf/util/event.h | 5 ++
tools/perf/util/machine.c | 74 ++++++++++++++++++++++++++++-
tools/perf/util/machine.h | 2 +
tools/perf/util/map.c | 8 +++-
tools/perf/util/map.h | 2 +-
tools/perf/util/session.c | 28 +++++++++++
tools/perf/util/tool.h | 1 +
9 files changed, 165 insertions(+), 5 deletions(-)

diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
index 842028858d66..03f7313beee0 100644
--- a/tools/lib/perf/include/perf/event.h
+++ b/tools/lib/perf/include/perf/event.h
@@ -32,6 +32,23 @@ struct perf_record_mmap2 {
char filename[PATH_MAX];
};

+struct perf_record_mmap3 {
+ struct perf_event_header header;
+ __u32 pid, tid;
+ __u64 start;
+ __u64 len;
+ __u64 pgoff;
+ __u32 maj;
+ __u32 min;
+ __u64 ino;
+ __u64 ino_generation;
+ __u32 prot;
+ __u32 flags;
+ __u32 reserved;
+ __u8 buildid[20];
+ char filename[PATH_MAX];
+};
+
struct perf_record_comm {
struct perf_event_header header;
__u32 pid, tid;
@@ -364,6 +381,7 @@ union perf_event {
struct perf_event_header header;
struct perf_record_mmap mmap;
struct perf_record_mmap2 mmap2;
+ struct perf_record_mmap3 mmap3;
struct perf_record_comm comm;
struct perf_record_namespaces namespaces;
struct perf_record_cgroup cgroup;
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 317a26571845..35e5a088e591 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -57,6 +57,7 @@ static const char *perf_event__names[] = {
[PERF_RECORD_BPF_EVENT] = "BPF_EVENT",
[PERF_RECORD_CGROUP] = "CGROUP",
[PERF_RECORD_TEXT_POKE] = "TEXT_POKE",
+ [PERF_RECORD_MMAP3] = "MMAP3",
[PERF_RECORD_HEADER_ATTR] = "ATTR",
[PERF_RECORD_HEADER_EVENT_TYPE] = "EVENT_TYPE",
[PERF_RECORD_HEADER_TRACING_DATA] = "TRACING_DATA",
@@ -301,6 +302,26 @@ size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp)
event->mmap2.filename);
}

+size_t perf_event__fprintf_mmap3(union perf_event *event, FILE *fp)
+{
+ char sbuild_id[SBUILD_ID_SIZE];
+
+ build_id__sprintf(event->mmap3.buildid, BUILD_ID_SIZE, sbuild_id);
+
+ return fprintf(fp, " %d/%d: <%s> [%#" PRI_lx64 "(%#" PRI_lx64 ") @ %#" PRI_lx64
+ " %02x:%02x %"PRI_lu64" %"PRI_lu64"]: %c%c%c%c %s\n",
+ event->mmap3.pid, event->mmap3.tid,
+ sbuild_id, event->mmap3.start,
+ event->mmap3.len, event->mmap3.pgoff, event->mmap3.maj,
+ event->mmap3.min, event->mmap3.ino,
+ event->mmap3.ino_generation,
+ (event->mmap3.prot & PROT_READ) ? 'r' : '-',
+ (event->mmap3.prot & PROT_WRITE) ? 'w' : '-',
+ (event->mmap3.prot & PROT_EXEC) ? 'x' : '-',
+ (event->mmap3.flags & MAP_SHARED) ? 's' : 'p',
+ event->mmap3.filename);
+}
+
size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp)
{
struct perf_thread_map *threads = thread_map__new_event(&event->thread_map);
@@ -349,6 +370,14 @@ int perf_event__process_mmap2(struct perf_tool *tool __maybe_unused,
return machine__process_mmap2_event(machine, event, sample);
}

+int perf_event__process_mmap3(struct perf_tool *tool __maybe_unused,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine)
+{
+ return machine__process_mmap3_event(machine, event, sample);
+}
+
size_t perf_event__fprintf_task(union perf_event *event, FILE *fp)
{
return fprintf(fp, "(%d:%d):(%d:%d)\n",
@@ -493,6 +522,9 @@ size_t perf_event__fprintf(union perf_event *event, struct machine *machine, FIL
case PERF_RECORD_MMAP2:
ret += perf_event__fprintf_mmap2(event, fp);
break;
+ case PERF_RECORD_MMAP3:
+ ret += perf_event__fprintf_mmap3(event, fp);
+ break;
case PERF_RECORD_AUX:
ret += perf_event__fprintf_aux(event, fp);
break;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index b828b99176f4..6e6d5c5e9ad5 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -335,6 +335,10 @@ int perf_event__process_mmap2(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
struct machine *machine);
+int perf_event__process_mmap3(struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine);
int perf_event__process_fork(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
@@ -379,6 +383,7 @@ const char *perf_event__name(unsigned int id);
size_t perf_event__fprintf_comm(union perf_event *event, FILE *fp);
size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp);
size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_mmap3(union perf_event *event, FILE *fp);
size_t perf_event__fprintf_task(union perf_event *event, FILE *fp);
size_t perf_event__fprintf_aux(union perf_event *event, FILE *fp);
size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 2805aedc1062..17d6fd19ef79 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1697,6 +1697,74 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
return -1;
}

+int machine__process_mmap3_event(struct machine *machine,
+ union perf_event *event,
+ struct perf_sample *sample)
+{
+ struct thread *thread;
+ struct map *map;
+ struct dso_id dso_id = {
+ .maj = event->mmap3.maj,
+ .min = event->mmap3.min,
+ .ino = event->mmap3.ino,
+ .ino_generation = event->mmap3.ino_generation,
+ };
+ u8 *buildid = NULL;
+ int ret = 0;
+
+ if (dump_trace)
+ perf_event__fprintf_mmap3(event, stdout);
+
+ if (sample->cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
+ sample->cpumode == PERF_RECORD_MISC_KERNEL) {
+ struct extra_kernel_map xm = {
+ .start = event->mmap3.start,
+ .end = event->mmap3.start + event->mmap3.len,
+ .pgoff = event->mmap3.pgoff,
+ };
+
+ strlcpy(xm.name, event->mmap3.filename, KMAP_NAME_LEN);
+ ret = machine__process_kernel_mmap_event(machine, &xm);
+ if (ret < 0)
+ goto out_problem;
+ return 0;
+ }
+
+ thread = machine__findnew_thread(machine, event->mmap3.pid,
+ event->mmap3.tid);
+ if (thread == NULL)
+ goto out_problem;
+
+ /* If we got empty build id, do not set it. */
+ if (build_id__is_defined(event->mmap3.buildid))
+ buildid = event->mmap3.buildid;
+
+ map = map__new(machine, event->mmap3.start,
+ event->mmap3.len, event->mmap3.pgoff,
+ &dso_id, event->mmap3.prot,
+ event->mmap3.flags, buildid,
+ event->mmap3.filename, thread);
+
+ if (map == NULL)
+ goto out_problem_map;
+
+ ret = thread__insert_map(thread, map);
+ if (ret)
+ goto out_problem_insert;
+
+ thread__put(thread);
+ map__put(map);
+ return 0;
+
+out_problem_insert:
+ map__put(map);
+out_problem_map:
+ thread__put(thread);
+out_problem:
+ dump_printf("problem processing PERF_RECORD_MMAP3, skipping event.\n");
+ return 0;
+}
+
int machine__process_mmap2_event(struct machine *machine,
union perf_event *event,
struct perf_sample *sample)
@@ -1737,7 +1805,7 @@ int machine__process_mmap2_event(struct machine *machine,
map = map__new(machine, event->mmap2.start,
event->mmap2.len, event->mmap2.pgoff,
&dso_id, event->mmap2.prot,
- event->mmap2.flags,
+ event->mmap2.flags, NULL,
event->mmap2.filename, thread);

if (map == NULL)
@@ -1796,7 +1864,7 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event

map = map__new(machine, event->mmap.start,
event->mmap.len, event->mmap.pgoff,
- NULL, prot, 0, event->mmap.filename, thread);
+ NULL, prot, 0, NULL, event->mmap.filename, thread);

if (map == NULL)
goto out_problem_map;
@@ -1956,6 +2024,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
ret = machine__process_cgroup_event(machine, event, sample); break;
case PERF_RECORD_MMAP2:
ret = machine__process_mmap2_event(machine, event, sample); break;
+ case PERF_RECORD_MMAP3:
+ ret = machine__process_mmap3_event(machine, event, sample); break;
case PERF_RECORD_FORK:
ret = machine__process_fork_event(machine, event, sample); break;
case PERF_RECORD_EXIT:
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 062c36a8433c..a3c1d0bf89e5 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -135,6 +135,8 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
struct perf_sample *sample);
int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
struct perf_sample *sample);
+int machine__process_mmap3_event(struct machine *machine, union perf_event *event,
+ struct perf_sample *sample);
int machine__process_ksymbol(struct machine *machine,
union perf_event *event,
struct perf_sample *sample);
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index cc0faf8f1321..697e87d9fd66 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -145,8 +145,8 @@ void map__init(struct map *map, u64 start, u64 end, u64 pgoff, struct dso *dso)

struct map *map__new(struct machine *machine, u64 start, u64 len,
u64 pgoff, struct dso_id *id,
- u32 prot, u32 flags, char *filename,
- struct thread *thread)
+ u32 prot, u32 flags, u8 *buildid,
+ char *filename, struct thread *thread)
{
struct map *map = malloc(sizeof(*map));
struct nsinfo *nsi = NULL;
@@ -209,6 +209,10 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
dso__set_loaded(dso);
}
dso->nsinfo = nsi;
+
+ if (build_id__is_defined(buildid))
+ dso__set_build_id(dso, buildid);
+
dso__put(dso);
}
return map;
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index c2f5d28fe73a..99b3036ffcc7 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -107,7 +107,7 @@ struct dso_id;

struct map *map__new(struct machine *machine, u64 start, u64 len,
u64 pgoff, struct dso_id *id, u32 prot, u32 flags,
- char *filename, struct thread *thread);
+ u8 *buildid, char *filename, struct thread *thread);
struct map *map__new2(u64 start, struct dso *dso);
void map__delete(struct map *map);
struct map *map__clone(struct map *map);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 7a5f03764702..45e062d4029f 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -466,6 +466,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
tool->mmap = process_event_stub;
if (tool->mmap2 == NULL)
tool->mmap2 = process_event_stub;
+ if (tool->mmap3 == NULL)
+ tool->mmap3 = process_event_stub;
if (tool->comm == NULL)
tool->comm = process_event_stub;
if (tool->namespaces == NULL)
@@ -603,6 +605,27 @@ static void perf_event__mmap2_swap(union perf_event *event,
swap_sample_id_all(event, data);
}
}
+
+static void perf_event__mmap3_swap(union perf_event *event,
+ bool sample_id_all)
+{
+ event->mmap3.pid = bswap_32(event->mmap3.pid);
+ event->mmap3.tid = bswap_32(event->mmap3.tid);
+ event->mmap3.start = bswap_64(event->mmap3.start);
+ event->mmap3.len = bswap_64(event->mmap3.len);
+ event->mmap3.pgoff = bswap_64(event->mmap3.pgoff);
+ event->mmap3.maj = bswap_32(event->mmap3.maj);
+ event->mmap3.min = bswap_32(event->mmap3.min);
+ event->mmap3.ino = bswap_64(event->mmap3.ino);
+
+ if (sample_id_all) {
+ void *data = &event->mmap3.filename;
+
+ data += PERF_ALIGN(strlen(data) + 1, sizeof(u64));
+ swap_sample_id_all(event, data);
+ }
+}
+
static void perf_event__task_swap(union perf_event *event, bool sample_id_all)
{
event->fork.pid = bswap_32(event->fork.pid);
@@ -938,6 +961,7 @@ typedef void (*perf_event__swap_op)(union perf_event *event,
static perf_event__swap_op perf_event__swap_ops[] = {
[PERF_RECORD_MMAP] = perf_event__mmap_swap,
[PERF_RECORD_MMAP2] = perf_event__mmap2_swap,
+ [PERF_RECORD_MMAP3] = perf_event__mmap3_swap,
[PERF_RECORD_COMM] = perf_event__comm_swap,
[PERF_RECORD_FORK] = perf_event__task_swap,
[PERF_RECORD_EXIT] = perf_event__task_swap,
@@ -1453,6 +1477,10 @@ static int machines__deliver_event(struct machines *machines,
if (event->header.misc & PERF_RECORD_MISC_PROC_MAP_PARSE_TIMEOUT)
++evlist->stats.nr_proc_map_timeout;
return tool->mmap2(tool, event, sample, machine);
+ case PERF_RECORD_MMAP3:
+ if (event->header.misc & PERF_RECORD_MISC_PROC_MAP_PARSE_TIMEOUT)
+ ++evlist->stats.nr_proc_map_timeout;
+ return tool->mmap3(tool, event, sample, machine);
case PERF_RECORD_COMM:
return tool->comm(tool, event, sample, machine);
case PERF_RECORD_NAMESPACES:
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index bbbc0dcd461f..8f5ee25bfa35 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -44,6 +44,7 @@ struct perf_tool {
read;
event_op mmap,
mmap2,
+ mmap3,
comm,
namespaces,
cgroup,
--
2.26.2
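
Cross-endian reading of an mmap3 record comes down to swapping each multi-byte field and then locating the sample_id trailer after the u64-padded filename. The block below is a hedged sketch using a hypothetical struct that mirrors only the numeric fields. It assumes glibc's <byteswap.h>. It also swaps ino_generation, prot and flags, which perf_event__mmap3_swap() as posted appears to leave untouched; the buildid and filename arrays are bytes and need no swapping.

```c
#include <byteswap.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the numeric mmap3 fields (header omitted). */
struct sketch_mmap3 {
	uint32_t pid, tid;
	uint64_t start, len, pgoff;
	uint32_t maj, min;
	uint64_t ino, ino_generation;
	uint32_t prot, flags;
};

/* Swap every multi-byte field in place. */
static void sketch_mmap3_swap(struct sketch_mmap3 *m)
{
	m->pid = bswap_32(m->pid);
	m->tid = bswap_32(m->tid);
	m->start = bswap_64(m->start);
	m->len = bswap_64(m->len);
	m->pgoff = bswap_64(m->pgoff);
	m->maj = bswap_32(m->maj);
	m->min = bswap_32(m->min);
	m->ino = bswap_64(m->ino);
	m->ino_generation = bswap_64(m->ino_generation);
	m->prot = bswap_32(m->prot);
	m->flags = bswap_32(m->flags);
}

/* Offset from filename to the sample_id trailer: strlen + NUL, u64-aligned. */
static size_t sketch_trailer_offset(const char *filename)
{
	size_t n = strlen(filename) + 1;

	return (n + 7) & ~(size_t)7;
}
```

Swapping twice restores the original values, which makes the function easy to test on a little-endian host.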

2020-09-13 21:06:38

by Jiri Olsa

Subject: [PATCH 14/26] perf tools: Add mmap3 events to --show-mmap-events option

Display mmap3 events with the --show-mmap-events option;
the build id is shown within <> braces:

$ perf script --show-mmap-events
kill 12148 13893.519014: PERF_RECORD_MMAP3 12148/12148: <43938d0803c5e3130ea679cd569aaf44b98d9ae8> [0x560e7d7f..
kill 12148 13893.519420: PERF_RECORD_MMAP3 12148/12148: <1805c738c8f3ec0f47b7ea09080c28f34d18a82b> [0x7f9e7dfc..

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/builtin-script.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index d839983cfb88..9c09581d5cb0 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2342,6 +2342,38 @@ static int process_mmap2_event(struct perf_tool *tool,
event->mmap2.tid);
}

+static int process_mmap3_event(struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine)
+{
+ struct thread *thread;
+ struct perf_script *script = container_of(tool, struct perf_script, tool);
+ struct perf_session *session = script->session;
+ struct evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
+
+ if (perf_event__process_mmap3(tool, event, sample, machine) < 0)
+ return -1;
+
+ thread = machine__findnew_thread(machine, event->mmap3.pid, event->mmap3.tid);
+ if (thread == NULL) {
+ pr_debug("problem processing MMAP3 event, skipping it.\n");
+ return -1;
+ }
+
+ if (!evsel->core.attr.sample_id_all) {
+ sample->cpu = 0;
+ sample->time = 0;
+ sample->tid = event->mmap3.tid;
+ sample->pid = event->mmap3.pid;
+ }
+ perf_sample__fprintf_start(script, sample, thread, evsel,
+ PERF_RECORD_MMAP3, stdout);
+ perf_event__fprintf(event, machine, stdout);
+ thread__put(thread);
+ return 0;
+}
+
static int process_switch_event(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
@@ -2498,6 +2530,7 @@ static int __cmd_script(struct perf_script *script)
if (script->show_mmap_events) {
script->tool.mmap = process_mmap_event;
script->tool.mmap2 = process_mmap2_event;
+ script->tool.mmap3 = process_mmap3_event;
}
if (script->show_switch_events || (scripting_ops && scripting_ops->process_switch))
script->tool.context_switch = process_switch_event;
--
2.26.2
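
The two parts of the dump line that are specific to mmap2/mmap3 are the hex build id between <> and the four-character permission column. Below is a small sketch of both with hypothetical helper names. It is not perf's build_id__sprintf(), and it assumes <sys/mman.h> for the PROT_* and MAP_* constants.

```c
#include <stdint.h>
#include <sys/mman.h>

#define SKETCH_BUILD_ID_SIZE 20 /* assumed to match BUILD_ID_SIZE */

/* Hex-encode a build id; out needs 2 * SKETCH_BUILD_ID_SIZE + 1 bytes. */
static void sketch_build_id_hex(const unsigned char *id, char *out)
{
	static const char hex[] = "0123456789abcdef";
	int i;

	for (i = 0; i < SKETCH_BUILD_ID_SIZE; i++) {
		out[2 * i] = hex[id[i] >> 4];
		out[2 * i + 1] = hex[id[i] & 0xf];
	}
	out[2 * SKETCH_BUILD_ID_SIZE] = '\0';
}

/* Build the "r-xp"-style column from the prot and flags fields. */
static void sketch_perm_string(uint32_t prot, uint32_t flags, char out[5])
{
	out[0] = (prot & PROT_READ) ? 'r' : '-';
	out[1] = (prot & PROT_WRITE) ? 'w' : '-';
	out[2] = (prot & PROT_EXEC) ? 'x' : '-';
	out[3] = (flags & MAP_SHARED) ? 's' : 'p';
	out[4] = '\0';
}
```

A synthesized kernel map leaves prot and flags at zero, which is consistent with the "---p" shown for [kernel.kallsyms]_text in the patch 11 dump.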

2020-09-13 21:06:59

by Jiri Olsa

Subject: [PATCH 17/26] perf tools: Synthesize kernel with mmap3

Synthesize the kernel map with an mmap3 event so we get
build id data for the kernel map as well.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/synthetic-events.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 6bd2423ce2f3..844ca87b6e97 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1029,7 +1029,7 @@ static int __perf_event__synthesize_kernel_mmap(struct perf_tool *tool,
* available use this, and after it is use this as a fallback for older
* kernels.
*/
- event = zalloc((sizeof(event->mmap) + machine->id_hdr_size));
+ event = zalloc((sizeof(event->mmap3) + machine->id_hdr_size));
if (event == NULL) {
pr_debug("Not enough memory synthesizing mmap event "
"for kernel modules\n");
@@ -1046,16 +1046,21 @@ static int __perf_event__synthesize_kernel_mmap(struct perf_tool *tool,
event->header.misc = PERF_RECORD_MISC_GUEST_KERNEL;
}

- size = snprintf(event->mmap.filename, sizeof(event->mmap.filename),
+ size = snprintf(event->mmap3.filename, sizeof(event->mmap3.filename),
"%s%s", machine->mmap_name, kmap->ref_reloc_sym->name) + 1;
size = PERF_ALIGN(size, sizeof(u64));
- event->mmap.header.type = PERF_RECORD_MMAP;
- event->mmap.header.size = (sizeof(event->mmap) -
- (sizeof(event->mmap.filename) - size) + machine->id_hdr_size);
- event->mmap.pgoff = kmap->ref_reloc_sym->addr;
- event->mmap.start = map->start;
- event->mmap.len = map->end - event->mmap.start;
- event->mmap.pid = machine->pid;
+ event->mmap3.header.type = PERF_RECORD_MMAP3;
+ event->mmap3.header.size = (sizeof(event->mmap3) -
+ (sizeof(event->mmap3.filename) - size) + machine->id_hdr_size);
+ event->mmap3.pgoff = kmap->ref_reloc_sym->addr;
+ event->mmap3.start = map->start;
+ event->mmap3.len = map->end - event->mmap3.start;
+ event->mmap3.pid = machine->pid;
+
+ err = sysfs__read_build_id("/sys/kernel/notes", event->mmap3.buildid,
+ BUILD_ID_SIZE);
+ if (err)
+ pr_err("Failed to read kernel build ID\n");

err = perf_tool__process_synth_event(tool, event, machine, process);
free(event);
--
2.26.2
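Patch 17 relies on sysfs__read_build_id() to read the kernel's build id from /sys/kernel/notes, which exposes the kernel's raw ELF note section. The sketch below shows roughly what such a reader has to do. It is not perf's code, and find_gnu_build_id is a hypothetical name. The loop walks the notes, each an Elf64_Nhdr followed by a 4-byte-aligned name and descriptor, and returns the descriptor of the "GNU" note of type NT_GNU_BUILD_ID:

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

#define NOTE_ALIGN(x) (((size_t)(x) + 3) & ~(size_t)3)

/*
 * Walk a buffer of ELF notes and copy the descriptor of the first GNU
 * build-id note into @bid. Returns the number of bytes copied, or -1.
 */
static int find_gnu_build_id(const void *buf, size_t len,
			     unsigned char *bid, size_t bid_size)
{
	const unsigned char *p = buf;
	const unsigned char *end = p + len;

	while ((size_t)(end - p) >= sizeof(Elf64_Nhdr)) {
		Elf64_Nhdr nhdr;

		memcpy(&nhdr, p, sizeof(nhdr));	/* avoid unaligned reads */
		p += sizeof(nhdr);

		size_t namesz = NOTE_ALIGN(nhdr.n_namesz);
		size_t descsz = NOTE_ALIGN(nhdr.n_descsz);

		if (namesz > (size_t)(end - p) ||
		    descsz > (size_t)(end - p) - namesz)
			return -1;	/* truncated note */

		const unsigned char *desc = p + namesz;

		if (nhdr.n_type == NT_GNU_BUILD_ID && nhdr.n_namesz == 4 &&
		    memcmp(p, "GNU", 4) == 0) {
			size_t n = nhdr.n_descsz < bid_size ? nhdr.n_descsz : bid_size;

			memcpy(bid, desc, n);
			return (int)n;
		}
		p = desc + descsz;	/* skip to the next note */
	}
	return -1;
}
```

A caller would first read /sys/kernel/notes into a buffer (for example with read(2)) and pass that buffer and its byte count in.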

2020-09-13 21:07:20

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 22/26] perf tools: Use machine__for_each_dso in perf_session__cache_build_ids

Using machine__for_each_dso in perf_session__cache_build_ids,
so we can reuse perf_session__cache_build_ids with a different
callback in the following changes.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/build-id.c | 41 +++++++++++++++-----------------------
1 file changed, 16 insertions(+), 25 deletions(-)

diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index bf044e52ad1f..22968504c6de 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -869,12 +869,16 @@ int build_id_cache__remove_s(const char *sbuild_id)
return err;
}

-static int dso__cache_build_id(struct dso *dso, struct machine *machine)
+static int dso__cache_build_id(struct dso *dso, struct machine *machine,
+ void *priv __maybe_unused)
{
bool is_kallsyms = dso__is_kallsyms(dso);
bool is_vdso = dso__is_vdso(dso);
const char *name = dso->long_name;

+ if (!dso->has_build_id)
+ return 0;
+
if (dso__is_kcore(dso)) {
is_kallsyms = true;
name = machine->mmap_name;
@@ -883,43 +887,30 @@ static int dso__cache_build_id(struct dso *dso, struct machine *machine)
dso->nsinfo, is_kallsyms, is_vdso);
}

-static int __dsos__cache_build_ids(struct list_head *head,
- struct machine *machine)
+static int
+machines__for_each_dso(struct machines *machines, machine__dso_t fn, void *priv)
{
- struct dso *pos;
- int err = 0;
-
- dsos__for_each_with_build_id(pos, head)
- if (dso__cache_build_id(pos, machine))
- err = -1;
+ int ret = machine__for_each_dso(&machines->host, fn, priv);
+ struct rb_node *nd;

- return err;
-}
+ for (nd = rb_first_cached(&machines->guests); nd;
+ nd = rb_next(nd)) {
+ struct machine *pos = rb_entry(nd, struct machine, rb_node);

-static int machine__cache_build_ids(struct machine *machine)
-{
- return __dsos__cache_build_ids(&machine->dsos.head, machine);
+ ret |= machine__for_each_dso(pos, fn, priv);
+ }
+ return ret ? -1 : 0;
}

int perf_session__cache_build_ids(struct perf_session *session)
{
- struct rb_node *nd;
- int ret;
-
if (no_buildid_cache)
return 0;

if (mkdir(buildid_dir, 0755) != 0 && errno != EEXIST)
return -1;

- ret = machine__cache_build_ids(&session->machines.host);
-
- for (nd = rb_first_cached(&session->machines.guests); nd;
- nd = rb_next(nd)) {
- struct machine *pos = rb_entry(nd, struct machine, rb_node);
- ret |= machine__cache_build_ids(pos);
- }
- return ret ? -1 : 0;
+ return machines__for_each_dso(&session->machines, dso__cache_build_id, NULL) ? -1 : 0;
}

static bool machine__read_build_ids(struct machine *machine, bool with_hits)
--
2.26.2
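Patch 22 routes perf_session__cache_build_ids through a generic per-DSO callback. The callback runs on the host machine first and then on every guest, and the results are OR-ed so that one failure does not stop the walk. The simplified sketch below has that shape. It assumes hypothetical flat arrays in place of perf's DSO lists and guest rbtree, and count_build_id is only an illustrative callback:

```c
#include <stddef.h>

/* Hypothetical minimal types; perf's struct machine/dso differ. */
struct dso { const char *name; int has_build_id; };
struct machine { struct dso *dsos; size_t nr_dsos; };

typedef int (*dso_fn)(struct dso *dso, struct machine *machine, void *priv);

/* Call @fn on every DSO of one machine; remember failures, keep going. */
static int machine_for_each_dso(struct machine *m, dso_fn fn, void *priv)
{
	int ret = 0;

	for (size_t i = 0; i < m->nr_dsos; i++)
		if (fn(&m->dsos[i], m, priv))
			ret = -1;
	return ret;
}

/* Host first, then every guest; -1 if any callback failed. */
static int machines_for_each_dso(struct machine *host,
				 struct machine *guests, size_t nr_guests,
				 dso_fn fn, void *priv)
{
	int ret = machine_for_each_dso(host, fn, priv);

	for (size_t i = 0; i < nr_guests; i++)
		ret |= machine_for_each_dso(&guests[i], fn, priv);
	return ret ? -1 : 0;
}

/* Illustrative callback: count DSOs with a build id, fail on the rest. */
static int count_build_id(struct dso *dso, struct machine *m, void *priv)
{
	(void)m;
	if (!dso->has_build_id)
		return -1;
	++*(int *)priv;
	return 0;
}
```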

2020-09-13 21:07:36

by Jiri Olsa

Subject: [PATCH 09/26] perf tools: Try load vmlinux from buildid database

Currently we don't check for the kernel's vmlinux the same way as
we do for normal binaries; instead we either look for the kallsyms
file in the build id database or check known vmlinux locations
(plus some other optional paths).

This patch adds a check for the standard build id binary location,
so we are ready once we start storing vmlinux there from debuginfod
in the following changes.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/build-id.c | 13 ++++++++++---
tools/perf/util/build-id.h | 2 ++
tools/perf/util/symbol.c | 14 ++++++++++++++
3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index ecdc167aa1a0..6165f9d1d941 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -259,10 +259,9 @@ static const char *build_id_cache__basename(bool is_kallsyms, bool is_vdso,
"debug" : "elf"));
}

-char *dso__build_id_filename(const struct dso *dso, char *bf, size_t size,
- bool is_debug)
+char *__dso__build_id_filename(const struct dso *dso, char *bf, size_t size,
+ bool is_debug, bool is_kallsyms)
{
- bool is_kallsyms = dso__is_kallsyms((struct dso *)dso);
bool is_vdso = dso__is_vdso((struct dso *)dso);
char sbuild_id[SBUILD_ID_SIZE];
char *linkname;
@@ -291,6 +290,14 @@ char *dso__build_id_filename(const struct dso *dso, char *bf, size_t size,
return bf;
}

+char *dso__build_id_filename(const struct dso *dso, char *bf, size_t size,
+ bool is_debug)
+{
+ bool is_kallsyms = dso__is_kallsyms((struct dso *)dso);
+
+ return __dso__build_id_filename(dso, bf, size, is_debug, is_kallsyms);
+}
+
#define dsos__for_each_with_build_id(pos, head) \
list_for_each_entry(pos, head, node) \
if (!pos->has_build_id) \
diff --git a/tools/perf/util/build-id.h b/tools/perf/util/build-id.h
index 1ceede45c231..2cf87b7304e2 100644
--- a/tools/perf/util/build-id.h
+++ b/tools/perf/util/build-id.h
@@ -23,6 +23,8 @@ char *build_id_cache__kallsyms_path(const char *sbuild_id, char *bf,

char *dso__build_id_filename(const struct dso *dso, char *bf, size_t size,
bool is_debug);
+char *__dso__build_id_filename(const struct dso *dso, char *bf, size_t size,
+ bool is_debug, bool is_kallsyms);

int build_id__mark_dso_hit(struct perf_tool *tool, union perf_event *event,
struct perf_sample *sample, struct evsel *evsel,
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 5ddf76fb691c..7e1aac4931e1 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -2183,6 +2183,8 @@ static int dso__load_kernel_sym(struct dso *dso, struct map *map)
int err;
const char *kallsyms_filename = NULL;
char *kallsyms_allocated_filename = NULL;
+ char *filename;
+
/*
* Step 1: if the user specified a kallsyms or vmlinux filename, use
* it and only it, reporting errors to the user if it cannot be used.
@@ -2207,6 +2209,18 @@ static int dso__load_kernel_sym(struct dso *dso, struct map *map)
return dso__load_vmlinux(dso, map, symbol_conf.vmlinux_name, false);
}

+ /*
+ * Before checking on common vmlinux locations, check if it's
+ * stored as standard build id binary under .debug tree.
+ */
+ filename = __dso__build_id_filename(dso, NULL, 0, false, false);
+ if (filename != NULL) {
+ err = dso__load_vmlinux(dso, map, filename, true);
+ if (err > 0)
+ return err;
+ free(filename);
+ }
+
if (!symbol_conf.ignore_vmlinux && vmlinux_path != NULL) {
err = dso__load_vmlinux_path(dso, map);
if (err > 0)
--
2.26.2
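After patch 09, dso__load_kernel_sym returns early for a user-specified vmlinux. Otherwise it tries the build id cache location (__dso__build_id_filename with is_kallsyms forced to false) before the usual vmlinux_path candidates. That ordering boils down to "the first candidate that loads wins". The hedged sketch below captures it. load_first, path_in_set and the candidate list are hypothetical, and a NULL entry stands for a path that could not be formed:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Loader callback: returns true if @path could be loaded. */
typedef bool (*loader_fn)(const char *path, void *ctx);

/*
 * Try each candidate in priority order; return the index of the first
 * one the loader accepts, or -1. NULL entries are skipped.
 */
static int load_first(const char *const *candidates, size_t n,
		      loader_fn load, void *ctx)
{
	for (size_t i = 0; i < n; i++) {
		if (candidates[i] != NULL && load(candidates[i], ctx))
			return (int)i;
	}
	return -1;
}

/* Illustrative loader: "loads" a path found in a NULL-terminated list. */
static bool path_in_set(const char *path, void *ctx)
{
	const char *const *set = ctx;

	for (; *set != NULL; set++)
		if (strcmp(*set, path) == 0)
			return true;
	return false;
}
```

A caller would place the build id cache path before the vmlinux_path entries so that a cached vmlinux takes precedence.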

2020-09-13 21:07:58

by Jiri Olsa

Subject: [PATCH 10/26] perf tools: Enable mmap3 map event when supported

Enabling the mmap3 event when the kernel supports it, and adding
a fallback that disables it when opening the event fails.

Also adding the mmap3 bit to the verbose open output:

$ perf record -vv kill
perf_event_attr:
size 120
{ sample_period, sample_freq } 4000
...
bpf_event 1
mmap3 1

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/evsel.c | 9 ++++++++-
tools/perf/util/evsel.h | 1 +
tools/perf/util/perf_event_attr_fprintf.c | 1 +
3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 14baf8542b40..c2cc9b4b30bf 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1065,6 +1065,7 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
attr->task = track;
attr->mmap = track;
attr->mmap2 = track && !perf_missing_features.mmap2;
+ attr->mmap3 = track && !perf_missing_features.mmap3;
attr->comm = track;
/*
* ksymbol is tracked separately with text poke because it needs to be
@@ -1657,6 +1658,8 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
evsel->core.attr.bpf_event = 0;
if (perf_missing_features.branch_hw_idx)
evsel->core.attr.branch_sample_type &= ~PERF_SAMPLE_BRANCH_HW_INDEX;
+ if (perf_missing_features.mmap3)
+ evsel->core.attr.mmap3 = 0;
retry_sample_id:
if (perf_missing_features.sample_id_all)
evsel->core.attr.sample_id_all = 0;
@@ -1770,7 +1773,11 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
* Must probe features in the order they were added to the
* perf_event_attr interface.
*/
- if (!perf_missing_features.cgroup && evsel->core.attr.cgroup) {
+ if (!perf_missing_features.mmap3 && evsel->core.attr.mmap3) {
+ perf_missing_features.mmap3 = true;
+ pr_debug2("switching off mmap3\n");
+ goto fallback_missing_features;
+ } else if (!perf_missing_features.cgroup && evsel->core.attr.cgroup) {
perf_missing_features.cgroup = true;
pr_debug2_peo("Kernel has no cgroup sampling support, bailing out\n");
goto out_close;
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 35e3f6d66085..d49922b22eca 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -119,6 +119,7 @@ struct perf_missing_features {
bool sample_id_all;
bool exclude_guest;
bool mmap2;
+ bool mmap3;
bool cloexec;
bool clockid;
bool clockid_wrong;
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index e67a227c0ce7..3c52c081693d 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -134,6 +134,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
PRINT_ATTRf(bpf_event, p_unsigned);
PRINT_ATTRf(aux_output, p_unsigned);
PRINT_ATTRf(cgroup, p_unsigned);
+ PRINT_ATTRf(mmap3, p_unsigned);

PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
PRINT_ATTRf(bp_type, p_unsigned);
--
2.26.2
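Patch 10 extends the fallback chain in evsel__open_cpu. If opening with attr.mmap3 set fails, perf marks mmap3 as missing, clears the bit and retries, and it checks the newest attr features first. The standalone sketch below shows that retry loop. It assumes hypothetical feature bits and an injected open function. The real code also inspects errno, and it bails out for some features (cgroup, for instance) instead of switching them off:

```c
#include <stddef.h>

/* Hypothetical optional attr features, oldest to newest. */
enum {
	FEAT_MMAP2     = 1u << 0,
	FEAT_BPF_EVENT = 1u << 1,
	FEAT_MMAP3     = 1u << 2,	/* added by this series */
};

/* Returns an fd (>= 0) on success, negative on failure. */
typedef int (*open_fn)(unsigned int feats, void *ctx);

static int open_with_fallback(unsigned int feats, open_fn open_ev, void *ctx,
			      unsigned int *missing)
{
	/* Newest first, like the if/else chain in evsel__open_cpu(). */
	static const unsigned int newest_first[] = {
		FEAT_MMAP3, FEAT_BPF_EVENT, FEAT_MMAP2,
	};
	const size_t nr = sizeof(newest_first) / sizeof(newest_first[0]);

	*missing = 0;
	for (;;) {
		int fd = open_ev(feats, ctx);
		size_t i;

		if (fd >= 0)
			return fd;

		for (i = 0; i < nr; i++)
			if (feats & newest_first[i])
				break;
		if (i == nr)
			return -1;	/* nothing left to switch off */

		/* Assume the kernel lacks this feature; retry without it. */
		feats &= ~newest_first[i];
		*missing |= newest_first[i];
	}
}

/* Illustrative open(): succeeds only if all requested bits are supported. */
static int fake_open(unsigned int feats, void *ctx)
{
	unsigned int supported = *(const unsigned int *)ctx;

	return (feats & ~supported) ? -1 : 3;
}
```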

2020-09-13 21:08:36

by Jiri Olsa

Subject: [PATCH 16/26] perf tools: Synthesize modules with mmap3

Synthesizing modules with mmap3 events so we can
get build id data for the modules' maps as well.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/synthetic-events.c | 37 +++++++++++++++++++-----------
1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index bd6e7b84283d..6bd2423ce2f3 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -605,7 +605,7 @@ int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t
int rc = 0;
struct map *pos;
struct maps *maps = machine__kernel_maps(machine);
- union perf_event *event = zalloc((sizeof(event->mmap) +
+ union perf_event *event = zalloc((sizeof(event->mmap3) +
machine->id_hdr_size));
if (event == NULL) {
pr_debug("Not enough memory synthesizing mmap event "
@@ -613,8 +613,6 @@ int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t
return -1;
}

- event->header.type = PERF_RECORD_MMAP;
-
/*
* kernel uses 0 for user space maps, see kernel/perf_event.c
* __perf_event_mmap
@@ -631,17 +629,30 @@ int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t
continue;

size = PERF_ALIGN(pos->dso->long_name_len + 1, sizeof(u64));
- event->mmap.header.type = PERF_RECORD_MMAP;
- event->mmap.header.size = (sizeof(event->mmap) -
- (sizeof(event->mmap.filename) - size));
- memset(event->mmap.filename + size, 0, machine->id_hdr_size);
- event->mmap.header.size += machine->id_hdr_size;
- event->mmap.start = pos->start;
- event->mmap.len = pos->end - pos->start;
- event->mmap.pid = machine->pid;
-
- memcpy(event->mmap.filename, pos->dso->long_name,
+ event->mmap3.header.type = PERF_RECORD_MMAP3;
+ event->mmap3.header.size = (sizeof(event->mmap3) -
+ (sizeof(event->mmap3.filename) - size));
+ memset(event->mmap3.filename + size, 0, machine->id_hdr_size);
+ event->mmap3.header.size += machine->id_hdr_size;
+ event->mmap3.start = pos->start;
+ event->mmap3.len = pos->end - pos->start;
+ event->mmap3.pid = machine->pid;
+
+ memcpy(event->mmap3.filename, pos->dso->long_name,
pos->dso->long_name_len + 1);
+
+ rc = filename__read_build_id(event->mmap3.filename, event->mmap3.buildid,
+ BUILD_ID_SIZE);
+ if (rc != BUILD_ID_SIZE) {
+ if (event->mmap3.filename[0] == '/') {
+ pr_debug2("Failed to read build ID for %s\n",
+ event->mmap3.filename);
+ }
+ memset(event->mmap3.buildid, 0x0, sizeof(event->mmap3.buildid));
+ }
+
+ rc = 0;
+
if (perf_tool__process_synth_event(tool, event, machine, process) != 0) {
rc = -1;
break;
--
2.26.2
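The module records synthesized in patch 16 are variable-length. The filename is padded with PERF_ALIGN(len + 1, sizeof(u64)), and header.size is the full struct size minus the unused tail of the fixed filename buffer, plus machine->id_hdr_size. The sketch below reproduces that arithmetic on a hypothetical layout that mirrors the fixed part of the original mmap record. The real mmap3 struct in this series carries extra fields, such as the build id:

```c
#include <stddef.h>
#include <stdint.h>

#define NAME_BUF_LEN 4096	/* stands in for PATH_MAX */
#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))

/* Hypothetical layout: fixed fields, then a fixed-size filename. */
struct mmap_like {
	struct {
		uint32_t type;
		uint16_t misc;
		uint16_t size;
	} header;
	uint32_t pid, tid;
	uint64_t start, len, pgoff;
	char filename[NAME_BUF_LEN];
};

/*
 * On-the-wire size of a record whose name is @name_len bytes long:
 * the whole struct minus the unused tail of filename[], plus the
 * sample id header appended after the record.
 */
static uint16_t mmap_record_size(size_t name_len, size_t id_hdr_size)
{
	size_t size = ALIGN_UP(name_len + 1, sizeof(uint64_t)); /* NUL + pad */

	return (uint16_t)(sizeof(struct mmap_like) -
			  (sizeof(((struct mmap_like *)0)->filename) - size) +
			  id_hdr_size);
}
```

When the struct has no trailing padding, the result equals offsetof(struct mmap_like, filename) plus the aligned name length plus id_hdr_size. For a 5-character name and no id header, that is 40 + 8 = 48 bytes.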

2020-09-13 21:08:51

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 08/26] perf tools: Use struct extra_kernel_map in machine__process_kernel_mmap_event

Using struct extra_kernel_map in machine__process_kernel_mmap_event
to pass the mmap details. This way we can use a single function for
all 3 mmap versions.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/machine.c | 62 +++++++++++++++++++++------------------
1 file changed, 33 insertions(+), 29 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 85587de027a5..2805aedc1062 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1572,32 +1572,25 @@ static bool machine__uses_kcore(struct machine *machine)
}

static bool perf_event__is_extra_kernel_mmap(struct machine *machine,
- union perf_event *event)
+ struct extra_kernel_map *xm)
{
return machine__is(machine, "x86_64") &&
- is_entry_trampoline(event->mmap.filename);
+ is_entry_trampoline(xm->name);
}

static int machine__process_extra_kernel_map(struct machine *machine,
- union perf_event *event)
+ struct extra_kernel_map *xm)
{
struct dso *kernel = machine__kernel_dso(machine);
- struct extra_kernel_map xm = {
- .start = event->mmap.start,
- .end = event->mmap.start + event->mmap.len,
- .pgoff = event->mmap.pgoff,
- };

if (kernel == NULL)
return -1;

- strlcpy(xm.name, event->mmap.filename, KMAP_NAME_LEN);
-
- return machine__create_extra_kernel_map(machine, kernel, &xm);
+ return machine__create_extra_kernel_map(machine, kernel, xm);
}

static int machine__process_kernel_mmap_event(struct machine *machine,
- union perf_event *event)
+ struct extra_kernel_map *xm)
{
struct map *map;
enum dso_space_type dso_space;
@@ -1612,20 +1605,18 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
else
dso_space = DSO_SPACE__KERNEL_GUEST;

- is_kernel_mmap = memcmp(event->mmap.filename,
- machine->mmap_name,
+ is_kernel_mmap = memcmp(xm->name, machine->mmap_name,
strlen(machine->mmap_name) - 1) == 0;
- if (event->mmap.filename[0] == '/' ||
- (!is_kernel_mmap && event->mmap.filename[0] == '[')) {
- map = machine__addnew_module_map(machine, event->mmap.start,
- event->mmap.filename);
+ if (xm->name[0] == '/' ||
+ (!is_kernel_mmap && xm->name[0] == '[')) {
+ map = machine__addnew_module_map(machine, xm->start,
+ xm->name);
if (map == NULL)
goto out_problem;

- map->end = map->start + event->mmap.len;
+ map->end = map->start + xm->end - xm->start;
} else if (is_kernel_mmap) {
- const char *symbol_name = (event->mmap.filename +
- strlen(machine->mmap_name));
+ const char *symbol_name = (xm->name + strlen(machine->mmap_name));
/*
* Should be there already, from the build-id table in
* the header.
@@ -1679,18 +1670,17 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
if (strstr(kernel->long_name, "vmlinux"))
dso__set_short_name(kernel, "[kernel.vmlinux]", false);

- machine__update_kernel_mmap(machine, event->mmap.start,
- event->mmap.start + event->mmap.len);
+ machine__update_kernel_mmap(machine, xm->start, xm->end);

/*
* Avoid using a zero address (kptr_restrict) for the ref reloc
* symbol. Effectively having zero here means that at record
* time /proc/sys/kernel/kptr_restrict was non zero.
*/
- if (event->mmap.pgoff != 0) {
+ if (xm->pgoff != 0) {
map__set_kallsyms_ref_reloc_sym(machine->vmlinux_map,
symbol_name,
- event->mmap.pgoff);
+ xm->pgoff);
}

if (machine__is_default_guest(machine)) {
@@ -1699,8 +1689,8 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
*/
dso__load(kernel, machine__kernel_map(machine));
}
- } else if (perf_event__is_extra_kernel_mmap(machine, event)) {
- return machine__process_extra_kernel_map(machine, event);
+ } else if (perf_event__is_extra_kernel_mmap(machine, xm)) {
+ return machine__process_extra_kernel_map(machine, xm);
}
return 0;
out_problem:
@@ -1726,7 +1716,14 @@ int machine__process_mmap2_event(struct machine *machine,

if (sample->cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
sample->cpumode == PERF_RECORD_MISC_KERNEL) {
- ret = machine__process_kernel_mmap_event(machine, event);
+ struct extra_kernel_map xm = {
+ .start = event->mmap2.start,
+ .end = event->mmap2.start + event->mmap2.len,
+ .pgoff = event->mmap2.pgoff,
+ };
+
+ strlcpy(xm.name, event->mmap2.filename, KMAP_NAME_LEN);
+ ret = machine__process_kernel_mmap_event(machine, &xm);
if (ret < 0)
goto out_problem;
return 0;
@@ -1776,7 +1773,14 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event

if (sample->cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
sample->cpumode == PERF_RECORD_MISC_KERNEL) {
- ret = machine__process_kernel_mmap_event(machine, event);
+ struct extra_kernel_map xm = {
+ .start = event->mmap.start,
+ .end = event->mmap.start + event->mmap.len,
+ .pgoff = event->mmap.pgoff,
+ };
+
+ strlcpy(xm.name, event->mmap.filename, KMAP_NAME_LEN);
+ ret = machine__process_kernel_mmap_event(machine, &xm);
if (ret < 0)
goto out_problem;
return 0;
--
2.26.2
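Patch 08 makes machine__process_kernel_mmap_event take a struct extra_kernel_map. Each mmap record version then only needs to copy start, start + len, pgoff and the file name into it. The sketch below shows that adapter with a hypothetical xm_like struct and name length. Perf itself uses KMAP_NAME_LEN and strlcpy; this sketch uses snprintf because older glibc versions lack strlcpy:

```c
#include <stdint.h>
#include <stdio.h>

#define XM_NAME_LEN 256	/* hypothetical; perf has its own KMAP_NAME_LEN */

/* Hypothetical mirror of the fields kept in struct extra_kernel_map. */
struct xm_like {
	uint64_t start;
	uint64_t end;
	uint64_t pgoff;
	char name[XM_NAME_LEN];
};

/* Fill the common description from any mmap record version's fields. */
static void xm_from_mmap(struct xm_like *xm, uint64_t start, uint64_t len,
			 uint64_t pgoff, const char *filename)
{
	xm->start = start;
	xm->end = start + len;	/* map covers [start, end) */
	xm->pgoff = pgoff;
	/* Truncates and always NUL-terminates, like strlcpy(). */
	snprintf(xm->name, sizeof(xm->name), "%s", filename);
}
```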

2020-09-13 21:08:52

by Jiri Olsa

Subject: [PATCH 18/26] perf tests: Add mmap3 support for perf record test

Adding mmap3 support to the perf record test so it can
pass on kernels with mmap3 support.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/tests/perf-record.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/perf/tests/perf-record.c b/tools/perf/tests/perf-record.c
index 67d3f5aad016..722c0cc02e57 100644
--- a/tools/perf/tests/perf-record.c
+++ b/tools/perf/tests/perf-record.c
@@ -224,6 +224,7 @@ int test__PERF_RECORD(struct test *test __maybe_unused, int subtest __maybe_unus
if ((type == PERF_RECORD_COMM ||
type == PERF_RECORD_MMAP ||
type == PERF_RECORD_MMAP2 ||
+ type == PERF_RECORD_MMAP3 ||
type == PERF_RECORD_FORK ||
type == PERF_RECORD_EXIT) &&
(pid_t)event->comm.pid != evlist->workload.pid) {
@@ -233,7 +234,8 @@ int test__PERF_RECORD(struct test *test __maybe_unused, int subtest __maybe_unus

if ((type == PERF_RECORD_COMM ||
type == PERF_RECORD_MMAP ||
- type == PERF_RECORD_MMAP2) &&
+ type == PERF_RECORD_MMAP2 ||
+ type == PERF_RECORD_MMAP3) &&
event->comm.pid != event->comm.tid) {
pr_debug("%s with different pid/tid!\n", name);
++errs;
@@ -253,6 +255,9 @@ int test__PERF_RECORD(struct test *test __maybe_unused, int subtest __maybe_unus
goto check_bname;
case PERF_RECORD_MMAP2:
mmap_filename = event->mmap2.filename;
+ goto check_bname;
+ case PERF_RECORD_MMAP3:
+ mmap_filename = event->mmap3.filename;
check_bname:
bname = strrchr(mmap_filename, '/');
if (bname != NULL) {
--
2.26.2

2020-09-13 21:09:01

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 25/26] perf tools: Move debuginfo download code into get_debuginfo

Moving the debuginfo download code into get_debuginfo
to align with the get_executable function added earlier.
The functionality stays intact apart from some extra
debug output.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/build-id.c | 45 ++++++++++++++++++++++++++------------
1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index 9335a535e547..ea217bb30626 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -626,6 +626,35 @@ static int build_id_cache__add_sdt_cache(const char *sbuild_id,
#define build_id_cache__add_sdt_cache(sbuild_id, realname, nsi) (0)
#endif

+#ifdef HAVE_DEBUGINFOD_SUPPORT
+static int get_debuginfo(const char *sbuild_id, char **path)
+{
+ debuginfod_client *c;
+ int fd;
+
+ c = debuginfod_begin();
+ if (c == NULL)
+ return -1;
+
+ pr_debug("trying debuginfod for debuginfo <%s> ... ", sbuild_id);
+
+ fd = debuginfod_find_debuginfo(c, (const unsigned char *) sbuild_id,
+ 0, path);
+ if (fd >= 0)
+ close(fd); /* retaining reference by realname */
+
+ debuginfod_end(c);
+ pr_debug("%s%s\n", *path ? "OK " : "FAILED", *path ? *path : "");
+ return *path ? 0 : -1;
+}
+#else
+static int get_debuginfo(const char *sbuild_id __maybe_unused,
+ char **path __maybe_unused)
+{
+ return -1;
+}
+#endif
+
static char *build_id_cache__find_debug(const char *sbuild_id,
struct nsinfo *nsi)
{
@@ -649,20 +678,8 @@ static char *build_id_cache__find_debug(const char *sbuild_id,
zfree(&realname);
nsinfo__mountns_exit(&nsc);

-#ifdef HAVE_DEBUGINFOD_SUPPORT
- if (realname == NULL) {
- debuginfod_client* c = debuginfod_begin();
- if (c != NULL) {
- int fd = debuginfod_find_debuginfo(c,
- (const unsigned char*)sbuild_id, 0,
- &realname);
- if (fd >= 0)
- close(fd); /* retaining reference by realname */
- debuginfod_end(c);
- }
- }
-#endif
-
+ if (realname == NULL)
+ get_debuginfo(sbuild_id, &realname);
out:
free(debugfile);
return realname;
--
2.26.2

2020-09-13 21:09:05

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 19/26] perf tools: Add buildid-list support for mmap3

Add buildid-list support for mmap3 so we can display the build id
and file name of every hit DSO for mmap3 data:

$ perf buildid-list
1805c738c8f3ec0f47b7ea09080c28f34d18a82b /usr/lib64/ld-2.31.so
d278249792061c6b74d1693ca59513be1def13f2 /usr/lib64/libc-2.31.so

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/builtin-buildid-list.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/tools/perf/builtin-buildid-list.c b/tools/perf/builtin-buildid-list.c
index e3ef75583514..adcc64478ec1 100644
--- a/tools/perf/builtin-buildid-list.c
+++ b/tools/perf/builtin-buildid-list.c
@@ -57,6 +57,7 @@ static int perf_session__list_build_ids(bool force, bool with_hits)
.mode = PERF_DATA_MODE_READ,
.force = force,
};
+ bool has_build_id;

symbol__elf_init();
/*
@@ -77,6 +78,15 @@ static int perf_session__list_build_ids(bool force, bool with_hits)
perf_header__has_feat(&session->header, HEADER_AUXTRACE))
with_hits = false;

+ has_build_id = perf_header__has_feat(&session->header, HEADER_BUILD_ID);
+
+ /*
+ * We don't really show non-hit DSOs, so keep that for mmap3
+ * build id data as well; we don't care about non-hit DSOs anyway.
+ */
+ if (!has_build_id)
+ with_hits = true;
+
/*
* in pipe-mode, the only way to get the buildids is to parse
* the record stream. Buildids are stored as RECORD_HEADER_BUILD_ID
--
2.26.2

2020-09-13 21:09:22

by Jiri Olsa

[permalink] [raw]
Subject: [PATCH 24/26] perf tools: Add buildid-list --store option

Adding the buildid-list --store option to populate
the .debug directory with build id files.

$ rm -rf ~/.debug/

$ perf buildid-list
1805c738c8f3ec0f47b7ea09080c28f34d18a82b /usr/lib64/ld-2.31.so
d278249792061c6b74d1693ca59513be1def13f2 /usr/lib64/libc-2.31.so

$ perf buildid-list --store

$ find ~/.debug/
.../.debug/
.../.debug/usr
.../.debug/usr/lib64
.../.debug/usr/lib64/ld-2.31.so
.../.debug/usr/lib64/ld-2.31.so/1805c738c8f3ec0f47b7ea09080c28f34d18a82b
.../.debug/usr/lib64/ld-2.31.so/1805c738c8f3ec0f47b7ea09080c28f34d18a82b/elf
.../.debug/usr/lib64/ld-2.31.so/1805c738c8f3ec0f47b7ea09080c28f34d18a82b/debug
.../.debug/usr/lib64/ld-2.31.so/1805c738c8f3ec0f47b7ea09080c28f34d18a82b/probes
.../.debug/usr/lib64/libc-2.31.so
.../.debug/usr/lib64/libc-2.31.so/d278249792061c6b74d1693ca59513be1def13f2
.../.debug/usr/lib64/libc-2.31.so/d278249792061c6b74d1693ca59513be1def13f2/elf
.../.debug/usr/lib64/libc-2.31.so/d278249792061c6b74d1693ca59513be1def13f2/debug
.../.debug/usr/lib64/libc-2.31.so/d278249792061c6b74d1693ca59513be1def13f2/probes
.../.debug/.build-id
.../.debug/.build-id/18
.../.debug/.build-id/18/05c738c8f3ec0f47b7ea09080c28f34d18a82b
.../.debug/.build-id/d2
.../.debug/.build-id/d2/78249792061c6b74d1693ca59513be1def13f2
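In the .build-id entries of the listing above, the hex build id is split after its first byte. The first two hex digits name a directory and the remaining 38 name the entry. The hedged sketch below builds that path from a raw build id. build_id_link_path is a hypothetical helper, not perf's API:

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_BUILD_ID 64	/* hypothetical upper bound for this sketch */

/*
 * Write "<debugdir>/.build-id/<xx>/<rest>" for a raw build id into @buf,
 * where <xx> is the first byte in hex and <rest> the remaining bytes.
 * Returns snprintf()'s result, or -1 for an empty or oversized id.
 */
static int build_id_link_path(char *buf, size_t size, const char *debugdir,
			      const unsigned char *bid, size_t bid_len)
{
	char hex[2 * MAX_BUILD_ID + 1];

	if (bid_len == 0 || bid_len > MAX_BUILD_ID)
		return -1;

	for (size_t i = 0; i < bid_len; i++)
		snprintf(hex + 2 * i, 3, "%02x", (unsigned int)bid[i]);

	return snprintf(buf, size, "%s/.build-id/%.2s/%s", debugdir, hex, hex + 2);
}
```

For the ld-2.31.so build id above and a debug directory of /root/.debug, it produces /root/.debug/.build-id/18/05c738c8f3ec0f47b7ea09080c28f34d18a82b.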

It's possible to query a debuginfod daemon for binaries by defining
the DEBUGINFOD_URLS variable with the server URL, like:

$ DEBUGINFOD_URLS=http://192.168.122.174:8002 perf buildid-list --store
OK 439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46 .../.debug/.build-id/43/9fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46/elf
FAIL 23b87f5b0560481043257e82be670bc97786a171 /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/net/ipv4/netfilter/ip_tables.ko.xz
FAIL d2b3be372bcdd4ebc15e479d2ff803657de0fd1e /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/drivers/block/virtio_blk.ko.xz
FAIL 1466a71bcd0ff5c975ee79b72752137c0143d225 /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/fs/xfs/xfs.ko.xz
OK ad60d10b38c93bd8738d5aa594e240f01bb328cd /usr/lib/systemd/systemd
OK 589e403a34f55486bcac848a45e00bcdeedd1ca8 /usr/lib64/libcrypto.so.1.1.1g
OK 3b9b2ef537520303411ad5038b596d5d18e7c2b8 /usr/lib64/libpcre2-8.so.0.10.0

Increasing the debug level of a message in util/probe-event.c to get
rid of the sdt probe messages at the single verbose level (-v).

Signed-off-by: Jiri Olsa <[email protected]>
---
.../perf/Documentation/perf-buildid-list.txt | 12 ++
tools/perf/builtin-buildid-list.c | 169 +++++++++++++++++-
tools/perf/util/probe-event.c | 6 +-
3 files changed, 181 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-buildid-list.txt b/tools/perf/Documentation/perf-buildid-list.txt
index 25c52efcc7f0..9bb8948e2e75 100644
--- a/tools/perf/Documentation/perf-buildid-list.txt
+++ b/tools/perf/Documentation/perf-buildid-list.txt
@@ -33,6 +33,18 @@ OPTIONS
-k::
--kernel::
Show running kernel build id.
+
+--store::
+ Store DSOs into .debug cache.
+
+ The option goes through all build ids and tries to locate the related
+ binary; if found, it's stored in the build id database (~/.debug).
+
+ It's possible to query a debuginfod daemon for binaries by defining
+ the DEBUGINFOD_URLS variable with the server URL, like:
+
+ $ DEBUGINFOD_URLS=http://192.168.122.174:8002 perf buildid-list --store
+
-v::
--verbose::
Be more verbose.
diff --git a/tools/perf/builtin-buildid-list.c b/tools/perf/builtin-buildid-list.c
index adcc64478ec1..af326e7b5c44 100644
--- a/tools/perf/builtin-buildid-list.c
+++ b/tools/perf/builtin-buildid-list.c
@@ -17,8 +17,15 @@
#include "util/session.h"
#include "util/symbol.h"
#include "util/data.h"
+#include "util/namespaces.h"
#include <errno.h>
#include <linux/err.h>
+#include <linux/zalloc.h>
+#ifdef HAVE_DEBUGINFOD_SUPPORT
+#include <elfutils/debuginfod.h>
+#endif
+#include <unistd.h>
+#include <sys/stat.h>

static int sysfs__fprintf_build_id(FILE *fp)
{
@@ -49,7 +56,155 @@ static bool dso__skip_buildid(struct dso *dso, int with_hits)
return with_hits && !dso->hit;
}

-static int perf_session__list_build_ids(bool force, bool with_hits)
+#ifdef HAVE_DEBUGINFOD_SUPPORT
+static int get_executable(const char *sbuild_id, char **path)
+{
+ debuginfod_client *c;
+ int fd;
+
+ c = debuginfod_begin();
+ if (c == NULL)
+ return -1;
+
+ pr_debug("trying debuginfod for executable <%s> ... ", sbuild_id);
+
+ fd = debuginfod_find_executable(c, (const unsigned char *) sbuild_id,
+ 0, path);
+ if (fd >= 0)
+ close(fd); /* retaining reference by realname */
+
+ debuginfod_end(c);
+ pr_debug("%s%s\n", *path ? "OK " : "FAILED", *path ? *path : "");
+ return *path ? 0 : -1;
+}
+#else
+static int get_executable(const char *sbuild_id __maybe_unused,
+ char **path __maybe_unused)
+{
+ return -1;
+}
+#endif
+
+struct dso_store_data {
+ bool with_hits;
+};
+
+static int dso__store(struct dso *dso, struct machine *machine __maybe_unused, void *priv)
+{
+ struct dso_store_data *data = priv;
+ char sbuild_id[SBUILD_ID_SIZE];
+ u8 bid[BUILD_ID_SIZE];
+ char *path = NULL;
+ bool is_kallsyms;
+ int err = -1;
+
+ if (!dso->has_build_id ||
+ !build_id__is_defined(dso->build_id))
+ return 0;
+
+ if (data->with_hits && !dso->hit)
+ return 0;
+
+ /*
+ * The storing process is:
+ * - get build id of the dso
+ * - check if it matches provided build id from mmap3 event
+ * - if not, try debuginfod to download the binary
+ * - store binary to build id database
+ */
+ is_kallsyms = !strcmp(machine->mmap_name, dso->short_name);
+ build_id__sprintf(dso->build_id, sizeof(dso->build_id), sbuild_id);
+
+ if (is_kallsyms) {
+ /*
+ * Find out if we are on the same kernel as perf.data
+ * and keep kallsyms in that case.
+ */
+ path = strdup(dso->long_name);
+ if (!path)
+ goto out_err;
+
+ err = sysfs__read_build_id("/sys/kernel/notes", &bid, sizeof(bid));
+ if (err < 0)
+ goto out_err;
+ } else {
+ struct stat st;
+
+ /*
+ * Check whether the file exists in the first place; if it does,
+ * resolve the path and read the build id.
+ */
+ if (stat(dso->long_name, &st)) {
+ zfree(&path);
+ goto try_download;
+ }
+
+ path = nsinfo__realpath(dso->long_name, dso->nsinfo);
+ if (!path)
+ goto out_err;
+
+ err = filename__read_build_id(path, &bid, sizeof(bid));
+ if (err != sizeof(bid))
+ goto out_err;
+ }
+
+ /*
+ * If the build id read from the binary matches the one
+ * from the mmap3 event, we're happy and use the binary.
+ */
+ if (memcmp(&bid, dso->build_id, BUILD_ID_SIZE)) {
+ char sbid[SBUILD_ID_SIZE];
+
+ build_id__sprintf(bid, sizeof(bid), sbid);
+ pr_debug("mmap build id <%s> does not match for %s <%s>\n",
+ sbuild_id, path, sbid);
+ zfree(&path);
+ }
+
+try_download:
+ /*
+ * We did not match build id or did not find the
+ * binary - try debuginfod as last resort.
+ */
+ if (!path) {
+ char *tmp = NULL;
+
+ /*
+ * The debuginfo retrieval is handled within
+ * the build_id_cache__add function.
+ */
+ if (get_executable(sbuild_id, &tmp)) {
+ err = -1;
+ goto out_err;
+ }
+
+ path = tmp;
+
+ /*
+ * The kernel dso is now an ELF binary, so disable is_kallsyms
+ * to let build_id_cache__add prepare the proper file names.
+ */
+ is_kallsyms = false;
+ }
+
+ pr_debug("linking %s %s <%s>\n", dso->short_name, path, sbuild_id);
+
+ err = build_id_cache__add(sbuild_id, path, path,
+ dso->nsinfo, is_kallsyms, false);
+out_err:
+ free(path);
+ fprintf(stderr, "%s %s %s\n", err ? "FAIL" : "OK ", sbuild_id, dso->long_name);
+ return 0;
+}
+
+static int perf_session__store(struct perf_session *session, bool with_hits)
+{
+ struct dso_store_data data = { .with_hits = with_hits, };
+
+ return __perf_session__cache_build_ids(session, dso__store, &data);
+}
+
+static int perf_session__list_build_ids(bool force, bool with_hits, bool store)
{
struct perf_session *session;
struct perf_data data = {
@@ -94,7 +249,13 @@ static int perf_session__list_build_ids(bool force, bool with_hits)
if (with_hits || perf_data__is_pipe(&data))
perf_session__process_events(session);

- perf_session__fprintf_dsos_buildid(session, stdout, dso__skip_buildid, with_hits);
+ if (store) {
+ perf_session__store(session, with_hits);
+ } else {
+ perf_session__fprintf_dsos_buildid(session, stdout, dso__skip_buildid,
+ with_hits);
+ }
+
perf_session__delete(session);
out:
return 0;
@@ -105,11 +266,13 @@ int cmd_buildid_list(int argc, const char **argv)
bool show_kernel = false;
bool with_hits = false;
bool force = false;
+ bool store = false;
const struct option options[] = {
OPT_BOOLEAN('H', "with-hits", &with_hits, "Show only DSOs with hits"),
OPT_STRING('i', "input", &input_name, "file", "input file name"),
OPT_BOOLEAN('f', "force", &force, "don't complain, do it"),
OPT_BOOLEAN('k', "kernel", &show_kernel, "Show current kernel build id"),
+ OPT_BOOLEAN(0, "store", &store, "Store build id dsos in .debug cache"),
OPT_INCR('v', "verbose", &verbose, "be more verbose"),
OPT_END()
};
@@ -124,5 +287,5 @@ int cmd_buildid_list(int argc, const char **argv)
if (show_kernel)
return !(sysfs__fprintf_build_id(stdout) > 0);

- return perf_session__list_build_ids(force, with_hits);
+ return perf_session__list_build_ids(force, with_hits, store);
}
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 99d36ac77c08..a7d7ebffd005 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1555,9 +1555,9 @@ static int parse_perf_probe_point(char *arg, struct perf_probe_event *pev)
return -EINVAL;
}

- pr_debug("symbol:%s file:%s line:%d offset:%lu return:%d lazy:%s\n",
- pp->function, pp->file, pp->line, pp->offset, pp->retprobe,
- pp->lazy_line);
+ pr_debug2("symbol:%s file:%s line:%d offset:%lu return:%d lazy:%s\n",
+ pp->function, pp->file, pp->line, pp->offset, pp->retprobe,
+ pp->lazy_line);
return 0;
}

--
2.26.2

2020-09-13 21:09:27

by Jiri Olsa

Subject: [PATCH 23/26] perf tools: Add __perf_session__cache_build_ids function

Adding __perf_session__cache_build_ids function as an
interface for caching a session's build ids with a custom
callback function and its data pointer.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/build-id.c | 10 ++++++++--
tools/perf/util/build-id.h | 3 +++
2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index 22968504c6de..9335a535e547 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -902,7 +902,8 @@ machines__for_each_dso(struct machines *machines, machine__dso_t fn, void *priv)
return ret ? -1 : 0;
}

-int perf_session__cache_build_ids(struct perf_session *session)
+int __perf_session__cache_build_ids(struct perf_session *session,
+ machine__dso_t fn, void *priv)
{
if (no_buildid_cache)
return 0;
@@ -910,7 +911,12 @@ int perf_session__cache_build_ids(struct perf_session *session)
if (mkdir(buildid_dir, 0755) != 0 && errno != EEXIST)
return -1;

- return machines__for_each_dso(&session->machines, dso__cache_build_id, NULL) ? -1 : 0;
+ return machines__for_each_dso(&session->machines, fn, priv) ? -1 : 0;
+}
+
+int perf_session__cache_build_ids(struct perf_session *session)
+{
+ return __perf_session__cache_build_ids(session, dso__cache_build_id, NULL);
}

static bool machine__read_build_ids(struct machine *machine, bool with_hits)
diff --git a/tools/perf/util/build-id.h b/tools/perf/util/build-id.h
index 6d1c7180047b..ec128e8f7dd3 100644
--- a/tools/perf/util/build-id.h
+++ b/tools/perf/util/build-id.h
@@ -5,6 +5,7 @@
#define BUILD_ID_SIZE 20
#define SBUILD_ID_SIZE (BUILD_ID_SIZE * 2 + 1)

+#include "machine.h"
#include "tool.h"
#include <linux/types.h>

@@ -36,6 +37,8 @@ bool perf_session__read_build_ids(struct perf_session *session, bool with_hits);
int perf_session__write_buildid_table(struct perf_session *session,
struct feat_fd *fd);
int perf_session__cache_build_ids(struct perf_session *session);
+int __perf_session__cache_build_ids(struct perf_session *session,
+ machine__dso_t fn, void *priv);

char *build_id_cache__origname(const char *sbuild_id);
char *build_id_cache__linkname(const char *sbuild_id, char *bf, size_t size);
--
2.26.2
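The patch turns a fixed walk (always calling dso__cache_build_id) into a walk that takes a callback and a private data pointer, and keeps the old entry point as a thin wrapper. Together with machine__for_each_dso (patch 21), the resulting pattern can be sketched in self-contained C as follows. This is only an illustration: struct dso here is a hypothetical stand-in on a plain linked list, and walk_dsos, cache_one, store_one and struct store_data are invented names, not perf APIs. Like machine__for_each_dso, the walker keeps visiting after a callback failure and reports -1 at the end.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for perf's struct dso, kept on a simple list. */
struct dso {
	const char *name;
	bool hit;
	struct dso *next;
};

/* Callback type modelled on machine__dso_t (the machine argument is dropped). */
typedef int (*dso_fn_t)(struct dso *dso, void *priv);

/* Generic walker: visit every dso, return -1 if any callback failed. */
static int walk_dsos(struct dso *head, dso_fn_t fn, void *priv)
{
	int err = 0;

	for (struct dso *pos = head; pos; pos = pos->next) {
		if (fn(pos, priv))
			err = -1;	/* remember the failure but keep going */
	}
	return err;
}

/* Placeholder for the default caching action (dso__cache_build_id in perf). */
static int cache_one(struct dso *dso, void *priv)
{
	(void)dso;
	(void)priv;
	return 0;
}

/* The old entry point keeps its signature and plugs in the default callback. */
static int cache_build_ids(struct dso *head)
{
	return walk_dsos(head, cache_one, NULL);
}

/* A --store style callback whose options travel through priv. */
struct store_data {
	bool with_hits;
	int stored;
};

static int store_one(struct dso *dso, void *priv)
{
	struct store_data *data = priv;

	if (data->with_hits && !dso->hit)
		return 0;	/* skip DSOs without samples */
	data->stored++;
	return 0;
}
```

Passing options through priv (as dso_store_data does for with_hits in the buildid-list change) avoids globals and lets one walker serve both the default caching path and the new --store path.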

2020-09-13 21:09:37

by Jiri Olsa

Subject: [PATCH 06/26] perf tools: Add support to read build id from compressed elf

Adding support to decompress the file before reading its build id.

Adding a new filename__read_build_id wrapper and renaming its
current versions to read_build_id.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/symbol-elf.c | 37 ++++++++++++++++++++++++++++++++++--
1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 94a156df22d5..6770572620f3 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -534,7 +534,7 @@ static int elf_read_build_id(Elf *elf, void *bf, size_t size)

#ifdef HAVE_LIBBFD_BUILDID_SUPPORT

-int filename__read_build_id(const char *filename, void *bf, size_t size)
+static int read_build_id(const char *filename, void *bf, size_t size)
{
int err = -1;
bfd *abfd;
@@ -562,7 +562,7 @@ int filename__read_build_id(const char *filename, void *bf, size_t size)

#else // HAVE_LIBBFD_BUILDID_SUPPORT

-int filename__read_build_id(const char *filename, void *bf, size_t size)
+static int read_build_id(const char *filename, void *bf, size_t size)
{
int fd, err = -1;
Elf *elf;
@@ -591,6 +591,39 @@ int filename__read_build_id(const char *filename, void *bf, size_t size)

#endif // HAVE_LIBBFD_BUILDID_SUPPORT

+int filename__read_build_id(const char *filename, void *bf, size_t size)
+{
+ struct kmod_path m = { .name = NULL, };
+ char path[PATH_MAX];
+ int err;
+
+ if (!filename)
+ return -EFAULT;
+
+ err = kmod_path__parse(&m, filename);
+ if (err)
+ return -1;
+
+ if (m.comp) {
+ int error = 0, fd;
+
+ fd = filename__decompress(filename, path, sizeof(path), m.comp, &error);
+ if (fd < 0) {
+ pr_debug("Failed to decompress (error %d) %s\n",
+ error, filename);
+ return -1;
+ }
+ close(fd);
+ filename = path;
+ }
+
+ err = read_build_id(filename, bf, size);
+
+ if (m.comp)
+ unlink(filename);
+ return err;
+}
+
int sysfs__read_build_id(const char *filename, void *build_id, size_t size)
{
int fd, err = -1;
--
2.26.2
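The new filename__read_build_id follows a simple flow: decide from the file name whether the file is compressed, if so decompress it to a temporary path, read the build id from that path, and unlink the temporary copy afterwards. A minimal sketch of that flow is below. It assumes suffix-based detection of ".gz" and ".xz" only, and the decompress/read hooks are hypothetical function pointers standing in for filename__decompress and read_build_id, whose real signatures differ.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical compression ids, standing in for perf's m.comp value. */
enum comp_type { COMP_NONE = 0, COMP_GZIP, COMP_XZ };

static int has_suffix(const char *s, const char *suffix)
{
	size_t ls = strlen(s), lx = strlen(suffix);

	return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Guess the compression from the file name, e.g. "xfs.ko.xz". */
static enum comp_type detect_compression(const char *filename)
{
	if (has_suffix(filename, ".gz"))
		return COMP_GZIP;
	if (has_suffix(filename, ".xz"))
		return COMP_XZ;
	return COMP_NONE;
}

/* Hypothetical hooks: write a decompressed copy to tmp, read a build id from path. */
typedef int (*decompress_fn)(const char *src, char *tmp, size_t tmp_len);
typedef int (*read_id_fn)(const char *path, void *bf, size_t size);

static int read_build_id_any(const char *filename, void *bf, size_t size,
			     decompress_fn decompress, read_id_fn read_id)
{
	char tmp[PATH_MAX];
	int err;

	if (!filename)
		return -EFAULT;
	if (detect_compression(filename) == COMP_NONE)
		return read_id(filename, bf, size);

	if (decompress(filename, tmp, sizeof(tmp)) < 0)
		return -1;
	err = read_id(tmp, bf, size);
	unlink(tmp);		/* the decompressed copy is only temporary */
	return err;
}
```

Because the wrapper owns the temporary file, callers such as the mmap3 synthesizer can pass module paths like ".../xfs.ko.xz" directly and never see the decompressed copy.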

2020-09-13 21:09:42

by Jiri Olsa

Subject: [PATCH 20/26] perf tools: Add build_id_cache__add function

Adding build_id_cache__add function as the core function
that adds a file into the build id database. It will be
used by other callers in the following changes.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/build-id.c | 42 ++++++++++++++++++++++++--------------
tools/perf/util/build-id.h | 2 ++
2 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index b281c97894e0..bf044e52ad1f 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -668,24 +668,15 @@ static char *build_id_cache__find_debug(const char *sbuild_id,
return realname;
}

-int build_id_cache__add_s(const char *sbuild_id, const char *name,
- struct nsinfo *nsi, bool is_kallsyms, bool is_vdso)
+int
+build_id_cache__add(const char *sbuild_id, const char *name, const char *realname,
+ struct nsinfo *nsi, bool is_kallsyms, bool is_vdso)
{
const size_t size = PATH_MAX;
- char *realname = NULL, *filename = NULL, *dir_name = NULL,
- *linkname = zalloc(size), *tmp;
+ char *filename = NULL, *dir_name = NULL, *linkname = zalloc(size), *tmp;
char *debugfile = NULL;
int err = -1;

- if (!is_kallsyms) {
- if (!is_vdso)
- realname = nsinfo__realpath(name, nsi);
- else
- realname = realpath(name, NULL);
- if (!realname)
- goto out_free;
- }
-
dir_name = build_id_cache__cachedir(sbuild_id, name, nsi, is_kallsyms,
is_vdso);
if (!dir_name)
@@ -786,8 +777,6 @@ int build_id_cache__add_s(const char *sbuild_id, const char *name,
pr_debug4("Failed to update/scan SDT cache for %s\n", realname);

out_free:
- if (!is_kallsyms)
- free(realname);
free(filename);
free(debugfile);
free(dir_name);
@@ -795,6 +784,29 @@ int build_id_cache__add_s(const char *sbuild_id, const char *name,
return err;
}

+int build_id_cache__add_s(const char *sbuild_id, const char *name,
+ struct nsinfo *nsi, bool is_kallsyms, bool is_vdso)
+{
+ char *realname = NULL;
+ int err = -1;
+
+ if (!is_kallsyms) {
+ if (!is_vdso)
+ realname = nsinfo__realpath(name, nsi);
+ else
+ realname = realpath(name, NULL);
+ if (!realname)
+ goto out_free;
+ }
+
+ err = build_id_cache__add(sbuild_id, name, realname, nsi, is_kallsyms, is_vdso);
+
+out_free:
+ if (!is_kallsyms)
+ free(realname);
+ return err;
+}
+
static int build_id_cache__add_b(const u8 *build_id, size_t build_id_size,
const char *name, struct nsinfo *nsi,
bool is_kallsyms, bool is_vdso)
diff --git a/tools/perf/util/build-id.h b/tools/perf/util/build-id.h
index 2cf87b7304e2..6d1c7180047b 100644
--- a/tools/perf/util/build-id.h
+++ b/tools/perf/util/build-id.h
@@ -50,6 +50,8 @@ char *build_id_cache__complement(const char *incomplete_sbuild_id);
int build_id_cache__list_build_ids(const char *pathname, struct nsinfo *nsi,
struct strlist **result);
bool build_id_cache__cached(const char *sbuild_id);
+int build_id_cache__add(const char *sbuild_id, const char *name, const char *realname,
+ struct nsinfo *nsi, bool is_kallsyms, bool is_vdso);
int build_id_cache__add_s(const char *sbuild_id,
const char *name, struct nsinfo *nsi,
bool is_kallsyms, bool is_vdso);
--
2.26.2
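The split moves path resolution out of the core function: build_id_cache__add_s now resolves realname and delegates, while build_id_cache__add accepts a name the caller already resolved (which is what dso__store relies on when it passes a debuginfod download path as both name and realname). A minimal sketch of the wrapper half, assuming plain realpath(3) in place of nsinfo__realpath and a hypothetical core callback, could look like this:

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical core: receives a name that the caller already resolved (or NULL). */
typedef int (*cache_add_fn)(const char *sbuild_id, const char *name,
			    const char *realname, bool is_kallsyms);

/* Wrapper in the style of build_id_cache__add_s: resolve, delegate, clean up. */
static int cache_add_s(const char *sbuild_id, const char *name, bool is_kallsyms,
		       cache_add_fn core)
{
	char *realname = NULL;
	int err;

	if (!is_kallsyms) {
		realname = realpath(name, NULL);	/* malloc()ed by libc */
		if (!realname)
			return -1;
	}

	err = core(sbuild_id, name, realname, is_kallsyms);
	free(realname);		/* free(NULL) is a no-op for kallsyms */
	return err;
}
```

Since free(NULL) is defined to do nothing, the `if (!is_kallsyms)` guard around free() in the patch is harmless but not strictly required.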

2020-09-13 21:09:43

by Jiri Olsa

Subject: [PATCH 15/26] perf tools: Synthesize proc tasks with mmap3

Synthesizing proc tasks with mmap3 events so we can
get build id data for synthesized maps as well.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/mmap.c | 2 +-
tools/perf/util/synthetic-events.c | 72 +++++++++++++++++-------------
2 files changed, 43 insertions(+), 31 deletions(-)

diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index ab7108d22428..51f6f86580a9 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -33,7 +33,7 @@ void mmap_cpu_mask__scnprintf(struct mmap_cpu_mask *mask, const char *tag)

len = bitmap_scnprintf(mask->bits, mask->nbits, buf, MASK_SIZE);
buf[len] = '\0';
- pr_debug("%p: %s mask[%zd]: %s\n", mask, tag, mask->nbits, buf);
+ pr_debug2("%p: %s mask[%zd]: %s\n", mask, tag, mask->nbits, buf);
}

size_t mmap__mmap_len(struct mmap *map)
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 89b390623b63..bd6e7b84283d 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -379,7 +379,7 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
}
io__init(&io, io.fd, bf, sizeof(bf));

- event->header.type = PERF_RECORD_MMAP2;
+ event->header.type = PERF_RECORD_MMAP3;
t = rdclock();

while (!io.eof) {
@@ -387,20 +387,20 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
size_t size;

/* ensure null termination since stack will be reused. */
- event->mmap2.filename[0] = '\0';
+ event->mmap3.filename[0] = '\0';

/* 00400000-0040c000 r-xp 00000000 fd:01 41038 /bin/cat */
if (!read_proc_maps_line(&io,
- &event->mmap2.start,
- &event->mmap2.len,
- &event->mmap2.prot,
- &event->mmap2.flags,
- &event->mmap2.pgoff,
- &event->mmap2.maj,
- &event->mmap2.min,
- &event->mmap2.ino,
- sizeof(event->mmap2.filename),
- event->mmap2.filename))
+ &event->mmap3.start,
+ &event->mmap3.len,
+ &event->mmap3.prot,
+ &event->mmap3.flags,
+ &event->mmap3.pgoff,
+ &event->mmap3.maj,
+ &event->mmap3.min,
+ &event->mmap3.ino,
+ sizeof(event->mmap3.filename),
+ event->mmap3.filename))
continue;

if ((rdclock() - t) > timeout) {
@@ -412,7 +412,7 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
goto out;
}

- event->mmap2.ino_generation = 0;
+ event->mmap3.ino_generation = 0;

/*
* Just like the kernel, see __perf_event_mmap in kernel/perf_event.c
@@ -422,8 +422,8 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
else
event->header.misc = PERF_RECORD_MISC_GUEST_USER;

- if ((event->mmap2.prot & PROT_EXEC) == 0) {
- if (!mmap_data || (event->mmap2.prot & PROT_READ) == 0)
+ if ((event->mmap3.prot & PROT_EXEC) == 0) {
+ if (!mmap_data || (event->mmap3.prot & PROT_READ) == 0)
continue;

event->header.misc |= PERF_RECORD_MISC_MMAP_DATA;
@@ -433,25 +433,37 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
if (truncation)
event->header.misc |= PERF_RECORD_MISC_PROC_MAP_PARSE_TIMEOUT;

- if (!strcmp(event->mmap2.filename, ""))
- strcpy(event->mmap2.filename, anonstr);
+ if (!strcmp(event->mmap3.filename, ""))
+ strcpy(event->mmap3.filename, anonstr);

if (hugetlbfs_mnt_len &&
- !strncmp(event->mmap2.filename, hugetlbfs_mnt,
+ !strncmp(event->mmap3.filename, hugetlbfs_mnt,
hugetlbfs_mnt_len)) {
- strcpy(event->mmap2.filename, anonstr);
- event->mmap2.flags |= MAP_HUGETLB;
+ strcpy(event->mmap3.filename, anonstr);
+ event->mmap3.flags |= MAP_HUGETLB;
}

- size = strlen(event->mmap2.filename) + 1;
+ size = strlen(event->mmap3.filename) + 1;
size = PERF_ALIGN(size, sizeof(u64));
- event->mmap2.len -= event->mmap.start;
- event->mmap2.header.size = (sizeof(event->mmap2) -
- (sizeof(event->mmap2.filename) - size));
- memset(event->mmap2.filename + size, 0, machine->id_hdr_size);
- event->mmap2.header.size += machine->id_hdr_size;
- event->mmap2.pid = tgid;
- event->mmap2.tid = pid;
+ event->mmap3.len -= event->mmap.start;
+ event->mmap3.header.size = (sizeof(event->mmap3) -
+ (sizeof(event->mmap3.filename) - size));
+ memset(event->mmap3.filename + size, 0, machine->id_hdr_size);
+ event->mmap3.header.size += machine->id_hdr_size;
+ event->mmap3.pid = tgid;
+ event->mmap3.tid = pid;
+
+ rc = filename__read_build_id(event->mmap3.filename, event->mmap3.buildid,
+ BUILD_ID_SIZE);
+ if (rc != BUILD_ID_SIZE) {
+ if (event->mmap3.filename[0] == '/') {
+ pr_debug2("Failed to read build ID for %s\n",
+ event->mmap3.filename);
+ }
+ memset(event->mmap3.buildid, 0x0, sizeof(event->mmap3.buildid));
+ }
+
+ rc = 0;

if (perf_tool__process_synth_event(tool, event, machine, process) != 0) {
rc = -1;
@@ -744,7 +756,7 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
if (comm_event == NULL)
goto out;

- mmap_event = malloc(sizeof(mmap_event->mmap2) + machine->id_hdr_size);
+ mmap_event = malloc(sizeof(mmap_event->mmap3) + machine->id_hdr_size);
if (mmap_event == NULL)
goto out_free_comm;

@@ -826,7 +838,7 @@ static int __perf_event__synthesize_threads(struct perf_tool *tool,
if (comm_event == NULL)
goto out;

- mmap_event = malloc(sizeof(mmap_event->mmap2) + machine->id_hdr_size);
+ mmap_event = malloc(sizeof(mmap_event->mmap3) + machine->id_hdr_size);
if (mmap_event == NULL)
goto out_free_comm;

--
2.26.2
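In the synthesizer, header.size is computed as sizeof(event->mmap3) minus the unused tail of filename[], plus machine->id_hdr_size, where the used part of the filename is strlen() + 1 rounded up with PERF_ALIGN to sizeof(u64). Assuming the fixed part of the record is exactly the offset of filename[] (no trailing struct padding), that reduces to the arithmetic below. ALIGN_UP and mmap_record_size are illustrative names, not perf helpers; 96 is the filename offset for the MMAP3 layout proposed in patch 02.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round x up to a multiple of a (a power of two), like PERF_ALIGN. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/*
 * header.size of a synthesized mmap record, assuming fixed_part is the
 * offset of filename[] in the record (no trailing padding in the struct).
 */
static size_t mmap_record_size(size_t fixed_part, const char *filename,
			       size_t id_hdr_size)
{
	/* NUL-terminated name, padded so the sample_id trailer stays 8-byte aligned */
	size_t name_len = ALIGN_UP(strlen(filename) + 1, sizeof(uint64_t));

	return fixed_part + name_len + id_hdr_size;
}
```

For example, "/bin/cat" needs 9 bytes, padded to 16, so with no sample_id trailer the record is 96 + 16 = 112 bytes; "//anon" needs 7 bytes, padded to 8.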

2020-09-13 21:11:36

by Jiri Olsa

Subject: [PATCH 21/26] perf tools: Add machine__for_each_dso function

Adding machine__for_each_dso to iterate over all dso
objects defined within the machine. It will be
used in the following changes.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/util/machine.c | 12 ++++++++++++
tools/perf/util/machine.h | 4 ++++
2 files changed, 16 insertions(+)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 863d949ef967..f8e8d0d80847 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -3181,3 +3181,15 @@ char *machine__resolve_kernel_addr(void *vmachine, unsigned long long *addrp, ch
*addrp = map->unmap_ip(map, sym->start);
return sym->name;
}
+
+int machine__for_each_dso(struct machine *machine, machine__dso_t fn, void *priv)
+{
+ struct dso *pos;
+ int err = 0;
+
+ list_for_each_entry(pos, &machine->dsos.head, node) {
+ if (fn(pos, machine, priv))
+ err = -1;
+ }
+ return err;
+}
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index a3c1d0bf89e5..504c707f22bb 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -252,6 +252,10 @@ void machines__destroy_kernel_maps(struct machines *machines);

size_t machine__fprintf_vmlinux_path(struct machine *machine, FILE *fp);

+typedef int (*machine__dso_t)(struct dso *dso, struct machine *machine, void *priv);
+
+int machine__for_each_dso(struct machine *machine, machine__dso_t fn,
+ void *priv);
int machine__for_each_thread(struct machine *machine,
int (*fn)(struct thread *thread, void *p),
void *priv);
--
2.26.2

2020-09-13 21:11:48

by Jiri Olsa

Subject: [PATCH 26/26] perf tools: Add report --store option

Adding report --store option as a wrapper for 'buildid-list --store'
to save some typing.

Signed-off-by: Jiri Olsa <[email protected]>
---
tools/perf/Documentation/perf-report.txt | 3 +++
tools/perf/builtin-report.c | 17 +++++++++++++++++
2 files changed, 20 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index d068103690cc..698fe90d6e1d 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -548,6 +548,9 @@ include::itrace.txt[]
Configure time quantum for time sort key. Default 100ms.
Accepts s, us, ms, ns units.

+--store::
+ Store build id DSOs in .debug cache. See `--store` option in perf-buildid-list.
+
--total-cycles::
When --total-cycles is specified, it supports sorting for all blocks by
'Sampled Cycles%'. This is useful to concentrate on the globally hottest
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 3dd37513eb94..3450e441d894 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1098,6 +1098,18 @@ static int process_attr(struct perf_tool *tool __maybe_unused,
return 0;
}

+static int build_id_store(const char *file)
+{
+ const char *argv[4];
+
+ argv[0] = "buildid-list";
+ argv[1] = "-i";
+ argv[2] = file;
+ argv[3] = "--store";
+
+ return cmd_buildid_list(4, argv);
+}
+
int cmd_report(int argc, const char **argv)
{
struct perf_session *session;
@@ -1107,6 +1119,7 @@ int cmd_report(int argc, const char **argv)
int branch_mode = -1;
int last_key = 0;
bool branch_call_mode = false;
+ bool store = false;
#define CALLCHAIN_DEFAULT_OPT "graph,0.5,caller,function,percent"
static const char report_callchain_help[] = "Display call graph (stack chain/backtrace):\n\n"
CALLCHAIN_REPORT_HELP
@@ -1301,6 +1314,7 @@ int cmd_report(int argc, const char **argv)
OPTS_EVSWITCH(&report.evswitch),
OPT_BOOLEAN(0, "total-cycles", &report.total_cycles_mode,
"Sort all blocks by 'Sampled Cycles%'"),
+ OPT_BOOLEAN(0, "store", &store, "Store build id dsos in .debug cache"),
OPT_END()
};
struct perf_data data = {
@@ -1367,6 +1381,9 @@ int cmd_report(int argc, const char **argv)
input_name = "perf.data";
}

+ if (store)
+ return build_id_store(input_name);
+
data.path = input_name;
data.force = symbol_conf.force;

--
2.26.2

2020-09-14 05:28:57

by Namhyung Kim

Subject: Re: [RFC 00/26] perf: Add mmap3 support

Hi Jiri,

On Mon, Sep 14, 2020 at 6:03 AM Jiri Olsa <[email protected]> wrote:
>
> hi,
> while playing with perf daemon support I realized I need
> the build id data in mmap events, so we don't need to care
> about removed/updated binaries during long perf runs.
>
> This RFC patchset adds new mmap3 events that copies mmap2
> event and adds build id in it. It makes mmap3 the default
> mmap event for synthesizing kernel/modules/tasks and adds
> some tooling enhancements to enable the workflow below.

Cool! It's nice that we can skip the final build-id collection stage
with this, although the data size will be bigger.

Thanks
Namhyung

2020-09-14 05:40:38

by Namhyung Kim

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On Mon, Sep 14, 2020 at 6:03 AM Jiri Olsa <[email protected]> wrote:
>
> Add new version of mmap event. The MMAP3 record is an
> augmented version of MMAP2, it adds build id value to
> identify the exact binary object behind memory map:
>
> struct {
> struct perf_event_header header;
>
> u32 pid, tid;
> u64 addr;
> u64 len;
> u64 pgoff;
> u32 maj;
> u32 min;
> u64 ino;
> u64 ino_generation;
> u32 prot, flags;
> u32 reserved;
> u8 buildid[20];

Do we need maj, min, ino, ino_generation for mmap3 event?
I think they are to compare binaries, then we can do it with
build-id (and I think it'd be better)..


> char filename[];
> struct sample_id sample_id;
> };
>
> Adding 4 bytes reserved field to align buildid data to 8 bytes,
> so sample_id data is properly aligned.
>
> The mmap3 event is enabled by new mmap3 bit in perf_event_attr
> struct. When set for an event, it enables the build id retrieval
> and will use mmap3 format for the event.
>
> Keeping track of mmap3 events and calling build_id_parse
> in perf_event_mmap_event only if we have any defined.
>
> Having build id attached directly to the mmap event will help
> tool like perf to skip final search through perf data for
> binaries that are needed in the report time. Also it prevents
> possible race when the binary could be removed or replaced
> during profiling.
>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> include/uapi/linux/perf_event.h | 27 ++++++++++++++++++++++-
> kernel/events/core.c | 38 +++++++++++++++++++++++++++------
> 2 files changed, 57 insertions(+), 8 deletions(-)
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 077e7ee69e3d..facfc3c673ed 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -384,7 +384,8 @@ struct perf_event_attr {
> aux_output : 1, /* generate AUX records instead of events */
> cgroup : 1, /* include cgroup events */
> text_poke : 1, /* include text poke events */
> - __reserved_1 : 30;
> + mmap3 : 1, /* include bpf events */

???

> + __reserved_1 : 29;
>
> union {
> __u32 wakeup_events; /* wakeup every n events */
> @@ -1060,6 +1061,30 @@ enum perf_event_type {
> */
> PERF_RECORD_TEXT_POKE = 20,
>
> + /*
> + * The MMAP3 records are an augmented version of MMAP2, they add
> + * build id value to identify the exact binary behind map
> + *
> + * struct {
> + * struct perf_event_header header;
> + *
> + * u32 pid, tid;
> + * u64 addr;
> + * u64 len;
> + * u64 pgoff;
> + * u32 maj;
> + * u32 min;
> + * u64 ino;
> + * u64 ino_generation;
> + * u32 prot, flags;
> + * u32 reserved;
> + * u8 buildid[20];
> + * char filename[];
> + * struct sample_id sample_id;
> + * };
> + */
> + PERF_RECORD_MMAP3 = 21,
> +
> PERF_RECORD_MAX, /* non-ABI */
> };
>
[SNIP]
> @@ -8098,6 +8116,9 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
> mmap_event->prot = prot;
> mmap_event->flags = flags;
>
> + if (atomic_read(&nr_mmap3_events))
> + build_id_parse(vma, mmap_event->buildid);

What about if it failed? We should zero out the build-id..

Thanks
Namhyung

> +
> if (!(vma->vm_flags & VM_EXEC))
> mmap_event->event_id.header.misc |= PERF_RECORD_MISC_MMAP_DATA;
>
> @@ -8241,6 +8262,7 @@ void perf_event_mmap(struct vm_area_struct *vma)
> /* .ino_generation (attr_mmap2 only) */
> /* .prot (attr_mmap2 only) */
> /* .flags (attr_mmap2 only) */
> + /* .buildid (attr_mmap3 only) */
> };
>
> perf_addr_filters_adjust(vma);
> @@ -11040,6 +11062,8 @@ static void account_event(struct perf_event *event)
> inc = true;
> if (event->attr.mmap || event->attr.mmap_data)
> atomic_inc(&nr_mmap_events);
> + if (event->attr.mmap3)
> + atomic_inc(&nr_mmap3_events);
> if (event->attr.comm)
> atomic_inc(&nr_comm_events);
> if (event->attr.namespaces)
> --
> 2.26.2
>

2020-09-14 05:42:46

by Namhyung Kim

Subject: Re: [PATCH 03/26] tools headers uapi: Sync tools/include/uapi/linux/perf_event.h

On Mon, Sep 14, 2020 at 6:03 AM Jiri Olsa <[email protected]> wrote:
>
> Sync uapi header with kernel version for mmap3 support.
>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/include/uapi/linux/perf_event.h | 27 ++++++++++++++++++++++++++-
> 1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> index 3e5dcdd48a49..84a0cbdab1ef 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ b/tools/include/uapi/linux/perf_event.h
> @@ -384,7 +384,8 @@ struct perf_event_attr {
> aux_output : 1, /* generate AUX records instead of events */
> cgroup : 1, /* include cgroup events */
> text_poke : 1, /* include text poke events */
> - __reserved_1 : 30;
> + mmap3 : 1, /* include bpf events */

Same here..

Thanks
Namhyung


> + __reserved_1 : 29;
>
> union {
> __u32 wakeup_events; /* wakeup every n events */
> @@ -1060,6 +1061,30 @@ enum perf_event_type {
> */
> PERF_RECORD_TEXT_POKE = 20,
>
> + /*
> + * The MMAP3 records are an augmented version of MMAP2, they add
> + * build id value to identify the exact binary behind map
> + *
> + * struct {
> + * struct perf_event_header header;
> + *
> + * u32 pid, tid;
> + * u64 addr;
> + * u64 len;
> + * u64 pgoff;
> + * u32 maj;
> + * u32 min;
> + * u64 ino;
> + * u64 ino_generation;
> + * u32 prot, flags;
> + * u32 reserved;
> + * u8 buildid[20];
> + * char filename[];
> + * struct sample_id sample_id;
> + * };
> + */
> + PERF_RECORD_MMAP3 = 21,
> +
> PERF_RECORD_MAX, /* non-ABI */
> };
>
> --
> 2.26.2
>

2020-09-14 05:48:14

by Namhyung Kim

Subject: Re: [PATCH 05/26] perf tools: Add build_id__is_defined function

On Mon, Sep 14, 2020 at 6:05 AM Jiri Olsa <[email protected]> wrote:
>
> Adding build_id__is_defined helper to check build id
> is defined and is != zero build id.
>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/util/build-id.c | 11 +++++++++++
> tools/perf/util/build-id.h | 1 +
> 2 files changed, 12 insertions(+)
>
> diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> index 31207b6e2066..bdee4e08e60d 100644
> --- a/tools/perf/util/build-id.c
> +++ b/tools/perf/util/build-id.c
> @@ -902,3 +902,14 @@ bool perf_session__read_build_ids(struct perf_session *session, bool with_hits)
>
> return ret;
> }
> +
> +bool build_id__is_defined(const u8 *build_id)
> +{
> + static u8 zero[BUILD_ID_SIZE];
> + int err = 0;
> +
> + if (build_id)
> + err = memcmp(build_id, &zero, BUILD_ID_SIZE);
> +
> + return err ? true : false;
> +}

I think this is a bit confusing.. How about this?

bool ret = false;
if (build_id)
ret = memcmp(...);
return ret;

Or, it can be a oneliner..

Thanks
Namhyung


> diff --git a/tools/perf/util/build-id.h b/tools/perf/util/build-id.h
> index aad419bb165c..1ceede45c231 100644
> --- a/tools/perf/util/build-id.h
> +++ b/tools/perf/util/build-id.h
> @@ -14,6 +14,7 @@ extern struct perf_tool build_id__mark_dso_hit_ops;
> struct dso;
> struct feat_fd;
>
> +bool build_id__is_defined(const u8 *build_id);
> int build_id__sprintf(const u8 *build_id, int len, char *bf);
> int sysfs__sprintf_build_id(const char *root_dir, char *sbuild_id);
> int filename__sprintf_build_id(const char *pathname, char *sbuild_id);
> --
> 2.26.2
>
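The one-liner Namhyung hints at could look roughly like the sketch below: a NULL pointer or an all-zero buffer both count as "not defined", and the comparison result is turned into a bool explicitly instead of going through an int err. This is a suggestion-shaped sketch, not the version that was merged; BUILD_ID_SIZE is redefined locally so the block compiles on its own.

```c
#include <stdbool.h>
#include <string.h>

#define BUILD_ID_SIZE 20

/* NULL or all-zero build id means "not defined". */
static bool build_id__is_defined(const unsigned char *build_id)
{
	static const unsigned char zero[BUILD_ID_SIZE];

	return build_id && memcmp(build_id, zero, BUILD_ID_SIZE) != 0;
}
```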

2020-09-14 05:56:22

by Namhyung Kim

Subject: Re: [PATCH 07/26] perf tools: Add check for existing link in buildid dir

On Mon, Sep 14, 2020 at 6:05 AM Jiri Olsa <[email protected]> wrote:
>
> When adding new build id link we fail if the link is already
> there. Adding check for existing link and warn/replace the
> link with new target.
>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/util/build-id.c | 20 +++++++++++++++++++-
> 1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> index bdee4e08e60d..ecdc167aa1a0 100644
> --- a/tools/perf/util/build-id.c
> +++ b/tools/perf/util/build-id.c
> @@ -751,8 +751,26 @@ int build_id_cache__add_s(const char *sbuild_id, const char *name,
> tmp = dir_name + strlen(buildid_dir) - 5;
> memcpy(tmp, "../..", 5);
>
> - if (symlink(tmp, linkname) == 0)
> + if (symlink(tmp, linkname) == 0) {
> err = 0;
> + } else if (errno == EEXIST) {
> + char path[PATH_MAX];
> +
> + if (readlink(linkname, path, sizeof(path)) == -1) {
> + pr_err("Cant read link: %s\n", linkname);

typo

> + goto out_free;
> + }
> + if (strcmp(tmp, path)) {
> + pr_err("Inconsistent .debug record, updating [%s]\n",
> + linkname);

But isn't it ok to copy a binary to another location?
There can be multiple binaries with the same build-id..

Thanks
Namhyung


> +
> + unlink(linkname);
> +
> + if (symlink(tmp, linkname))
> + goto out_free;
> + }
> + err = 0;
> + }
>
> /* Update SDT cache : error is just warned */
> if (realname &&
> --
> 2.26.2
>
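One detail worth checking in this hunk: readlink(2) does not NUL-terminate the buffer it fills, yet the returned path is passed straight to strcmp(). A minimal sketch of the "create, or verify and replace a stale link" logic that terminates the buffer explicitly is below. ensure_symlink is a hypothetical helper, and it does not address Namhyung's question about several binaries sharing one build id; it only shows the link handling.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: make linkname point to target, replacing a stale link. */
static int ensure_symlink(const char *target, const char *linkname)
{
	char cur[PATH_MAX];
	ssize_t n;

	if (symlink(target, linkname) == 0)
		return 0;
	if (errno != EEXIST)
		return -1;

	/* readlink() does not NUL-terminate; leave room and terminate by hand. */
	n = readlink(linkname, cur, sizeof(cur) - 1);
	if (n < 0)
		return -1;
	cur[n] = '\0';

	if (strcmp(cur, target) == 0)
		return 0;	/* already points where we want */

	/* Stale link: replace it with one pointing at the new target. */
	if (unlink(linkname) != 0)
		return -1;
	return symlink(target, linkname) == 0 ? 0 : -1;
}
```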

2020-09-14 06:13:44

by Song Liu

Subject: Re: [PATCH 01/26] bpf: Move stack_map_get_build_id into lib

On Sun, Sep 13, 2020 at 2:05 PM Jiri Olsa <[email protected]> wrote:
>
> Moving stack_map_get_build_id into lib with
> prototype in linux/buildid.h header:
>
> int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id);
>
> This function returns build id for given struct vm_area_struct.
> There is no functional change to stack_map_get_build_id function.
>
> Cc: Alexei Starovoitov <[email protected]>
> Cc: Song Liu <[email protected]>
> Signed-off-by: Jiri Olsa <[email protected]>

Acked-by: Song Liu <[email protected]>

2020-09-14 06:22:20

by Song Liu

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On Sun, Sep 13, 2020 at 10:40 PM Namhyung Kim <[email protected]> wrote:
>
> On Mon, Sep 14, 2020 at 6:03 AM Jiri Olsa <[email protected]> wrote:
> >
> > Add new version of mmap event. The MMAP3 record is an
> > augmented version of MMAP2, it adds build id value to
> > identify the exact binary object behind memory map:
> >
> > struct {
> > struct perf_event_header header;
> >
> > u32 pid, tid;
> > u64 addr;
> > u64 len;
> > u64 pgoff;
> > u32 maj;
> > u32 min;
> > u64 ino;
> > u64 ino_generation;
> > u32 prot, flags;
> > u32 reserved;

I guess we need reserved _after_ buildid, no?

> > u8 buildid[20];
>
> Do we need maj, min, ino, ino_generation for mmap3 event?
> I think they are to compare binaries, then we can do it with
> build-id (and I think it'd be better)..

+1 we shouldn't need maj, min, etc.

Thanks,
Song

[...]
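To make the alignment question concrete, the sketch below mirrors the proposed MMAP3 fixed part with stdint types, once as posted (reserved before buildid) and once with reserved moved after buildid as Song suggests. The struct names are hypothetical mirrors, not uapi definitions. Every field in this layout already sits at a naturally aligned offset, so the offsets do not depend on the ABI. In both variants filename[] starts at offset 96, so sample_id (which follows the 8-byte padded filename) stays aligned either way. The difference is only that buildid itself starts at 76 in the posted layout and at 72 in the alternative.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of struct perf_event_header. */
struct perf_event_header_sketch {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* Fixed part of PERF_RECORD_MMAP3 as posted in patch 02. */
struct mmap3_proposed {
	struct perf_event_header_sketch header;
	uint32_t pid, tid;
	uint64_t addr;
	uint64_t len;
	uint64_t pgoff;
	uint32_t maj;
	uint32_t min;
	uint64_t ino;
	uint64_t ino_generation;
	uint32_t prot, flags;
	uint32_t reserved;
	uint8_t  buildid[20];
	char     filename[];	/* followed by sample_id */
};

/* Same fields with reserved moved after buildid. */
struct mmap3_alt {
	struct perf_event_header_sketch header;
	uint32_t pid, tid;
	uint64_t addr;
	uint64_t len;
	uint64_t pgoff;
	uint32_t maj;
	uint32_t min;
	uint64_t ino;
	uint64_t ino_generation;
	uint32_t prot, flags;
	uint8_t  buildid[20];
	uint32_t reserved;
	char     filename[];	/* followed by sample_id */
};
```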

2020-09-14 06:27:07

by Namhyung Kim

Subject: Re: [PATCH 09/26] perf tools: Try load vmlinux from buildid database

On Mon, Sep 14, 2020 at 6:04 AM Jiri Olsa <[email protected]> wrote:
>
> Currently we don't check on kernel's vmlinux the same way as
> we do for normal binaries, but we either look for kallsyms
> file in build id database or check on known vmlinux locations
> (plus some other optional paths).
>
> This patch adds the check for standard build id binary location,
> so we are ready once we start to store it there from debuginfod
> in following changes.

But dso__load_vmlinux_path() already has the logic.
Also you should check symbol_conf.ignore_vmlinux_buildid.

Thanks
Namhyung


>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/util/build-id.c | 13 ++++++++++---
> tools/perf/util/build-id.h | 2 ++
> tools/perf/util/symbol.c | 14 ++++++++++++++
> 3 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> index ecdc167aa1a0..6165f9d1d941 100644
> --- a/tools/perf/util/build-id.c
> +++ b/tools/perf/util/build-id.c
> @@ -259,10 +259,9 @@ static const char *build_id_cache__basename(bool is_kallsyms, bool is_vdso,
> "debug" : "elf"));
> }
>
> -char *dso__build_id_filename(const struct dso *dso, char *bf, size_t size,
> - bool is_debug)
> +char *__dso__build_id_filename(const struct dso *dso, char *bf, size_t size,
> + bool is_debug, bool is_kallsyms)
> {
> - bool is_kallsyms = dso__is_kallsyms((struct dso *)dso);
> bool is_vdso = dso__is_vdso((struct dso *)dso);
> char sbuild_id[SBUILD_ID_SIZE];
> char *linkname;
> @@ -291,6 +290,14 @@ char *dso__build_id_filename(const struct dso *dso, char *bf, size_t size,
> return bf;
> }
>
> +char *dso__build_id_filename(const struct dso *dso, char *bf, size_t size,
> + bool is_debug)
> +{
> + bool is_kallsyms = dso__is_kallsyms((struct dso *)dso);
> +
> + return __dso__build_id_filename(dso, bf, size, is_debug, is_kallsyms);
> +}
> +
> #define dsos__for_each_with_build_id(pos, head) \
> list_for_each_entry(pos, head, node) \
> if (!pos->has_build_id) \
> diff --git a/tools/perf/util/build-id.h b/tools/perf/util/build-id.h
> index 1ceede45c231..2cf87b7304e2 100644
> --- a/tools/perf/util/build-id.h
> +++ b/tools/perf/util/build-id.h
> @@ -23,6 +23,8 @@ char *build_id_cache__kallsyms_path(const char *sbuild_id, char *bf,
>
> char *dso__build_id_filename(const struct dso *dso, char *bf, size_t size,
> bool is_debug);
> +char *__dso__build_id_filename(const struct dso *dso, char *bf, size_t size,
> + bool is_debug, bool is_kallsyms);
>
> int build_id__mark_dso_hit(struct perf_tool *tool, union perf_event *event,
> struct perf_sample *sample, struct evsel *evsel,
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index 5ddf76fb691c..7e1aac4931e1 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -2183,6 +2183,8 @@ static int dso__load_kernel_sym(struct dso *dso, struct map *map)
> int err;
> const char *kallsyms_filename = NULL;
> char *kallsyms_allocated_filename = NULL;
> + char *filename;
> +
> /*
> * Step 1: if the user specified a kallsyms or vmlinux filename, use
> * it and only it, reporting errors to the user if it cannot be used.
> @@ -2207,6 +2209,18 @@ static int dso__load_kernel_sym(struct dso *dso, struct map *map)
> return dso__load_vmlinux(dso, map, symbol_conf.vmlinux_name, false);
> }
>
> + /*
> + * Before checking on common vmlinux locations, check if it's
> + * stored as standard build id binary under .debug tree.
> + */
> + filename = __dso__build_id_filename(dso, NULL, 0, false, false);
> + if (filename != NULL) {
> + err = dso__load_vmlinux(dso, map, filename, true);
> + if (err > 0)
> + return err;
> + free(filename);
> + }
> +
> if (!symbol_conf.ignore_vmlinux && vmlinux_path != NULL) {
> err = dso__load_vmlinux_path(dso, map);
> if (err > 0)
> --
> 2.26.2
>

2020-09-14 06:32:25

by Namhyung Kim

Subject: Re: [PATCH 14/26] perf tools: Add mmap3 events to --show-mmap-events option

On Mon, Sep 14, 2020 at 6:04 AM Jiri Olsa <[email protected]> wrote:
>
> Display mmap3 events for the --show-mmap-events option;
> the build id is displayed within <> brackets:
>
> $ perf script --show-mmap-events
> kill 12148 13893.519014: PERF_RECORD_MMAP3 12148/12148: <43938d0803c5e3130ea679cd569aaf44b98d9ae8> [0x560e7d7f..
> kill 12148 13893.519420: PERF_RECORD_MMAP3 12148/12148: <1805c738c8f3ec0f47b7ea09080c28f34d18a82b> [0x7f9e7dfc..
>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/builtin-script.c | 33 +++++++++++++++++++++++++++++++++
> 1 file changed, 33 insertions(+)
>
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index d839983cfb88..9c09581d5cb0 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -2342,6 +2342,38 @@ static int process_mmap2_event(struct perf_tool *tool,
> event->mmap2.tid);
> }
>
> +static int process_mmap3_event(struct perf_tool *tool,
> + union perf_event *event,
> + struct perf_sample *sample,
> + struct machine *machine)
> +{
> + struct thread *thread;
> + struct perf_script *script = container_of(tool, struct perf_script, tool);
> + struct perf_session *session = script->session;
> + struct evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
> +
> + if (perf_event__process_mmap3(tool, event, sample, machine) < 0)
> + return -1;
> +
> + thread = machine__findnew_thread(machine, event->mmap3.pid, event->mmap3.tid);
> + if (thread == NULL) {
> + pr_debug("problem processing MMAP2 event, skipping it.\n");

MMAP3 ?

Thanks
Namhyung


> + return -1;
> + }
> +
> + if (!evsel->core.attr.sample_id_all) {
> + sample->cpu = 0;
> + sample->time = 0;
> + sample->tid = event->mmap3.tid;
> + sample->pid = event->mmap3.pid;
> + }
> + perf_sample__fprintf_start(script, sample, thread, evsel,
> + PERF_RECORD_MMAP3, stdout);
> + perf_event__fprintf(event, machine, stdout);
> + thread__put(thread);
> + return 0;
> +}
> +
> static int process_switch_event(struct perf_tool *tool,
> union perf_event *event,
> struct perf_sample *sample,
> @@ -2498,6 +2530,7 @@ static int __cmd_script(struct perf_script *script)
> if (script->show_mmap_events) {
> script->tool.mmap = process_mmap_event;
> script->tool.mmap2 = process_mmap2_event;
> + script->tool.mmap3 = process_mmap3_event;
> }
> if (script->show_switch_events || (scripting_ops && scripting_ops->process_switch))
> script->tool.context_switch = process_switch_event;
> --
> 2.26.2
>

2020-09-14 06:42:44

by Stephane Eranian

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On Sun, Sep 13, 2020 at 2:03 PM Jiri Olsa <[email protected]> wrote:
>
> Add a new version of the mmap event. The MMAP3 record is an
> augmented version of MMAP2: it adds a build id value to
> identify the exact binary object behind the memory map:
>
> struct {
> struct perf_event_header header;
>
> u32 pid, tid;
> u64 addr;
> u64 len;
> u64 pgoff;
> u32 maj;
> u32 min;
> u64 ino;
> u64 ino_generation;
> u32 prot, flags;
> u32 reserved;
> u8 buildid[20];
> char filename[];
> struct sample_id sample_id;
> };
>
> Add a 4-byte reserved field to align the buildid data to 8 bytes,
> so the sample_id data is properly aligned.
>
> The mmap3 event is enabled by a new mmap3 bit in the perf_event_attr
> struct. When set for an event, it enables build id retrieval
> and the mmap3 format is used for the event.
>
> Keep track of mmap3 events and call build_id_parse
> in perf_event_mmap_event only if any are defined.
>
> Having the build id attached directly to the mmap event will help
> tools like perf skip the final search through perf data for
> binaries that are needed at report time. It also prevents a
> possible race where the binary could be removed or replaced
> during profiling.
>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> include/uapi/linux/perf_event.h | 27 ++++++++++++++++++++++-
> kernel/events/core.c | 38 +++++++++++++++++++++++++++------
> 2 files changed, 57 insertions(+), 8 deletions(-)
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 077e7ee69e3d..facfc3c673ed 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -384,7 +384,8 @@ struct perf_event_attr {
> aux_output : 1, /* generate AUX records instead of events */
> cgroup : 1, /* include cgroup events */
> text_poke : 1, /* include text poke events */
> - __reserved_1 : 30;
> + mmap3 : 1, /* include bpf events */
> + __reserved_1 : 29;
>
what happens if I set mmap3 and mmap2?

I think using mmap3 for every mmap may be overkill as you add 20 useless
bytes to an mmap record.
I am not sure if your code handles the case where mmap3 is not needed
because there is no buildid, e.g., anonymous memory.
It seems to me you've written the patch in such a way that if the user
tool supports mmap3, then it supersedes mmap2, and thus
you need all the fields of mmap2. But it could be more interesting to
return either MMAP2 or MMAP3 depending on tool support
and type of mmap; that would certainly save 20 bytes on any anon mmap.
But maybe that logic is already in your patch and I missed it.


> union {
> __u32 wakeup_events; /* wakeup every n events */
> @@ -1060,6 +1061,30 @@ enum perf_event_type {
> */
> PERF_RECORD_TEXT_POKE = 20,
>
> + /*
> + * The MMAP3 records are an augmented version of MMAP2; they add a
> + * build id value to identify the exact binary behind the map
> + *
> + * struct {
> + * struct perf_event_header header;
> + *
> + * u32 pid, tid;
> + * u64 addr;
> + * u64 len;
> + * u64 pgoff;
> + * u32 maj;
> + * u32 min;
> + * u64 ino;
> + * u64 ino_generation;
> + * u32 prot, flags;
> + * u32 reserved;
> + * u8 buildid[20];
> + * char filename[];
> + * struct sample_id sample_id;
> + * };
> + */
> + PERF_RECORD_MMAP3 = 21,
> +
> PERF_RECORD_MAX, /* non-ABI */
> };
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 7ed5248f0445..719894492dac 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -51,6 +51,7 @@
> #include <linux/proc_ns.h>
> #include <linux/mount.h>
> #include <linux/min_heap.h>
> +#include <linux/buildid.h>
>
> #include "internal.h"
>
> @@ -386,6 +387,7 @@ static DEFINE_PER_CPU(int, perf_sched_cb_usages);
> static DEFINE_PER_CPU(struct pmu_event_list, pmu_sb_events);
>
> static atomic_t nr_mmap_events __read_mostly;
> +static atomic_t nr_mmap3_events __read_mostly;
> static atomic_t nr_comm_events __read_mostly;
> static atomic_t nr_namespaces_events __read_mostly;
> static atomic_t nr_task_events __read_mostly;
> @@ -4588,7 +4590,7 @@ static bool is_sb_event(struct perf_event *event)
> return false;
>
> if (attr->mmap || attr->mmap_data || attr->mmap2 ||
> - attr->comm || attr->comm_exec ||
> + attr->mmap3 || attr->comm || attr->comm_exec ||
> attr->task || attr->ksymbol ||
> attr->context_switch || attr->text_poke ||
> attr->bpf_event)
> @@ -4644,6 +4646,8 @@ static void unaccount_event(struct perf_event *event)
> dec = true;
> if (event->attr.mmap || event->attr.mmap_data)
> atomic_dec(&nr_mmap_events);
> + if (event->attr.mmap3)
> + atomic_dec(&nr_mmap3_events);
> if (event->attr.comm)
> atomic_dec(&nr_comm_events);
> if (event->attr.namespaces)
> @@ -7465,7 +7469,7 @@ static void perf_pmu_output_stop(struct perf_event *event)
> /*
> * task tracking -- fork/exit
> *
> - * enabled by: attr.comm | attr.mmap | attr.mmap2 | attr.mmap_data | attr.task
> + * enabled by: attr.comm | attr.mmap | attr.mmap2 | attr.mmap3 | attr.mmap_data | attr.task
> */
>
> struct perf_task_event {
> @@ -7486,8 +7490,8 @@ struct perf_task_event {
> static int perf_event_task_match(struct perf_event *event)
> {
> return event->attr.comm || event->attr.mmap ||
> - event->attr.mmap2 || event->attr.mmap_data ||
> - event->attr.task;
> + event->attr.mmap2 || event->attr.mmap3 ||
> + event->attr.mmap_data || event->attr.task;
> }
>
> static void perf_event_task_output(struct perf_event *event,
> @@ -7913,6 +7917,7 @@ struct perf_mmap_event {
> u64 ino;
> u64 ino_generation;
> u32 prot, flags;
> + u8 buildid[BUILD_ID_SIZE];
>
> struct {
> struct perf_event_header header;
> @@ -7933,7 +7938,7 @@ static int perf_event_mmap_match(struct perf_event *event,
> int executable = vma->vm_flags & VM_EXEC;
>
> return (!executable && event->attr.mmap_data) ||
> - (executable && (event->attr.mmap || event->attr.mmap2));
> + (executable && (event->attr.mmap || event->attr.mmap2 || event->attr.mmap3));
> }
>
> static void perf_event_mmap_output(struct perf_event *event,
> @@ -7949,7 +7954,7 @@ static void perf_event_mmap_output(struct perf_event *event,
> if (!perf_event_mmap_match(event, data))
> return;
>
> - if (event->attr.mmap2) {
> + if (event->attr.mmap2 || event->attr.mmap3) {
> mmap_event->event_id.header.type = PERF_RECORD_MMAP2;
> mmap_event->event_id.header.size += sizeof(mmap_event->maj);
> mmap_event->event_id.header.size += sizeof(mmap_event->min);
> @@ -7959,6 +7964,12 @@ static void perf_event_mmap_output(struct perf_event *event,
> mmap_event->event_id.header.size += sizeof(mmap_event->flags);
> }
>
> + if (event->attr.mmap3) {
> + mmap_event->event_id.header.type = PERF_RECORD_MMAP3;
> + mmap_event->event_id.header.size += sizeof(u32);
> + mmap_event->event_id.header.size += sizeof(mmap_event->buildid);
> + }
> +
> perf_event_header__init_id(&mmap_event->event_id.header, &sample, event);
> ret = perf_output_begin(&handle, event,
> mmap_event->event_id.header.size);
> @@ -7970,7 +7981,7 @@ static void perf_event_mmap_output(struct perf_event *event,
>
> perf_output_put(&handle, mmap_event->event_id);
>
> - if (event->attr.mmap2) {
> + if (event->attr.mmap2 || event->attr.mmap3) {
> perf_output_put(&handle, mmap_event->maj);
> perf_output_put(&handle, mmap_event->min);
> perf_output_put(&handle, mmap_event->ino);
> @@ -7979,6 +7990,13 @@ static void perf_event_mmap_output(struct perf_event *event,
> perf_output_put(&handle, mmap_event->flags);
> }
>
> + if (event->attr.mmap3) {
> + u32 reserved = 0;
> +
> + perf_output_put(&handle, reserved);
> + __output_copy(&handle, mmap_event->buildid, BUILD_ID_SIZE);
> + }
> +
> __output_copy(&handle, mmap_event->file_name,
> mmap_event->file_size);
>
> @@ -8098,6 +8116,9 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
> mmap_event->prot = prot;
> mmap_event->flags = flags;
>
> + if (atomic_read(&nr_mmap3_events))
> + build_id_parse(vma, mmap_event->buildid);
> +
> if (!(vma->vm_flags & VM_EXEC))
> mmap_event->event_id.header.misc |= PERF_RECORD_MISC_MMAP_DATA;
>
> @@ -8241,6 +8262,7 @@ void perf_event_mmap(struct vm_area_struct *vma)
> /* .ino_generation (attr_mmap2 only) */
> /* .prot (attr_mmap2 only) */
> /* .flags (attr_mmap2 only) */
> + /* .buildid (attr_mmap3 only) */
> };
>
> perf_addr_filters_adjust(vma);
> @@ -11040,6 +11062,8 @@ static void account_event(struct perf_event *event)
> inc = true;
> if (event->attr.mmap || event->attr.mmap_data)
> atomic_inc(&nr_mmap_events);
> + if (event->attr.mmap3)
> + atomic_inc(&nr_mmap3_events);
> if (event->attr.comm)
> atomic_inc(&nr_comm_events);
> if (event->attr.namespaces)
> --
> 2.26.2
>
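One way to check the alignment claim for the reserved field in the commit message above is to mirror the proposed layout in a userspace struct and look at the offsets. This is only an illustrative sketch: it uses <stdint.h> types in place of the kernel's u32/u64, redefines perf_event_header locally, and the struct names are hypothetical. With the reserved word, the buildid array starts at offset 76 and the fixed part ends at offset 96, so filename[] starts on an 8-byte boundary (and since the kernel pads the filename to a multiple of 8 bytes, the trailing sample_id stays aligned too); without it, filename[] would start at offset 92.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of struct perf_event_header (include/uapi/linux/perf_event.h). */
struct hdr {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* Fixed part of the proposed PERF_RECORD_MMAP3 record, as posted in the RFC. */
struct mmap3_rec {
	struct hdr header;
	uint32_t pid, tid;
	uint64_t addr, len, pgoff;
	uint32_t maj, min;
	uint64_t ino, ino_generation;
	uint32_t prot, flags;
	uint32_t reserved;
	uint8_t buildid[20];
	char filename[];	/* followed by struct sample_id */
};

/* The same layout without the reserved word, for comparison. */
struct mmap3_rec_noreserved {
	struct hdr header;
	uint32_t pid, tid;
	uint64_t addr, len, pgoff;
	uint32_t maj, min;
	uint64_t ino, ino_generation;
	uint32_t prot, flags;
	uint8_t buildid[20];
	char filename[];
};

/* Compile-time check that the reserved word keeps filename[] 8-byte aligned. */
static_assert(offsetof(struct mmap3_rec, filename) % 8 == 0,
	      "filename must start on an 8-byte boundary");
```

Note that prot/flags already end 8-byte aligned at offset 72; it is the 20-byte buildid that would otherwise leave filename[] 4 bytes short of the next boundary.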

2020-09-14 06:44:45

by Namhyung Kim

Subject: Re: [PATCH 24/26] perf tools: Add buildid-list --store option

On Mon, Sep 14, 2020 at 6:05 AM Jiri Olsa <[email protected]> wrote:
>
> Add a buildid-list --store option to populate
> the .debug directory with build id files.

Hmm.. isn't it better to add it to the buildid-cache command?

>
> $ rm -rf ~/.debug/
>
> $ perf buildid-list
> 1805c738c8f3ec0f47b7ea09080c28f34d18a82b /usr/lib64/ld-2.31.so
> d278249792061c6b74d1693ca59513be1def13f2 /usr/lib64/libc-2.31.so
>
> $ perf buildid-list --store
>
> $ find ~/.debug/
> .../.debug/
> .../.debug/usr
> .../.debug/usr/lib64
> .../.debug/usr/lib64/ld-2.31.so
> .../.debug/usr/lib64/ld-2.31.so/1805c738c8f3ec0f47b7ea09080c28f34d18a82b
> .../.debug/usr/lib64/ld-2.31.so/1805c738c8f3ec0f47b7ea09080c28f34d18a82b/elf
> .../.debug/usr/lib64/ld-2.31.so/1805c738c8f3ec0f47b7ea09080c28f34d18a82b/debug
> .../.debug/usr/lib64/ld-2.31.so/1805c738c8f3ec0f47b7ea09080c28f34d18a82b/probes
> .../.debug/usr/lib64/libc-2.31.so
> .../.debug/usr/lib64/libc-2.31.so/d278249792061c6b74d1693ca59513be1def13f2
> .../.debug/usr/lib64/libc-2.31.so/d278249792061c6b74d1693ca59513be1def13f2/elf
> .../.debug/usr/lib64/libc-2.31.so/d278249792061c6b74d1693ca59513be1def13f2/debug
> .../.debug/usr/lib64/libc-2.31.so/d278249792061c6b74d1693ca59513be1def13f2/probes
> .../.debug/.build-id
> .../.debug/.build-id/18
> .../.debug/.build-id/18/05c738c8f3ec0f47b7ea09080c28f34d18a82b
> .../.debug/.build-id/d2
> .../.debug/.build-id/d2/78249792061c6b74d1693ca59513be1def13f2
>
> It's possible to query the debuginfod daemon for binaries by defining
> the DEBUGINFOD_URLS variable with the server URL, like:
>
> $ DEBUGINFOD_URLS=http://192.168.122.174:8002 perf buildid-list --store
> OK 439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46 .../.debug/.build-id/43/9fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46/elf
> FAIL 23b87f5b0560481043257e82be670bc97786a171 /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/net/ipv4/netfilter/ip_tables.ko.xz
> FAIL d2b3be372bcdd4ebc15e479d2ff803657de0fd1e /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/drivers/block/virtio_blk.ko.xz
> FAIL 1466a71bcd0ff5c975ee79b72752137c0143d225 /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/fs/xfs/xfs.ko.xz
> OK ad60d10b38c93bd8738d5aa594e240f01bb328cd /usr/lib/systemd/systemd
> OK 589e403a34f55486bcac848a45e00bcdeedd1ca8 /usr/lib64/libcrypto.so.1.1.1g
> OK 3b9b2ef537520303411ad5038b596d5d18e7c2b8 /usr/lib64/libpcre2-8.so.0.10.0
>
> Increase the debug level in util/probe-event.c to get rid
> of the sdt probe messages at the single verbose level (-v).
>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
[SNIP]
> +static int dso__store(struct dso *dso, struct machine *machine __maybe_unused, void *priv)
> +{
> + struct dso_store_data *data = priv;
> + char sbuild_id[SBUILD_ID_SIZE];
> + u8 bid[BUILD_ID_SIZE];
> + char *path = NULL;
> + bool is_kallsyms;
> + int err = -1;
> +
> + if (!dso->has_build_id ||
> + !build_id__is_defined(dso->build_id))
> + return 0;
> +
> + if (data->with_hits && !dso->hit)
> + return 0;
> +
> + /*
> + * The storing process is:
> + * - get build id of the dso
> + * - check if it matches provided build id from mmap3 event
> + * - if not, try debuginfod to download the binary
> + * - store binary to build id database
> + */
> + is_kallsyms = !strcmp(machine->mmap_name, dso->short_name);
> + build_id__sprintf(dso->build_id, sizeof(dso->build_id), sbuild_id);
> +
> + if (is_kallsyms) {
> + /*
> + * Find out if we are on the same kernel as perf.data
> + * and keep kallsyms in that case.
> + */
> + path = strdup(dso->long_name);
> + if (!path)
> + goto out_err;
> +
> + err = sysfs__read_build_id("/sys/kernel/notes", &bid, sizeof(bid));
> + if (err < 0)
> + goto out_err;
> + } else {
> + struct stat st;
> +
> + /*
> + * Check whether the file exists in the first place; if it does,
> + * resolve the path and read the build id.
> + */
> + if (stat(dso->long_name, &st)) {
> + zfree(&path);
> + goto try_download;
> + }
> +
> + path = nsinfo__realpath(dso->long_name, dso->nsinfo);
> + if (!path)
> + goto out_err;
> +
> + err = filename__read_build_id(path, &bid, sizeof(bid));

Is it ok to read the file out of the namespace?

Thanks
Namhyung


> + if (err != sizeof(bid))
> + goto out_err;
> + }
> +
> + /*
> + * If the build id we got in the mmap3 event matches
> + * what we read from the binary, we're happy.
> + */
> + if (memcmp(&bid, dso->build_id, BUILD_ID_SIZE)) {
> + char sbid[SBUILD_ID_SIZE];
> +
> + build_id__sprintf(bid, sizeof(bid), sbid);
> + pr_debug("mmap build id <%s> does not match for %s <%s>\n",
> + sbuild_id, path, sbid);
> + zfree(&path);
> + }
> +
> +try_download:
> + /*
> + * We did not match build id or did not find the
> + * binary - try debuginfod as last resort.
> + */
> + if (!path) {
> + char *tmp = NULL;
> +
> + /*
> + * The debuginfo retrieval is handled within
> + * build_id_cache__add function.
> + */
> + if (get_executable(sbuild_id, &tmp)) {
> + err = -1;
> + goto out_err;
> + }
> +
> + path = tmp;
> +
> + /*
> + * The kernel dso is now an elf binary, so disable is_kallsyms
> + * so that build_id_cache__add can prepare proper file names.
> + */
> + is_kallsyms = false;
> + }
> +
> + pr_debug("linking %s %s <%s>\n", dso->short_name, path, sbuild_id);
> +
> + err = build_id_cache__add(sbuild_id, path, path,
> + dso->nsinfo, is_kallsyms, false);
> +out_err:
> + free(path);
> + fprintf(stderr, "%s %s %s\n", err ? "FAIL" : "OK ", sbuild_id, dso->long_name);
> + return 0;
> +}
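The .build-id entries in the `find ~/.debug/` listing above split the hex build id into a two-character directory and a 38-character entry name (for example .build-id/18/05c738...). The sketch below reproduces only that naming in plain C. build_id_to_hex() and build_id_link_path() are hypothetical helpers written for illustration, not perf's own build_id__sprintf() or build-id cache code, and the debug directory is passed in by the caller as an assumption rather than read from perf's configuration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BUILD_ID_SIZE 20
#define SBUILD_ID_SIZE (BUILD_ID_SIZE * 2 + 1)	/* hex digits + NUL */

/* Hex-encode a raw build id into out (at least 2 * len + 1 bytes). */
static void build_id_to_hex(const unsigned char *id, size_t len, char *out)
{
	static const char digits[] = "0123456789abcdef";

	for (size_t i = 0; i < len; i++) {
		out[2 * i] = digits[id[i] >> 4];
		out[2 * i + 1] = digits[id[i] & 0xf];
	}
	out[2 * len] = '\0';
}

/*
 * Build "<debugdir>/.build-id/<first 2 hex chars>/<remaining hex chars>".
 * Returns 0 on success, -1 if the id is too short or buf is too small.
 */
static int build_id_link_path(const char *debugdir, const char *sbuild_id,
			      char *buf, size_t size)
{
	int n;

	if (strlen(sbuild_id) < 3)
		return -1;
	n = snprintf(buf, size, "%s/.build-id/%.2s/%s",
		     debugdir, sbuild_id, sbuild_id + 2);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```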

2020-09-14 09:09:34

by Peter Zijlstra

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On Sun, Sep 13, 2020 at 11:41:00PM -0700, Stephane Eranian wrote:
> On Sun, Sep 13, 2020 at 2:03 PM Jiri Olsa <[email protected]> wrote:
> what happens if I set mmap3 and mmap2?
>
> I think using mmap3 for every mmap may be overkill as you add 20 useless
> bytes to an mmap record.
> I am not sure if your code handles the case where mmap3 is not needed
> because there is no buildid, e.g., anonymous memory.
> It seems to me you've written the patch in such a way that if the user
> tool supports mmap3, then it supersedes mmap2, and thus
> you need all the fields of mmap2. But it could be more interesting to
> return either MMAP2 or MMAP3 depending on tool support
> and type of mmap; that would certainly save 20 bytes on any anon mmap.
> But maybe that logic is already in your patch and I missed it.

That, and what if you don't want any of that buildid nonsense at all? I
always kill that because it makes perf pointlessly slow and has
absolutely no upsides for me.
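Stephane's question above ("what happens if I set mmap3 and mmap2?") can be answered by reading perf_event_mmap_output() in the patch: the mmap2 fields are added when either bit is set, and the record type is then overridden to MMAP3 when mmap3 is set, adding sizeof(u32) plus the 20-byte build id, i.e. 24 bytes on top of MMAP2. The sketch below models only that type and size selection in userspace. The struct and function names are hypothetical, 21 for MMAP3 is the RFC's proposed value rather than an upstream constant, it assumes the kernel's padding of the filename to a multiple of 8 bytes, and it ignores sample_id and the mmap_data/VM_EXEC matching.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* MMAP (1) and MMAP2 (10) are upstream values; 21 is proposed by this RFC. */
enum { REC_MMAP = 1, REC_MMAP2 = 10, REC_MMAP3 = 21 };

/* The attr bits that select the mmap record format (hypothetical struct). */
struct mmap_bits {
	bool mmap2;
	bool mmap3;
};

#define MMAP_BASE   40u	/* header(8) + pid/tid(8) + addr/len/pgoff(24) */
#define MMAP2_EXTRA (4u + 4u + 8u + 8u + 4u + 4u)	/* maj, min, ino, ino_gen, prot, flags */
#define MMAP3_EXTRA (4u + 20u)	/* u32 reserved + u8 buildid[20] */

/* Filename length including NUL, rounded up to a multiple of 8 bytes. */
static size_t padded_name_size(const char *name)
{
	size_t n = strlen(name) + 1;

	return (n + 7) & ~(size_t)7;
}

/*
 * Pick the record type the way the patched perf_event_mmap_output() does
 * and report the record size, excluding sample_id, through *size.
 */
static int mmap_record_type(struct mmap_bits bits, const char *name, size_t *size)
{
	int type = REC_MMAP;
	size_t sz = MMAP_BASE + padded_name_size(name);

	if (bits.mmap2 || bits.mmap3) {
		type = REC_MMAP2;
		sz += MMAP2_EXTRA;
	}
	if (bits.mmap3) {
		type = REC_MMAP3;	/* mmap3 wins when both bits are set */
		sz += MMAP3_EXTRA;
	}
	*size = sz;
	return type;
}
```

With both bits set this yields a single MMAP3 record, so as posted, mmap3 supersedes mmap2 for every executable mapping rather than being chosen per mapping type.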

2020-09-14 09:39:26

by Peter Zijlstra

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On Sun, Sep 13, 2020 at 11:02:49PM +0200, Jiri Olsa wrote:
> Add a new version of the mmap event. The MMAP3 record is an
> augmented version of MMAP2: it adds a build id value to
> identify the exact binary object behind the memory map:
>
> struct {
> struct perf_event_header header;
>
> u32 pid, tid;
> u64 addr;
> u64 len;
> u64 pgoff;
> u32 maj;
> u32 min;
> u64 ino;
> u64 ino_generation;
> u32 prot, flags;
> u32 reserved;
> u8 buildid[20];
> char filename[];
> struct sample_id sample_id;
> };
>

So weren't there still open problems with mmap2 that also needed
addressing? I seem to have forgotten :/

2020-09-14 15:16:37

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 24/26] perf tools: Add buildid-list --store option

Em Mon, Sep 14, 2020 at 03:42:55PM +0900, Namhyung Kim escreveu:
> On Mon, Sep 14, 2020 at 6:05 AM Jiri Olsa <[email protected]> wrote:
> >
> > Add a buildid-list --store option to populate
> > the .debug directory with build id files.
>
> Hmm.. isn't it better to add it to the buildid-cache command?

Yes, that is the right place: 'buildid-list' is about perf.data files,
'buildid-cache' is about the .debug cache.

- Arnaldo

> >
> > $ rm -rf ~/.debug/
> >
> > $ perf buildid-list
> > 1805c738c8f3ec0f47b7ea09080c28f34d18a82b /usr/lib64/ld-2.31.so
> > d278249792061c6b74d1693ca59513be1def13f2 /usr/lib64/libc-2.31.so
> >
> > $ perf buildid-list --store
> >
> > $ find ~/.debug/
> > .../.debug/
> > .../.debug/usr
> > .../.debug/usr/lib64
> > .../.debug/usr/lib64/ld-2.31.so
> > .../.debug/usr/lib64/ld-2.31.so/1805c738c8f3ec0f47b7ea09080c28f34d18a82b
> > .../.debug/usr/lib64/ld-2.31.so/1805c738c8f3ec0f47b7ea09080c28f34d18a82b/elf
> > .../.debug/usr/lib64/ld-2.31.so/1805c738c8f3ec0f47b7ea09080c28f34d18a82b/debug
> > .../.debug/usr/lib64/ld-2.31.so/1805c738c8f3ec0f47b7ea09080c28f34d18a82b/probes
> > .../.debug/usr/lib64/libc-2.31.so
> > .../.debug/usr/lib64/libc-2.31.so/d278249792061c6b74d1693ca59513be1def13f2
> > .../.debug/usr/lib64/libc-2.31.so/d278249792061c6b74d1693ca59513be1def13f2/elf
> > .../.debug/usr/lib64/libc-2.31.so/d278249792061c6b74d1693ca59513be1def13f2/debug
> > .../.debug/usr/lib64/libc-2.31.so/d278249792061c6b74d1693ca59513be1def13f2/probes
> > .../.debug/.build-id
> > .../.debug/.build-id/18
> > .../.debug/.build-id/18/05c738c8f3ec0f47b7ea09080c28f34d18a82b
> > .../.debug/.build-id/d2
> > .../.debug/.build-id/d2/78249792061c6b74d1693ca59513be1def13f2
> >
> > It's possible to query the debuginfod daemon for binaries by defining
> > the DEBUGINFOD_URLS variable with the server URL, like:
> >
> > $ DEBUGINFOD_URLS=http://192.168.122.174:8002 perf buildid-list --store
> > OK 439fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46 .../.debug/.build-id/43/9fe9bdeaed66af2bb8b8de5e650d5ecc3d8d46/elf
> > FAIL 23b87f5b0560481043257e82be670bc97786a171 /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/net/ipv4/netfilter/ip_tables.ko.xz
> > FAIL d2b3be372bcdd4ebc15e479d2ff803657de0fd1e /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/drivers/block/virtio_blk.ko.xz
> > FAIL 1466a71bcd0ff5c975ee79b72752137c0143d225 /lib/modules/5.9.0-0.rc3.1.mmap3.fc34.x86_64/kernel/fs/xfs/xfs.ko.xz
> > OK ad60d10b38c93bd8738d5aa594e240f01bb328cd /usr/lib/systemd/systemd
> > OK 589e403a34f55486bcac848a45e00bcdeedd1ca8 /usr/lib64/libcrypto.so.1.1.1g
> > OK 3b9b2ef537520303411ad5038b596d5d18e7c2b8 /usr/lib64/libpcre2-8.so.0.10.0
> >
> > Increase the debug level in util/probe-event.c to get rid
> > of the sdt probe messages at the single verbose level (-v).
> >
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> [SNIP]
> > +static int dso__store(struct dso *dso, struct machine *machine __maybe_unused, void *priv)
> > +{
> > + struct dso_store_data *data = priv;
> > + char sbuild_id[SBUILD_ID_SIZE];
> > + u8 bid[BUILD_ID_SIZE];
> > + char *path = NULL;
> > + bool is_kallsyms;
> > + int err = -1;
> > +
> > + if (!dso->has_build_id ||
> > + !build_id__is_defined(dso->build_id))
> > + return 0;
> > +
> > + if (data->with_hits && !dso->hit)
> > + return 0;
> > +
> > + /*
> > + * The storing process is:
> > + * - get build id of the dso
> > + * - check if it matches provided build id from mmap3 event
> > + * - if not, try debuginfod to download the binary
> > + * - store binary to build id database
> > + */
> > + is_kallsyms = !strcmp(machine->mmap_name, dso->short_name);
> > + build_id__sprintf(dso->build_id, sizeof(dso->build_id), sbuild_id);
> > +
> > + if (is_kallsyms) {
> > + /*
> > + * Find out if we are on the same kernel as perf.data
> > + * and keep kallsyms in that case.
> > + */
> > + path = strdup(dso->long_name);
> > + if (!path)
> > + goto out_err;
> > +
> > + err = sysfs__read_build_id("/sys/kernel/notes", &bid, sizeof(bid));
> > + if (err < 0)
> > + goto out_err;
> > + } else {
> > + struct stat st;
> > +
> > + /*
> > + * Check whether the file exists in the first place; if it does,
> > + * resolve the path and read the build id.
> > + */
> > + if (stat(dso->long_name, &st)) {
> > + zfree(&path);
> > + goto try_download;
> > + }
> > +
> > + path = nsinfo__realpath(dso->long_name, dso->nsinfo);
> > + if (!path)
> > + goto out_err;
> > +
> > + err = filename__read_build_id(path, &bid, sizeof(bid));
>
> Is it ok to read the file out of the namespace?
>
> Thanks
> Namhyung
>
>
> > + if (err != sizeof(bid))
> > + goto out_err;
> > + }
> > +
> > + /*
> > + * If the build id we got in the mmap3 event matches
> > + * what we read from the binary, we're happy.
> > + */
> > + if (memcmp(&bid, dso->build_id, BUILD_ID_SIZE)) {
> > + char sbid[SBUILD_ID_SIZE];
> > +
> > + build_id__sprintf(bid, sizeof(bid), sbid);
> > + pr_debug("mmap build id <%s> does not match for %s <%s>\n",
> > + sbuild_id, path, sbid);
> > + zfree(&path);
> > + }
> > +
> > +try_download:
> > + /*
> > + * We did not match build id or did not find the
> > + * binary - try debuginfod as last resort.
> > + */
> > + if (!path) {
> > + char *tmp = NULL;
> > +
> > + /*
> > + * The debuginfo retrieval is handled within
> > + * build_id_cache__add function.
> > + */
> > + if (get_executable(sbuild_id, &tmp)) {
> > + err = -1;
> > + goto out_err;
> > + }
> > +
> > + path = tmp;
> > +
> > + /*
> > + * The kernel dso is now an elf binary, so disable is_kallsyms
> > + * so that build_id_cache__add can prepare proper file names.
> > + */
> > + is_kallsyms = false;
> > + }
> > +
> > + pr_debug("linking %s %s <%s>\n", dso->short_name, path, sbuild_id);
> > +
> > + err = build_id_cache__add(sbuild_id, path, path,
> > + dso->nsinfo, is_kallsyms, false);
> > +out_err:
> > + free(path);
> > + fprintf(stderr, "%s %s %s\n", err ? "FAIL" : "OK ", sbuild_id, dso->long_name);
> > + return 0;
> > +}

--

- Arnaldo
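For a non-kallsyms dso, the dso__store() code quoted above picks the source of the cached binary in a fixed order: a missing file goes straight to debuginfod, an unresolvable path or unreadable build id is reported as FAIL, a mismatching build id falls back to debuginfod, and only a match links the local file. The sketch below isolates that decision so it can be checked on its own. The enum and choose_source() are hypothetical names, and the realpath and build id read failures are folded into a single read_ok flag for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BUILD_ID_SIZE 20

/* Where the binary to be cached should come from (hypothetical enum). */
enum store_source {
	STORE_LOCAL,	/* local file whose build id matches the mmap3 one */
	STORE_DOWNLOAD,	/* fall back to debuginfod */
	STORE_FAIL,	/* local file exists but its build id could not be read */
};

/*
 * file_exists: stat() on the dso path succeeded
 * read_ok:     path resolution and build id read both succeeded
 * local_id:    build id read from the local file (only used if read_ok)
 * event_id:    build id carried by the mmap3 event
 */
static enum store_source choose_source(bool file_exists, bool read_ok,
				       const unsigned char *local_id,
				       const unsigned char *event_id)
{
	if (!file_exists)
		return STORE_DOWNLOAD;
	if (!read_ok)
		return STORE_FAIL;
	if (memcmp(local_id, event_id, BUILD_ID_SIZE) != 0)
		return STORE_DOWNLOAD;	/* binary changed since recording */
	return STORE_LOCAL;
}
```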

2020-09-14 15:31:06

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

Em Mon, Sep 14, 2020 at 02:38:27PM +0900, Namhyung Kim escreveu:
> On Mon, Sep 14, 2020 at 6:03 AM Jiri Olsa <[email protected]> wrote:
> > Add a new version of the mmap event. The MMAP3 record is an
> > augmented version of MMAP2: it adds a build id value to
> > identify the exact binary object behind the memory map:

> > struct {
> > struct perf_event_header header;

> > u32 pid, tid;
> > u64 addr;
> > u64 len;
> > u64 pgoff;
> > u32 maj;
> > u32 min;
> > u64 ino;
> > u64 ino_generation;
> > u32 prot, flags;
> > u32 reserved;

What is this reserved field for? It's all nicely aligned already: a u64
followed by two u32s (prot, flags).

> > u8 buildid[20];

> Do we need maj, min, ino, ino_generation for the mmap3 event?
> I think they are there to compare binaries, and we can do that with
> the build-id instead (and I think it'd be better).

Humm, I thought MMAP2 would be a superset of MMAP and MMAP3 would be a
superset of MMAP2.

If we want to ditch useless stuff, then throw away pid, tid too, as we
can select those via sample_type.

Having said that, at this point I don't even know if adding new
PERF_RECORD_ types that update a preexisting one is the right way
to proceed.

Perhaps we should attach a BPF program to the point where an mmap/munmap is
being done (perf_event_mmap()) and allow userspace to ask for whatever
it wants? With a kprobe there, we can implement this MMAP3 easily right
now, no?

Start with a kprobe and all this would already be available in kernels
with BPF, with no need to reboot into a PERF_RECORD_MMAP3-enabled kernel;
when we get a tracepoint there, use it, as it's more efficient.

The sample_id stuff would be done as with other records; just the
MMAP3-specific things would be in the payload, and perf.data has
the struct layout description, etc.

Then use a BPF_TRACE_ITER to generate preexisting MMAP records instead
of going through /proc/ doing tons of syscalls, injecting the MMAP3 (or
MMAP2 or MMAP or something else, according to the tool's needs) directly
into the perf ring buffer.

- Arnaldo

>
> > char filename[];
> > struct sample_id sample_id;
> > };
> >
> > Add a 4-byte reserved field to align the buildid data to 8 bytes,
> > so the sample_id data is properly aligned.
> >
> > The mmap3 event is enabled by a new mmap3 bit in the perf_event_attr
> > struct. When set for an event, it enables build id retrieval
> > and the mmap3 format is used for the event.
> >
> > Keep track of mmap3 events and call build_id_parse
> > in perf_event_mmap_event only if any are defined.
> >
> > Having the build id attached directly to the mmap event will help
> > tools like perf skip the final search through perf data for
> > binaries that are needed at report time. It also prevents a
> > possible race where the binary could be removed or replaced
> > during profiling.
> >
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > include/uapi/linux/perf_event.h | 27 ++++++++++++++++++++++-
> > kernel/events/core.c | 38 +++++++++++++++++++++++++++------
> > 2 files changed, 57 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> > index 077e7ee69e3d..facfc3c673ed 100644
> > --- a/include/uapi/linux/perf_event.h
> > +++ b/include/uapi/linux/perf_event.h
> > @@ -384,7 +384,8 @@ struct perf_event_attr {
> > aux_output : 1, /* generate AUX records instead of events */
> > cgroup : 1, /* include cgroup events */
> > text_poke : 1, /* include text poke events */
> > - __reserved_1 : 30;
> > + mmap3 : 1, /* include bpf events */
>
> ???
>
> > + __reserved_1 : 29;
> >
> > union {
> > __u32 wakeup_events; /* wakeup every n events */
> > @@ -1060,6 +1061,30 @@ enum perf_event_type {
> > */
> > PERF_RECORD_TEXT_POKE = 20,
> >
> > + /*
> > + * The MMAP3 records are an augmented version of MMAP2; they add a
> > + * build id value to identify the exact binary behind the map
> > + *
> > + * struct {
> > + * struct perf_event_header header;
> > + *
> > + * u32 pid, tid;
> > + * u64 addr;
> > + * u64 len;
> > + * u64 pgoff;
> > + * u32 maj;
> > + * u32 min;
> > + * u64 ino;
> > + * u64 ino_generation;
> > + * u32 prot, flags;
> > + * u32 reserved;
> > + * u8 buildid[20];
> > + * char filename[];
> > + * struct sample_id sample_id;
> > + * };
> > + */
> > + PERF_RECORD_MMAP3 = 21,
> > +
> > PERF_RECORD_MAX, /* non-ABI */
> > };
> >
> [SNIP]
> > @@ -8098,6 +8116,9 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
> > mmap_event->prot = prot;
> > mmap_event->flags = flags;
> >
> > + if (atomic_read(&nr_mmap3_events))
> > + build_id_parse(vma, mmap_event->buildid);
>
> What if it fails? We should zero out the build-id.
>
> Thanks
> Namhyung
>
> > +
> > if (!(vma->vm_flags & VM_EXEC))
> > mmap_event->event_id.header.misc |= PERF_RECORD_MISC_MMAP_DATA;
> >
> > @@ -8241,6 +8262,7 @@ void perf_event_mmap(struct vm_area_struct *vma)
> > /* .ino_generation (attr_mmap2 only) */
> > /* .prot (attr_mmap2 only) */
> > /* .flags (attr_mmap2 only) */
> > + /* .buildid (attr_mmap3 only) */
> > };
> >
> > perf_addr_filters_adjust(vma);
> > @@ -11040,6 +11062,8 @@ static void account_event(struct perf_event *event)
> > inc = true;
> > if (event->attr.mmap || event->attr.mmap_data)
> > atomic_inc(&nr_mmap_events);
> > + if (event->attr.mmap3)
> > + atomic_inc(&nr_mmap3_events);
> > if (event->attr.comm)
> > atomic_inc(&nr_comm_events);
> > if (event->attr.namespaces)
> > --
> > 2.26.2
> >

--

- Arnaldo
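Two points from this thread meet in perf_event_mmap_event(): per the commit message, build_id_parse() only runs while nr_mmap3_events is non-zero, and Namhyung asks above what happens when parsing fails. The userspace sketch below shows one way to handle both, gating the parse on a counter and zeroing the build id on failure as Namhyung suggests. It uses C11 <stdatomic.h> in place of the kernel's atomic_t, and the parser type and the two stub parsers are hypothetical stand-ins for build_id_parse(), not its real signature.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define BUILD_ID_SIZE 20

/* Counts live events with the mmap3 bit set (cf. nr_mmap3_events). */
static atomic_int nr_mmap3_events;

static void account_mmap3(void)   { atomic_fetch_add(&nr_mmap3_events, 1); }
static void unaccount_mmap3(void) { atomic_fetch_sub(&nr_mmap3_events, 1); }

/* Hypothetical parser signature: returns 0 and fills id on success. */
typedef int (*build_id_parser)(void *ctx, unsigned char *id);

/*
 * Parse only if some event wants mmap3 records. Returns 1 if a parse was
 * attempted, 0 if it was skipped. On failure the id is zeroed so no
 * partial or stale bytes end up in the record.
 */
static int fill_build_id(build_id_parser parse, void *ctx,
			 unsigned char id[BUILD_ID_SIZE])
{
	if (atomic_load(&nr_mmap3_events) == 0)
		return 0;
	if (parse(ctx, id) != 0)
		memset(id, 0, BUILD_ID_SIZE);
	return 1;
}

/* Illustrative stubs: one succeeds, one writes a byte and then fails. */
static int parse_ok(void *ctx, unsigned char *id)
{
	(void)ctx;
	memset(id, 0xab, BUILD_ID_SIZE);
	return 0;
}

static int parse_fail(void *ctx, unsigned char *id)
{
	(void)ctx;
	id[0] = 0xff;	/* simulate a partial write before failing */
	return -1;
}
```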

2020-09-14 15:35:36

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

Em Mon, Sep 14, 2020 at 11:08:11AM +0200, [email protected] escreveu:
> On Sun, Sep 13, 2020 at 11:41:00PM -0700, Stephane Eranian wrote:
> > On Sun, Sep 13, 2020 at 2:03 PM Jiri Olsa <[email protected]> wrote:
> > what happens if I set mmap3 and mmap2?
> >
> > I think using mmap3 for every mmap may be overkill as you add useless
> > 20 bytes to an mmap record.
> > I am not sure if your code handles the case where mmap3 is not needed
> > because there is no buildid, e.g., anonymous memory.
> > It seems to me you've written the patch in such a way that if the user
> > tool supports mmap3, then it supersedes mmap2, and thus
> > you need all the fields of mmap2. But it could be more interesting to
> > return either MMAP2 or MMAP3 depending on tool support
> > and type of mmap, that would certainly save 20 bytes on any anon mmap.
> > But maybe that logic is already in your patch and I missed it.
>
> That, and what if you don't want any of that buildid nonsense at all? I
> always kill that because it makes perf pointlessly slow and has
> absolutely no upsides for me.

So, for you nothing should change, no MMAP3 used, no collection at the
end (which is your pet peeve).

I'm not saying this is what is in his patches right now, but what I
think his patches should be doing.

- Arnaldo

2020-09-14 15:36:45

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On Sun, Sep 13, 2020 at 11:41:00PM -0700, Stephane Eranian wrote:
> On Sun, Sep 13, 2020 at 2:03 PM Jiri Olsa <[email protected]> wrote:
> >
> > Add new version of mmap event. The MMAP3 record is an
> > augmented version of MMAP2, it adds build id value to
> > identify the exact binary object behind memory map:
> >
> > struct {
> > struct perf_event_header header;
> >
> > u32 pid, tid;
> > u64 addr;
> > u64 len;
> > u64 pgoff;
> > u32 maj;
> > u32 min;
> > u64 ino;
> > u64 ino_generation;
> > u32 prot, flags;
> > u32 reserved;
> > u8 buildid[20];
> > char filename[];
> > struct sample_id sample_id;
> > };
> >
> > Adding a 4-byte reserved field so that reserved + buildid occupy
> > 24 bytes (a multiple of 8), keeping sample_id data properly aligned.
> >
> > The mmap3 event is enabled by new mmap3 bit in perf_event_attr
> > struct. When set for an event, it enables the build id retrieval
> > and will use mmap3 format for the event.
> >
> > Keeping track of mmap3 events and calling build_id_parse
> > in perf_event_mmap_event only if we have any defined.
> >
> > Having build id attached directly to the mmap event will help
> > tools like perf skip the final search through perf data for
> > binaries that are needed at report time. It also prevents a
> > possible race where the binary is removed or replaced during
> > profiling.
> >
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > include/uapi/linux/perf_event.h | 27 ++++++++++++++++++++++-
> > kernel/events/core.c | 38 +++++++++++++++++++++++++++------
> > 2 files changed, 57 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> > index 077e7ee69e3d..facfc3c673ed 100644
> > --- a/include/uapi/linux/perf_event.h
> > +++ b/include/uapi/linux/perf_event.h
> > @@ -384,7 +384,8 @@ struct perf_event_attr {
> > aux_output : 1, /* generate AUX records instead of events */
> > cgroup : 1, /* include cgroup events */
> > text_poke : 1, /* include text poke events */
> > - __reserved_1 : 30;
> > + mmap3 : 1, /* include mmap3 events */
> > + __reserved_1 : 29;
> >
> what happens if I set mmap3 and mmap2?
>
> I think using mmap3 for every mmap may be overkill as you add useless
> 20 bytes to an mmap record.

So use just PERF_RECORD_MMAP2.

I think if the user says: I need buildids, then, in kernels with support
for getting the buildid in MMAP records, use it as it's more accurate,
otherwise fall back to traversing all records at the end to go over lots
of files harvesting those build-ids.

If the user says I don't want build-ids, nothing changes, no collection
at the end, perf continues using PERF_RECORD_MMAP2.
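The policy above reduces to two small decisions. The sketch below is illustrative only. `struct mmap_record_opts` and both helpers are hypothetical. The record type values are the existing `PERF_RECORD_MMAP2` value (10) and the `PERF_RECORD_MMAP3` value (21) proposed in this RFC.

```c
#include <stdbool.h>

/* Existing uapi value and the value proposed by this RFC. */
enum {
	PERF_RECORD_MMAP2 = 10,
	PERF_RECORD_MMAP3 = 21,
};

/* Hypothetical inputs: the user's choice and the probed kernel support. */
struct mmap_record_opts {
	bool want_build_ids;	/* false when the user opts out, e.g. 'perf record -B' */
	bool kernel_has_mmap3;	/* result of the open-time feature probe */
};

/* MMAP3 only when the user wants build ids and the kernel can supply them. */
static int pick_mmap_record_type(const struct mmap_record_opts *o)
{
	if (o->want_build_ids && o->kernel_has_mmap3)
		return PERF_RECORD_MMAP3;
	return PERF_RECORD_MMAP2;
}

/* The end-of-session scan is only needed on the old-kernel fallback path. */
static bool needs_final_build_id_scan(const struct mmap_record_opts *o)
{
	return o->want_build_ids && !o->kernel_has_mmap3;
}
```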

> I am not sure if your code handles the case where mmap3 is not needed
> because there is no buildid, e.g., anonymous memory.
> It seems to me you've written the patch in such a way that if the user
> tool supports mmap3, then it supersedes mmap2, and thus
> you need all the fields of mmap2. But if could be more interesting to
> return either MMAP2 or MMAP3 depending on tool support
> and type of mmap, that would certainly save 20 bytes on any anon mmap.
> But maybe that logic is already in your patch and I missed it.

Right, it should take into account if the user asked for build-ids or
not in addition to checking if the kernel supports MMAP3.

- Arnaldo

>
> > union {
> > __u32 wakeup_events; /* wakeup every n events */
> > @@ -1060,6 +1061,30 @@ enum perf_event_type {
> > */
> > PERF_RECORD_TEXT_POKE = 20,
> >
> > + /*
> > + * The MMAP3 records are an augmented version of MMAP2, they add
> > + * build id value to identify the exact binary behind map
> > + *
> > + * struct {
> > + * struct perf_event_header header;
> > + *
> > + * u32 pid, tid;
> > + * u64 addr;
> > + * u64 len;
> > + * u64 pgoff;
> > + * u32 maj;
> > + * u32 min;
> > + * u64 ino;
> > + * u64 ino_generation;
> > + * u32 prot, flags;
> > + * u32 reserved;
> > + * u8 buildid[20];
> > + * char filename[];
> > + * struct sample_id sample_id;
> > + * };
> > + */
> > + PERF_RECORD_MMAP3 = 21,
> > +
> > PERF_RECORD_MAX, /* non-ABI */
> > };
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 7ed5248f0445..719894492dac 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -51,6 +51,7 @@
> > #include <linux/proc_ns.h>
> > #include <linux/mount.h>
> > #include <linux/min_heap.h>
> > +#include <linux/buildid.h>
> >
> > #include "internal.h"
> >
> > @@ -386,6 +387,7 @@ static DEFINE_PER_CPU(int, perf_sched_cb_usages);
> > static DEFINE_PER_CPU(struct pmu_event_list, pmu_sb_events);
> >
> > static atomic_t nr_mmap_events __read_mostly;
> > +static atomic_t nr_mmap3_events __read_mostly;
> > static atomic_t nr_comm_events __read_mostly;
> > static atomic_t nr_namespaces_events __read_mostly;
> > static atomic_t nr_task_events __read_mostly;
> > @@ -4588,7 +4590,7 @@ static bool is_sb_event(struct perf_event *event)
> > return false;
> >
> > if (attr->mmap || attr->mmap_data || attr->mmap2 ||
> > - attr->comm || attr->comm_exec ||
> > + attr->mmap3 || attr->comm || attr->comm_exec ||
> > attr->task || attr->ksymbol ||
> > attr->context_switch || attr->text_poke ||
> > attr->bpf_event)
> > @@ -4644,6 +4646,8 @@ static void unaccount_event(struct perf_event *event)
> > dec = true;
> > if (event->attr.mmap || event->attr.mmap_data)
> > atomic_dec(&nr_mmap_events);
> > + if (event->attr.mmap3)
> > + atomic_dec(&nr_mmap3_events);
> > if (event->attr.comm)
> > atomic_dec(&nr_comm_events);
> > if (event->attr.namespaces)
> > @@ -7465,7 +7469,7 @@ static void perf_pmu_output_stop(struct perf_event *event)
> > /*
> > * task tracking -- fork/exit
> > *
> > - * enabled by: attr.comm | attr.mmap | attr.mmap2 | attr.mmap_data | attr.task
> > + * enabled by: attr.comm | attr.mmap | attr.mmap2 | attr.mmap3 | attr.mmap_data | attr.task
> > */
> >
> > struct perf_task_event {
> > @@ -7486,8 +7490,8 @@ struct perf_task_event {
> > static int perf_event_task_match(struct perf_event *event)
> > {
> > return event->attr.comm || event->attr.mmap ||
> > - event->attr.mmap2 || event->attr.mmap_data ||
> > - event->attr.task;
> > + event->attr.mmap2 || event->attr.mmap3 ||
> > + event->attr.mmap_data || event->attr.task;
> > }
> >
> > static void perf_event_task_output(struct perf_event *event,
> > @@ -7913,6 +7917,7 @@ struct perf_mmap_event {
> > u64 ino;
> > u64 ino_generation;
> > u32 prot, flags;
> > + u8 buildid[BUILD_ID_SIZE];
> >
> > struct {
> > struct perf_event_header header;
> > @@ -7933,7 +7938,7 @@ static int perf_event_mmap_match(struct perf_event *event,
> > int executable = vma->vm_flags & VM_EXEC;
> >
> > return (!executable && event->attr.mmap_data) ||
> > - (executable && (event->attr.mmap || event->attr.mmap2));
> > + (executable && (event->attr.mmap || event->attr.mmap2 || event->attr.mmap3));
> > }
> >
> > static void perf_event_mmap_output(struct perf_event *event,
> > @@ -7949,7 +7954,7 @@ static void perf_event_mmap_output(struct perf_event *event,
> > if (!perf_event_mmap_match(event, data))
> > return;
> >
> > - if (event->attr.mmap2) {
> > + if (event->attr.mmap2 || event->attr.mmap3) {
> > mmap_event->event_id.header.type = PERF_RECORD_MMAP2;
> > mmap_event->event_id.header.size += sizeof(mmap_event->maj);
> > mmap_event->event_id.header.size += sizeof(mmap_event->min);
> > @@ -7959,6 +7964,12 @@ static void perf_event_mmap_output(struct perf_event *event,
> > mmap_event->event_id.header.size += sizeof(mmap_event->flags);
> > }
> >
> > + if (event->attr.mmap3) {
> > + mmap_event->event_id.header.type = PERF_RECORD_MMAP3;
> > + mmap_event->event_id.header.size += sizeof(u32);
> > + mmap_event->event_id.header.size += sizeof(mmap_event->buildid);
> > + }
> > +
> > perf_event_header__init_id(&mmap_event->event_id.header, &sample, event);
> > ret = perf_output_begin(&handle, event,
> > mmap_event->event_id.header.size);
> > @@ -7970,7 +7981,7 @@ static void perf_event_mmap_output(struct perf_event *event,
> >
> > perf_output_put(&handle, mmap_event->event_id);
> >
> > - if (event->attr.mmap2) {
> > + if (event->attr.mmap2 || event->attr.mmap3) {
> > perf_output_put(&handle, mmap_event->maj);
> > perf_output_put(&handle, mmap_event->min);
> > perf_output_put(&handle, mmap_event->ino);
> > @@ -7979,6 +7990,13 @@ static void perf_event_mmap_output(struct perf_event *event,
> > perf_output_put(&handle, mmap_event->flags);
> > }
> >
> > + if (event->attr.mmap3) {
> > + u32 reserved = 0;
> > +
> > + perf_output_put(&handle, reserved);
> > + __output_copy(&handle, mmap_event->buildid, BUILD_ID_SIZE);
> > + }
> > +
> > __output_copy(&handle, mmap_event->file_name,
> > mmap_event->file_size);
> >
> > @@ -8098,6 +8116,9 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
> > mmap_event->prot = prot;
> > mmap_event->flags = flags;
> >
> > + if (atomic_read(&nr_mmap3_events))
> > + build_id_parse(vma, mmap_event->buildid);
> > +
> > if (!(vma->vm_flags & VM_EXEC))
> > mmap_event->event_id.header.misc |= PERF_RECORD_MISC_MMAP_DATA;
> >
> > @@ -8241,6 +8262,7 @@ void perf_event_mmap(struct vm_area_struct *vma)
> > /* .ino_generation (attr_mmap2 only) */
> > /* .prot (attr_mmap2 only) */
> > /* .flags (attr_mmap2 only) */
> > + /* .buildid (attr_mmap3 only) */
> > };
> >
> > perf_addr_filters_adjust(vma);
> > @@ -11040,6 +11062,8 @@ static void account_event(struct perf_event *event)
> > inc = true;
> > if (event->attr.mmap || event->attr.mmap_data)
> > atomic_inc(&nr_mmap_events);
> > + if (event->attr.mmap3)
> > + atomic_inc(&nr_mmap3_events);
> > if (event->attr.comm)
> > atomic_inc(&nr_comm_events);
> > if (event->attr.namespaces)
> > --
> > 2.26.2
> >

--

- Arnaldo
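The commit message adds a 4-byte `reserved` word, and the reason becomes clear when the fixed part of the proposed record is laid out with fixed-width types. This is a sketch that assumes the field order from the RFC comment. `struct hdr` stands in for `struct perf_event_header`, and `struct mmap3_fixed` is a hypothetical name. The fixed part ends at byte 96, so `filename[]` (whose length is padded to a multiple of 8) and the trailing `sample_id` stay 8-byte aligned. Without `reserved`, `buildid` would end at byte 92.

```c
#include <stddef.h>
#include <stdint.h>

#define BUILD_ID_SIZE 20

/* Stand-in for struct perf_event_header. */
struct hdr {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* Fixed part of the proposed MMAP3 record; filename[] and sample_id follow. */
struct mmap3_fixed {
	struct hdr header;
	uint32_t pid, tid;
	uint64_t addr;
	uint64_t len;
	uint64_t pgoff;
	uint32_t maj;
	uint32_t min;
	uint64_t ino;
	uint64_t ino_generation;
	uint32_t prot, flags;
	uint32_t reserved;		/* pads reserved + buildid to 24 bytes */
	uint8_t buildid[BUILD_ID_SIZE];
};

/* Where filename[] would start with and without the reserved word. */
enum {
	FILENAME_OFF = offsetof(struct mmap3_fixed, buildid) + BUILD_ID_SIZE,
	FILENAME_OFF_NO_RESERVED = offsetof(struct mmap3_fixed, reserved) + BUILD_ID_SIZE,
};

_Static_assert(FILENAME_OFF % 8 == 0, "filename starts 8-byte aligned");
_Static_assert(FILENAME_OFF_NO_RESERVED % 8 != 0, "without reserved it would not be");
```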

2020-09-14 15:40:01

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 04/26] perf tools: Add filename__decompress function

On Sun, Sep 13, 2020 at 11:02:51PM +0200, Jiri Olsa wrote:
> Factor filename__decompress from decompress_kmodule function.
> It can decompress files with compressions supported in perf -
> xz and gz, the support needs to be compiled in.
>
> It will be used in following changes to get build id out of
> compressed elf objects.

This is prep work, can be applied now, done.

- Arnaldo

> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/util/dso.c | 31 +++++++++++++++++++------------
> tools/perf/util/dso.h | 2 ++
> 2 files changed, 21 insertions(+), 12 deletions(-)
>
> diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
> index 5a3b4755f0b3..0faa96ca7a04 100644
> --- a/tools/perf/util/dso.c
> +++ b/tools/perf/util/dso.c
> @@ -279,18 +279,12 @@ bool dso__needs_decompress(struct dso *dso)
> dso->symtab_type == DSO_BINARY_TYPE__GUEST_KMODULE_COMP;
> }
>
> -static int decompress_kmodule(struct dso *dso, const char *name,
> - char *pathname, size_t len)
> +int filename__decompress(const char *name, char *pathname,
> + size_t len, int comp, int *err)
> {
> char tmpbuf[] = KMOD_DECOMP_NAME;
> int fd = -1;
>
> - if (!dso__needs_decompress(dso))
> - return -1;
> -
> - if (dso->comp == COMP_ID__NONE)
> - return -1;
> -
> /*
> * We have proper compression id for DSO and yet the file
> * behind the 'name' can still be plain uncompressed object.
> @@ -304,17 +298,17 @@ static int decompress_kmodule(struct dso *dso, const char *name,
> * To keep this transparent, we detect this and return the file
> * descriptor to the uncompressed file.
> */
> - if (!compressions[dso->comp].is_compressed(name))
> + if (!compressions[comp].is_compressed(name))
> return open(name, O_RDONLY);
>
> fd = mkstemp(tmpbuf);
> if (fd < 0) {
> - dso->load_errno = errno;
> + *err = errno;
> return -1;
> }
>
> - if (compressions[dso->comp].decompress(name, fd)) {
> - dso->load_errno = DSO_LOAD_ERRNO__DECOMPRESSION_FAILURE;
> + if (compressions[comp].decompress(name, fd)) {
> + *err = DSO_LOAD_ERRNO__DECOMPRESSION_FAILURE;
> close(fd);
> fd = -1;
> }
> @@ -328,6 +322,19 @@ static int decompress_kmodule(struct dso *dso, const char *name,
> return fd;
> }
>
> +static int decompress_kmodule(struct dso *dso, const char *name,
> + char *pathname, size_t len)
> +{
> + if (!dso__needs_decompress(dso))
> + return -1;
> +
> + if (dso->comp == COMP_ID__NONE)
> + return -1;
> +
> + return filename__decompress(name, pathname, len, dso->comp,
> + &dso->load_errno);
> +}
> +
> int dso__decompress_kmodule_fd(struct dso *dso, const char *name)
> {
> return decompress_kmodule(dso, name, NULL, 0);
> diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
> index 8ad17f395a19..f1efd2e11547 100644
> --- a/tools/perf/util/dso.h
> +++ b/tools/perf/util/dso.h
> @@ -274,6 +274,8 @@ bool dso__needs_decompress(struct dso *dso);
> int dso__decompress_kmodule_fd(struct dso *dso, const char *name);
> int dso__decompress_kmodule_path(struct dso *dso, const char *name,
> char *pathname, size_t len);
> +int filename__decompress(const char *name, char *pathname,
> + size_t len, int comp, int *err);
>
> #define KMOD_DECOMP_NAME "/tmp/perf-kmod-XXXXXX"
> #define KMOD_DECOMP_LEN sizeof(KMOD_DECOMP_NAME)
> --
> 2.26.2
>

--

- Arnaldo
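`filename__decompress()` follows a common pattern. It creates a temporary file with `mkstemp()`, decompresses into it, and reports failures through an `int *err` out-parameter instead of writing into a `struct dso`. Below is a minimal sketch of that pattern. It assumes a decompressor callback that writes plain bytes to a file descriptor. `decompress_to_tmp`, `DECOMP_FAILURE`, the template path and the two fake decompressors are hypothetical. The patch's shortcut for files that are not actually compressed is omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define DECOMP_FAILURE (-1000)	/* hypothetical error code */

/* Hypothetical decompressor: reads 'name', writes plain bytes to 'out_fd'. */
typedef int (*decompress_fn)(const char *name, int out_fd);

/*
 * Decompress 'name' into a fresh temporary file and return its fd, or -1.
 * Errors go to *err instead of a struct dso. If 'pathname' is given, the
 * temporary file is kept and its path copied there; otherwise it is
 * unlinked right away and lives on only through the returned fd.
 */
static int decompress_to_tmp(const char *name, decompress_fn decompress,
			     char *pathname, size_t len, int *err)
{
	char tmpbuf[] = "/tmp/demo-decomp-XXXXXX";
	int fd = mkstemp(tmpbuf);

	if (fd < 0) {
		*err = errno;
		return -1;
	}

	if (decompress(name, fd)) {
		*err = DECOMP_FAILURE;
		close(fd);
		fd = -1;
	}

	if (!pathname || fd < 0)
		unlink(tmpbuf);
	else
		snprintf(pathname, len, "%s", tmpbuf);

	return fd;
}

/* Demo decompressors for illustration only. */
static int fake_decompress_ok(const char *name, int out_fd)
{
	(void)name;
	return write(out_fd, "hello", 5) == 5 ? 0 : -1;
}

static int fake_decompress_fail(const char *name, int out_fd)
{
	(void)name;
	(void)out_fd;
	return -1;
}
```

A caller reading through the returned descriptor must `lseek()` back to offset 0 first, because the sketch leaves the file offset after the last write.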

2020-09-14 16:10:55

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 05/26] perf tools: Add build_id__is_defined function

On Mon, Sep 14, 2020 at 02:44:35PM +0900, Namhyung Kim wrote:
> On Mon, Sep 14, 2020 at 6:05 AM Jiri Olsa <[email protected]> wrote:
> >
> > Adding build_id__is_defined helper to check build id
> > is defined and is != zero build id.
> >
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > tools/perf/util/build-id.c | 11 +++++++++++
> > tools/perf/util/build-id.h | 1 +
> > 2 files changed, 12 insertions(+)
> >
> > diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> > index 31207b6e2066..bdee4e08e60d 100644
> > --- a/tools/perf/util/build-id.c
> > +++ b/tools/perf/util/build-id.c
> > @@ -902,3 +902,14 @@ bool perf_session__read_build_ids(struct perf_session *session, bool with_hits)
> >
> > return ret;
> > }
> > +
> > +bool build_id__is_defined(const u8 *build_id)
> > +{
> > + static u8 zero[BUILD_ID_SIZE];
> > + int err = 0;
> > +
> > + if (build_id)
> > + err = memcmp(build_id, &zero, BUILD_ID_SIZE);
> > +
> > + return err ? true : false;
> > +}

> I think this is a bit confusing.. How about this?

> bool ret = false;
> if (build_id)
> ret = memcmp(...);
> return ret;

> Or, it can be a oneliner..

Yeah.

I was curious whether the kernel lib has something to check if a range of
memory is zeroed, and there is this:

static bool is_zeroed(void *from, size_t size)
{
return memchr_inv(from, 0x0, size) == NULL;
}
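Here is a userspace equivalent of that idiom, together with the one-liner Namhyung suggests for `build_id__is_defined()`. It is a sketch. `memchr_inv()` is a kernel-internal helper, so `mem_is_zeroed()` is a hypothetical portable replacement. `BUILD_ID_SIZE` is assumed to be 20, as in the RFC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BUILD_ID_SIZE 20

/* Portable stand-in for the kernel's "memchr_inv(p, 0, n) == NULL". */
static bool mem_is_zeroed(const void *p, size_t n)
{
	const unsigned char *b = p;

	for (size_t i = 0; i < n; i++) {
		if (b[i])
			return false;
	}
	return true;
}

/* Defined means: non-NULL pointer and not the all-zero build id. */
static bool build_id__is_defined(const uint8_t *build_id)
{
	return build_id && !mem_is_zeroed(build_id, BUILD_ID_SIZE);
}
```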

commit 798248206b59acc6e1238c778281419c041891a7
Author: Akinobu Mita <[email protected]>
Date: Mon Oct 31 17:08:07 2011 -0700

lib/string.c: introduce memchr_inv()

memchr_inv() is mainly used to check whether the whole buffer is filled
with just a specified byte.

The function name and prototype are stolen from logfs and the
implementation is from SLUB.

---

Some usage in drivers/nvme/target/admin-cmd.c

+static void nvmet_execute_identify_desclist(struct nvmet_req *req)
+{
+ struct nvmet_ns *ns;
+ u16 status = 0;
+ off_t off = 0;
+
+ ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
+ if (!ns) {
+ status = NVME_SC_INVALID_NS | NVME_SC_DNR;
+ goto out;
+ }
+
+ if (memchr_inv(&ns->uuid, 0, sizeof(ns->uuid))) {
+ status = nvmet_copy_ns_identifier(req, NVME_NIDT_UUID,
+ NVME_NIDT_UUID_LEN,
+ &ns->uuid, &off);
+ if (status)
+ goto out_put_ns;
+ }

More:

[acme@five perf]$ find arch/ -type f | xargs grep memchr_inv
arch/x86/kernel/fpu/xstate.c: if (memchr_inv(hdr->reserved, 0, sizeof(hdr->reserved)))
arch/x86/mm/init_64.c: if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
arch/x86/mm/init_64.c: if (!memchr_inv(page_addr, PAGE_INUSE,
arch/x86/mm/init_64.c: if (!memchr_inv(page_addr, PAGE_INUSE,
arch/s390/mm/vmem.c: return !memchr_inv(page, PAGE_UNUSED, PMD_SIZE);
arch/powerpc/platforms/powermac/nvram.c: if (memchr_inv(base, 0xff, NVRAM_SIZE)) {
arch/powerpc/platforms/powermac/nvram.c: if (memchr_inv(base, 0xff, NVRAM_SIZE)) {
[acme@five perf]$

- Arnaldo

2020-09-14 16:12:33

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 15/26] perf tools: Synthesize proc tasks with mmap3

On Sun, Sep 13, 2020 at 11:03:02PM +0200, Jiri Olsa wrote:
> Synthesizing proc tasks with mmap3 events so we can
> get build id data for synthesized maps as well.
>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/util/mmap.c | 2 +-
> tools/perf/util/synthetic-events.c | 72 +++++++++++++++++-------------
> 2 files changed, 43 insertions(+), 31 deletions(-)
>
> diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> index ab7108d22428..51f6f86580a9 100644
> --- a/tools/perf/util/mmap.c
> +++ b/tools/perf/util/mmap.c
> @@ -33,7 +33,7 @@ void mmap_cpu_mask__scnprintf(struct mmap_cpu_mask *mask, const char *tag)
>
> len = bitmap_scnprintf(mask->bits, mask->nbits, buf, MASK_SIZE);
> buf[len] = '\0';
> - pr_debug("%p: %s mask[%zd]: %s\n", mask, tag, mask->nbits, buf);
> + pr_debug2("%p: %s mask[%zd]: %s\n", mask, tag, mask->nbits, buf);
> }

Can this be in a separate patch?

> size_t mmap__mmap_len(struct mmap *map)
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index 89b390623b63..bd6e7b84283d 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -379,7 +379,7 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
> }
> io__init(&io, io.fd, bf, sizeof(bf));
>
> - event->header.type = PERF_RECORD_MMAP2;
> + event->header.type = PERF_RECORD_MMAP3;

This also needs to check if the user is interested in build-id records.
If it is disabled, then using this new tool with mmap3 support will
generate perf.data files that will not be grokked by older tools,
introducing an annoyance for people not interested in build-ids.

- Arnaldo

> t = rdclock();
>
> while (!io.eof) {
> @@ -387,20 +387,20 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
> size_t size;
>
> /* ensure null termination since stack will be reused. */
> - event->mmap2.filename[0] = '\0';
> + event->mmap3.filename[0] = '\0';
>
> /* 00400000-0040c000 r-xp 00000000 fd:01 41038 /bin/cat */
> if (!read_proc_maps_line(&io,
> - &event->mmap2.start,
> - &event->mmap2.len,
> - &event->mmap2.prot,
> - &event->mmap2.flags,
> - &event->mmap2.pgoff,
> - &event->mmap2.maj,
> - &event->mmap2.min,
> - &event->mmap2.ino,
> - sizeof(event->mmap2.filename),
> - event->mmap2.filename))
> + &event->mmap3.start,
> + &event->mmap3.len,
> + &event->mmap3.prot,
> + &event->mmap3.flags,
> + &event->mmap3.pgoff,
> + &event->mmap3.maj,
> + &event->mmap3.min,
> + &event->mmap3.ino,
> + sizeof(event->mmap3.filename),
> + event->mmap3.filename))
> continue;
>
> if ((rdclock() - t) > timeout) {
> @@ -412,7 +412,7 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
> goto out;
> }
>
> - event->mmap2.ino_generation = 0;
> + event->mmap3.ino_generation = 0;
>
> /*
> * Just like the kernel, see __perf_event_mmap in kernel/perf_event.c
> @@ -422,8 +422,8 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
> else
> event->header.misc = PERF_RECORD_MISC_GUEST_USER;
>
> - if ((event->mmap2.prot & PROT_EXEC) == 0) {
> - if (!mmap_data || (event->mmap2.prot & PROT_READ) == 0)
> + if ((event->mmap3.prot & PROT_EXEC) == 0) {
> + if (!mmap_data || (event->mmap3.prot & PROT_READ) == 0)
> continue;
>
> event->header.misc |= PERF_RECORD_MISC_MMAP_DATA;
> @@ -433,25 +433,37 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
> if (truncation)
> event->header.misc |= PERF_RECORD_MISC_PROC_MAP_PARSE_TIMEOUT;
>
> - if (!strcmp(event->mmap2.filename, ""))
> - strcpy(event->mmap2.filename, anonstr);
> + if (!strcmp(event->mmap3.filename, ""))
> + strcpy(event->mmap3.filename, anonstr);
>
> if (hugetlbfs_mnt_len &&
> - !strncmp(event->mmap2.filename, hugetlbfs_mnt,
> + !strncmp(event->mmap3.filename, hugetlbfs_mnt,
> hugetlbfs_mnt_len)) {
> - strcpy(event->mmap2.filename, anonstr);
> - event->mmap2.flags |= MAP_HUGETLB;
> + strcpy(event->mmap3.filename, anonstr);
> + event->mmap3.flags |= MAP_HUGETLB;
> }
>
> - size = strlen(event->mmap2.filename) + 1;
> + size = strlen(event->mmap3.filename) + 1;
> size = PERF_ALIGN(size, sizeof(u64));
> - event->mmap2.len -= event->mmap.start;
> - event->mmap2.header.size = (sizeof(event->mmap2) -
> - (sizeof(event->mmap2.filename) - size));
> - memset(event->mmap2.filename + size, 0, machine->id_hdr_size);
> - event->mmap2.header.size += machine->id_hdr_size;
> - event->mmap2.pid = tgid;
> - event->mmap2.tid = pid;
> + event->mmap3.len -= event->mmap.start;
> + event->mmap3.header.size = (sizeof(event->mmap3) -
> + (sizeof(event->mmap3.filename) - size));
> + memset(event->mmap3.filename + size, 0, machine->id_hdr_size);
> + event->mmap3.header.size += machine->id_hdr_size;
> + event->mmap3.pid = tgid;
> + event->mmap3.tid = pid;
> +
> + rc = filename__read_build_id(event->mmap3.filename, event->mmap3.buildid,
> + BUILD_ID_SIZE);
> + if (rc != BUILD_ID_SIZE) {
> + if (event->mmap3.filename[0] == '/') {
> + pr_debug2("Failed to read build ID for %s\n",
> + event->mmap3.filename);
> + }
> + memset(event->mmap3.buildid, 0x0, sizeof(event->mmap3.buildid));
> + }
> +
> + rc = 0;
>
> if (perf_tool__process_synth_event(tool, event, machine, process) != 0) {
> rc = -1;
> @@ -744,7 +756,7 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
> if (comm_event == NULL)
> goto out;
>
> - mmap_event = malloc(sizeof(mmap_event->mmap2) + machine->id_hdr_size);
> + mmap_event = malloc(sizeof(mmap_event->mmap3) + machine->id_hdr_size);
> if (mmap_event == NULL)
> goto out_free_comm;
>
> @@ -826,7 +838,7 @@ static int __perf_event__synthesize_threads(struct perf_tool *tool,
> if (comm_event == NULL)
> goto out;
>
> - mmap_event = malloc(sizeof(mmap_event->mmap2) + machine->id_hdr_size);
> + mmap_event = malloc(sizeof(mmap_event->mmap3) + machine->id_hdr_size);
> if (mmap_event == NULL)
> goto out_free_comm;
>
> --
> 2.26.2
>

--

- Arnaldo
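The synthesizer fills each MMAP3 record from one line of `/proc/<pid>/maps`, such as the `/bin/cat` example in the patch comment. perf uses its own `read_proc_maps_line()` over an `io` buffer. The sketch below is a hypothetical `sscanf()`-based parser of the same columns, included to show which fields feed `start`, `len` (end minus start), `prot`/`flags` (the permission string), `pgoff`, `maj`, `min`, `ino` and `filename`.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

struct maps_entry {
	uint64_t start, end, pgoff, ino;
	unsigned int maj, min;
	char perms[5];		/* e.g. "r-xp" */
	char filename[256];	/* empty for anonymous mappings */
};

/*
 * Parse one /proc/<pid>/maps line, e.g.
 * "00400000-0040c000 r-xp 00000000 fd:01 41038 /bin/cat".
 * Returns 0 on success, -1 if the fixed columns do not match.
 */
static int parse_maps_line(const char *line, struct maps_entry *e)
{
	int consumed = -1;	/* stays -1 if the line ends after the inode */

	if (sscanf(line, "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %x:%x %" SCNu64 " %n",
		   &e->start, &e->end, e->perms, &e->pgoff,
		   &e->maj, &e->min, &e->ino, &consumed) != 7)
		return -1;

	/* The rest of the line (spaces included) is the path, if any. */
	snprintf(e->filename, sizeof(e->filename), "%s",
		 consumed >= 0 ? line + consumed : "");
	e->filename[strcspn(e->filename, "\n")] = '\0';
	return 0;
}
```

An empty `filename` marks an anonymous mapping, which the patch then replaces with `anonstr`.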

2020-09-14 16:12:59

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 17/26] perf tools: Synthesize kernel with mmap3

On Sun, Sep 13, 2020 at 11:03:04PM +0200, Jiri Olsa wrote:
> Synthesizing kernel with mmap3 events so we can
> get build id data for kernel map as well.

Ditto as for 15/26

> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/util/synthetic-events.c | 23 ++++++++++++++---------
> 1 file changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index 6bd2423ce2f3..844ca87b6e97 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -1029,7 +1029,7 @@ static int __perf_event__synthesize_kernel_mmap(struct perf_tool *tool,
> * available use this, and after it is use this as a fallback for older
> * kernels.
> */
> - event = zalloc((sizeof(event->mmap) + machine->id_hdr_size));
> + event = zalloc((sizeof(event->mmap3) + machine->id_hdr_size));
> if (event == NULL) {
> pr_debug("Not enough memory synthesizing mmap event "
> "for kernel modules\n");
> @@ -1046,16 +1046,21 @@ static int __perf_event__synthesize_kernel_mmap(struct perf_tool *tool,
> event->header.misc = PERF_RECORD_MISC_GUEST_KERNEL;
> }
>
> - size = snprintf(event->mmap.filename, sizeof(event->mmap.filename),
> + size = snprintf(event->mmap3.filename, sizeof(event->mmap3.filename),
> "%s%s", machine->mmap_name, kmap->ref_reloc_sym->name) + 1;
> size = PERF_ALIGN(size, sizeof(u64));
> - event->mmap.header.type = PERF_RECORD_MMAP;
> - event->mmap.header.size = (sizeof(event->mmap) -
> - (sizeof(event->mmap.filename) - size) + machine->id_hdr_size);
> - event->mmap.pgoff = kmap->ref_reloc_sym->addr;
> - event->mmap.start = map->start;
> - event->mmap.len = map->end - event->mmap.start;
> - event->mmap.pid = machine->pid;
> + event->mmap3.header.type = PERF_RECORD_MMAP3;
> + event->mmap3.header.size = (sizeof(event->mmap3) -
> + (sizeof(event->mmap3.filename) - size) + machine->id_hdr_size);
> + event->mmap3.pgoff = kmap->ref_reloc_sym->addr;
> + event->mmap3.start = map->start;
> + event->mmap3.len = map->end - event->mmap3.start;
> + event->mmap3.pid = machine->pid;
> +
> + err = sysfs__read_build_id("/sys/kernel/notes", event->mmap3.buildid,
> + BUILD_ID_SIZE);
> + if (err)
> + pr_err("Failed to read kernel build ID\n");
>
> err = perf_tool__process_synth_event(tool, event, machine, process);
> free(event);
> --
> 2.26.2
>

--

- Arnaldo
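For the kernel map, the patch reads the build id from `/sys/kernel/notes`, which exposes the kernel's ELF note section as raw notes. Below is a sketch of locating the GNU build-id note in such a buffer. It assumes the standard note layout: a 12-byte header, name and descriptor each padded to 4 bytes, and type 3 (`NT_GNU_BUILD_ID`) with name `"GNU"`. `find_gnu_build_id` is a hypothetical helper, not perf's `sysfs__read_build_id()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOTE_ALIGN(x)		(((size_t)(x) + 3) & ~(size_t)3)
#define GNU_BUILD_ID_TYPE	3	/* NT_GNU_BUILD_ID in <elf.h> */

/* Same layout as Elf32_Nhdr / Elf64_Nhdr. */
struct note_hdr {
	uint32_t n_namesz;
	uint32_t n_descsz;
	uint32_t n_type;
};

/* Copy the GNU build id into 'out'; return its length, or -1 if absent. */
static int find_gnu_build_id(const unsigned char *buf, size_t len,
			     unsigned char *out, size_t out_size)
{
	size_t off = 0;

	while (off + sizeof(struct note_hdr) <= len) {
		struct note_hdr nh;
		size_t name_off, desc_off, next;

		memcpy(&nh, buf + off, sizeof(nh));	/* avoid unaligned reads */
		name_off = off + sizeof(nh);
		desc_off = name_off + NOTE_ALIGN(nh.n_namesz);
		next = desc_off + NOTE_ALIGN(nh.n_descsz);
		if (next > len)
			return -1;	/* truncated or corrupt note */

		if (nh.n_type == GNU_BUILD_ID_TYPE && nh.n_namesz == 4 &&
		    memcmp(buf + name_off, "GNU", 4) == 0) {
			size_t n = nh.n_descsz < out_size ? nh.n_descsz : out_size;

			memcpy(out, buf + desc_off, n);
			return (int)n;
		}
		off = next;
	}
	return -1;
}
```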

2020-09-14 16:14:12

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 16/26] perf tools: Synthesize modules with mmap3

On Sun, Sep 13, 2020 at 11:03:03PM +0200, Jiri Olsa wrote:
> Synthesizing modules with mmap3 events so we can
> get build id data for module's maps as well.

Ditto as for 15/26

> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/util/synthetic-events.c | 37 +++++++++++++++++++-----------
> 1 file changed, 24 insertions(+), 13 deletions(-)
>
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index bd6e7b84283d..6bd2423ce2f3 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -605,7 +605,7 @@ int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t
> int rc = 0;
> struct map *pos;
> struct maps *maps = machine__kernel_maps(machine);
> - union perf_event *event = zalloc((sizeof(event->mmap) +
> + union perf_event *event = zalloc((sizeof(event->mmap3) +
> machine->id_hdr_size));
> if (event == NULL) {
> pr_debug("Not enough memory synthesizing mmap event "
> @@ -613,8 +613,6 @@ int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t
> return -1;
> }
>
> - event->header.type = PERF_RECORD_MMAP;
> -
> /*
> * kernel uses 0 for user space maps, see kernel/perf_event.c
> * __perf_event_mmap
> @@ -631,17 +629,30 @@ int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t
> continue;
>
> size = PERF_ALIGN(pos->dso->long_name_len + 1, sizeof(u64));
> - event->mmap.header.type = PERF_RECORD_MMAP;
> - event->mmap.header.size = (sizeof(event->mmap) -
> - (sizeof(event->mmap.filename) - size));
> - memset(event->mmap.filename + size, 0, machine->id_hdr_size);
> - event->mmap.header.size += machine->id_hdr_size;
> - event->mmap.start = pos->start;
> - event->mmap.len = pos->end - pos->start;
> - event->mmap.pid = machine->pid;
> -
> - memcpy(event->mmap.filename, pos->dso->long_name,
> + event->mmap3.header.type = PERF_RECORD_MMAP3;
> + event->mmap3.header.size = (sizeof(event->mmap3) -
> + (sizeof(event->mmap3.filename) - size));
> + memset(event->mmap3.filename + size, 0, machine->id_hdr_size);
> + event->mmap3.header.size += machine->id_hdr_size;
> + event->mmap3.start = pos->start;
> + event->mmap3.len = pos->end - pos->start;
> + event->mmap3.pid = machine->pid;
> +
> + memcpy(event->mmap3.filename, pos->dso->long_name,
> pos->dso->long_name_len + 1);
> +
> + rc = filename__read_build_id(event->mmap3.filename, event->mmap3.buildid,
> + BUILD_ID_SIZE);
> + if (rc != BUILD_ID_SIZE) {
> + if (event->mmap3.filename[0] == '/') {
> + pr_debug2("Failed to read build ID for %s\n",
> + event->mmap3.filename);
> + }
> + memset(event->mmap3.buildid, 0x0, sizeof(event->mmap3.buildid));
> + }
> +
> + rc = 0;
> +
> if (perf_tool__process_synth_event(tool, event, machine, process) != 0) {
> rc = -1;
> break;
> --
> 2.26.2
>

--

- Arnaldo
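The module and kernel synthesizers size the record in the same way. The tool-side struct reserves a `PATH_MAX` filename, so the code subtracts the unused tail and then adds back the `sample_id` bytes (`machine->id_hdr_size`). Below is a sketch of that arithmetic with hypothetical names: `struct demo_mmap3`, `ALIGN_UP`, and `DEMO_PATH_MAX` standing in for `PATH_MAX`. It assumes the 96-byte fixed part of the RFC layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((size_t)(a) - 1))
#define DEMO_PATH_MAX	4096	/* stand-in for PATH_MAX */

/* Hypothetical tool-side record: 96-byte fixed part, then a max-size name. */
struct demo_mmap3 {
	unsigned char fixed[96];
	char filename[DEMO_PATH_MAX];
};

/*
 * Bytes actually emitted: the NUL-terminated name padded to 8 bytes
 * replaces the full filename buffer, then id_hdr_size sample_id bytes.
 */
static size_t demo_mmap3_size(const char *filename, size_t id_hdr_size)
{
	size_t size = ALIGN_UP(strlen(filename) + 1, sizeof(uint64_t));

	return sizeof(struct demo_mmap3) -
	       (sizeof(((struct demo_mmap3 *)0)->filename) - size) +
	       id_hdr_size;
}
```

For `/bin/cat`, the name takes 9 bytes padded to 16, so the record is 96 + 16 = 112 bytes before any `sample_id` fields.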

2020-09-14 16:16:37

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 10/26] perf tools: Enable mmap3 map event when supported

On Sun, Sep 13, 2020 at 11:02:57PM +0200, Jiri Olsa wrote:
> Enabling mmap3 map event when supported and adding
> its disable fallback when it fails.
>
> Also adding mmap3 bit to verbose open output:

This should check if the user disabled build id collection, i.e. if it's
not something the user is interested in.

- Arnaldo

> $ perf record -vv kill
> perf_event_attr:
> size 120
> { sample_period, sample_freq } 4000
> ...
> bpf_event 1
> mmap3 1
>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/util/evsel.c | 9 ++++++++-
> tools/perf/util/evsel.h | 1 +
> tools/perf/util/perf_event_attr_fprintf.c | 1 +
> 3 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 14baf8542b40..c2cc9b4b30bf 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -1065,6 +1065,7 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
> attr->task = track;
> attr->mmap = track;
> attr->mmap2 = track && !perf_missing_features.mmap2;
> + attr->mmap3 = track && !perf_missing_features.mmap3;
> attr->comm = track;
> /*
> * ksymbol is tracked separately with text poke because it needs to be
> @@ -1657,6 +1658,8 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> evsel->core.attr.bpf_event = 0;
> if (perf_missing_features.branch_hw_idx)
> evsel->core.attr.branch_sample_type &= ~PERF_SAMPLE_BRANCH_HW_INDEX;
> + if (perf_missing_features.mmap3)
> + evsel->core.attr.mmap3 = 0;
> retry_sample_id:
> if (perf_missing_features.sample_id_all)
> evsel->core.attr.sample_id_all = 0;
> @@ -1770,7 +1773,11 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> * Must probe features in the order they were added to the
> * perf_event_attr interface.
> */
> - if (!perf_missing_features.cgroup && evsel->core.attr.cgroup) {
> + if (!perf_missing_features.mmap3 && evsel->core.attr.mmap3) {
> + perf_missing_features.mmap3 = true;
> + pr_debug2("switching off mmap3\n");
> + goto fallback_missing_features;
> + } else if (!perf_missing_features.cgroup && evsel->core.attr.cgroup) {
> perf_missing_features.cgroup = true;
> pr_debug2_peo("Kernel has no cgroup sampling support, bailing out\n");
> goto out_close;
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 35e3f6d66085..d49922b22eca 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -119,6 +119,7 @@ struct perf_missing_features {
> bool sample_id_all;
> bool exclude_guest;
> bool mmap2;
> + bool mmap3;
> bool cloexec;
> bool clockid;
> bool clockid_wrong;
> diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
> index e67a227c0ce7..3c52c081693d 100644
> --- a/tools/perf/util/perf_event_attr_fprintf.c
> +++ b/tools/perf/util/perf_event_attr_fprintf.c
> @@ -134,6 +134,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
> PRINT_ATTRf(bpf_event, p_unsigned);
> PRINT_ATTRf(aux_output, p_unsigned);
> PRINT_ATTRf(cgroup, p_unsigned);
> + PRINT_ATTRf(mmap3, p_unsigned);
>
> PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
> PRINT_ATTRf(bp_type, p_unsigned);
> --
> 2.26.2
>

--

- Arnaldo
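The fallback in this patch follows perf's usual feature probing. perf requests the new bit, and if the kernel rejects it, records the feature as missing and retries without it. An older kernel refuses unknown bits in `__reserved_1` with `EINVAL`. The sketch below also adds the check Arnaldo asks for: `mmap3` is set only when the user wants build ids. `struct demo_attr`, `struct demo_missing`, `open_with_fallback` and the fake open callbacks are hypothetical. A real caller would go through `perf_event_open(2)`.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of perf_event_attr bits. */
struct demo_attr {
	unsigned int mmap2 : 1;
	unsigned int mmap3 : 1;
};

struct demo_missing {
	bool mmap3;
};

/* Hypothetical open callback: returns an fd >= 0 or -errno. */
typedef int (*demo_open_fn)(const struct demo_attr *attr);

static int open_with_fallback(struct demo_attr *attr, bool want_build_ids,
			      struct demo_missing *missing, demo_open_fn open_fn)
{
	int fd;

	/* Ask for mmap3 only if the user wants build ids and it is not known missing. */
	attr->mmap3 = want_build_ids && !missing->mmap3;
retry:
	fd = open_fn(attr);
	if (fd == -EINVAL && attr->mmap3) {
		/* Older kernel: remember the missing feature and retry without it. */
		missing->mmap3 = true;
		attr->mmap3 = 0;
		goto retry;
	}
	return fd;
}

/* Fake kernels for illustration only. */
static int old_kernel_open(const struct demo_attr *attr)
{
	return attr->mmap3 ? -EINVAL : 3;
}

static int new_kernel_open(const struct demo_attr *attr)
{
	(void)attr;
	return 4;
}
```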

2020-09-14 16:19:27

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 06/26] perf tools: Add support to read build id from compressed elf

Em Sun, Sep 13, 2020 at 11:02:53PM +0200, Jiri Olsa escreveu:
> Adding support to decompress file before reading build id.
>
> Adding filename__read_build_id and changing its current
> versions to read_build_id.

Also a standalone, generally useful, prep patch, applied.

- Arnaldo

> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/util/symbol-elf.c | 37 ++++++++++++++++++++++++++++++++++--
> 1 file changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
> index 94a156df22d5..6770572620f3 100644
> --- a/tools/perf/util/symbol-elf.c
> +++ b/tools/perf/util/symbol-elf.c
> @@ -534,7 +534,7 @@ static int elf_read_build_id(Elf *elf, void *bf, size_t size)
>
> #ifdef HAVE_LIBBFD_BUILDID_SUPPORT
>
> -int filename__read_build_id(const char *filename, void *bf, size_t size)
> +static int read_build_id(const char *filename, void *bf, size_t size)
> {
> int err = -1;
> bfd *abfd;
> @@ -562,7 +562,7 @@ int filename__read_build_id(const char *filename, void *bf, size_t size)
>
> #else // HAVE_LIBBFD_BUILDID_SUPPORT
>
> -int filename__read_build_id(const char *filename, void *bf, size_t size)
> +static int read_build_id(const char *filename, void *bf, size_t size)
> {
> int fd, err = -1;
> Elf *elf;
> @@ -591,6 +591,39 @@ int filename__read_build_id(const char *filename, void *bf, size_t size)
>
> #endif // HAVE_LIBBFD_BUILDID_SUPPORT
>
> +int filename__read_build_id(const char *filename, void *bf, size_t size)
> +{
> + struct kmod_path m = { .name = NULL, };
> + char path[PATH_MAX];
> + int err;
> +
> + if (!filename)
> + return -EFAULT;
> +
> + err = kmod_path__parse(&m, filename);
> + if (err)
> + return -1;
> +
> + if (m.comp) {
> + int error = 0, fd;
> +
> + fd = filename__decompress(filename, path, sizeof(path), m.comp, &error);
> + if (fd < 0) {
> + pr_debug("Failed to decompress (error %d) %s\n",
> + error, filename);
> + return -1;
> + }
> + close(fd);
> + filename = path;
> + }
> +
> + err = read_build_id(filename, bf, size);
> +
> + if (m.comp)
> + unlink(filename);
> + return err;
> +}
> +
> int sysfs__read_build_id(const char *filename, void *build_id, size_t size)
> {
> int fd, err = -1;
> --
> 2.26.2
>
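Underneath `read_build_id()` (libbfd or libelf), the build id is the payload of an ELF note of type `NT_GNU_BUILD_ID` with owner name "GNU". A minimal sketch of walking an in-memory note section to extract it, assuming the section bytes were already located (and, for `.ko.xz` modules, already decompressed as the patch does); `toy_note_read_build_id` and `struct toy_nhdr` are hypothetical, not perf functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_NT_GNU_BUILD_ID 3 /* same value as NT_GNU_BUILD_ID in <elf.h> */
/* Note name and descriptor are each padded to 4 bytes. */
#define NOTE_ALIGN(x) (((size_t)(x) + 3) & ~(size_t)3)

/* Same layout as Elf32_Nhdr / Elf64_Nhdr: three 4-byte words. */
struct toy_nhdr {
	uint32_t n_namesz, n_descsz, n_type;
};

/* Walk a buffer holding an ELF note section (e.g. .note.gnu.build-id) and
 * copy the GNU build id into bf. Returns the build id size, or -1. */
static int toy_note_read_build_id(const void *notes, size_t len, void *bf, size_t size)
{
	const unsigned char *p = notes, *end;

	assert(notes != NULL && bf != NULL);
	end = p + len;

	while ((size_t)(end - p) >= sizeof(struct toy_nhdr)) {
		struct toy_nhdr nh;

		memcpy(&nh, p, sizeof(nh)); /* avoid unaligned access */
		p += sizeof(nh);

		size_t namesz = NOTE_ALIGN(nh.n_namesz);
		size_t descsz = NOTE_ALIGN(nh.n_descsz);
		if ((size_t)(end - p) < namesz + descsz)
			return -1; /* truncated note */

		if (nh.n_type == TOY_NT_GNU_BUILD_ID && nh.n_namesz == 4 &&
		    memcmp(p, "GNU", 4) == 0) {
			if (nh.n_descsz > size)
				return -1;
			memcpy(bf, p + namesz, nh.n_descsz);
			return (int)nh.n_descsz;
		}
		p += namesz + descsz; /* skip unrelated note */
	}
	return -1;
}
```

The function returns the descriptor size rather than assuming 20 bytes, since the note itself records how long the id is.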

--

- Arnaldo

2020-09-14 16:24:03

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 21/26] perf tools: Add machine__for_each_dso function

Em Sun, Sep 13, 2020 at 11:03:08PM +0200, Jiri Olsa escreveu:
> Adding machine__for_each_dso to iterate over all dso
> objects defined within the machine. It will
> be used in following changes.

prep work, applying.

- Arnaldo

> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/util/machine.c | 12 ++++++++++++
> tools/perf/util/machine.h | 4 ++++
> 2 files changed, 16 insertions(+)
>
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 863d949ef967..f8e8d0d80847 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -3181,3 +3181,15 @@ char *machine__resolve_kernel_addr(void *vmachine, unsigned long long *addrp, ch
> *addrp = map->unmap_ip(map, sym->start);
> return sym->name;
> }
> +
> +int machine__for_each_dso(struct machine *machine, machine__dso_t fn, void *priv)
> +{
> + struct dso *pos;
> + int err = 0;
> +
> + list_for_each_entry(pos, &machine->dsos.head, node) {
> + if (fn(pos, machine, priv))
> + err = -1;
> + }
> + return err;
> +}
> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> index a3c1d0bf89e5..504c707f22bb 100644
> --- a/tools/perf/util/machine.h
> +++ b/tools/perf/util/machine.h
> @@ -252,6 +252,10 @@ void machines__destroy_kernel_maps(struct machines *machines);
>
> size_t machine__fprintf_vmlinux_path(struct machine *machine, FILE *fp);
>
> +typedef int (*machine__dso_t)(struct dso *dso, struct machine *machine, void *priv);
> +
> +int machine__for_each_dso(struct machine *machine, machine__dso_t fn,
> + void *priv);
> int machine__for_each_thread(struct machine *machine,
> int (*fn)(struct thread *thread, void *p),
> void *priv);
> --
> 2.26.2
>
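Note the contract of `machine__for_each_dso()`: it does not stop at the first failing callback; it visits every dso and only reports whether any callback failed. A small sketch of that behaviour, assuming a flat array in place of perf's `dsos.head` list (all `toy_*` names and `count_named` are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flat stand-ins for perf's struct dso / struct machine. */
struct toy_dso { const char *name; };
struct toy_machine { struct toy_dso *dsos; size_t nr_dsos; };

typedef int (*toy_machine__dso_t)(struct toy_dso *dso, struct toy_machine *machine,
				  void *priv);

/* Same contract as the patch: every dso is visited even if a callback
 * fails, and the return value only says whether any callback failed. */
static int toy_machine__for_each_dso(struct toy_machine *machine,
				     toy_machine__dso_t fn, void *priv)
{
	int err = 0;

	assert(fn != NULL);
	for (size_t i = 0; i < machine->nr_dsos; i++) {
		if (fn(&machine->dsos[i], machine, priv))
			err = -1;
	}
	return err;
}

/* Example callback: count visits through priv, fail on unnamed dsos. */
static int count_named(struct toy_dso *dso, struct toy_machine *machine, void *priv)
{
	(void)machine;
	(*(int *)priv)++;
	return dso->name == NULL ? -1 : 0;
}
```

With three dsos where the middle one fails, all three are still visited and the overall result is -1.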

--

- Arnaldo

2020-09-14 16:35:34

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 20/26] perf tools: Add build_id_cache__add function

Em Sun, Sep 13, 2020 at 11:03:07PM +0200, Jiri Olsa escreveu:
> Adding build_id_cache__add function as core function
> that adds a file into the build id database. It will be
> used from other callers in following changes.

More prep work generally useful, applying now.

- Arnaldo

> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> tools/perf/util/build-id.c | 42 ++++++++++++++++++++++++--------------
> tools/perf/util/build-id.h | 2 ++
> 2 files changed, 29 insertions(+), 15 deletions(-)
>
> diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> index b281c97894e0..bf044e52ad1f 100644
> --- a/tools/perf/util/build-id.c
> +++ b/tools/perf/util/build-id.c
> @@ -668,24 +668,15 @@ static char *build_id_cache__find_debug(const char *sbuild_id,
> return realname;
> }
>
> -int build_id_cache__add_s(const char *sbuild_id, const char *name,
> - struct nsinfo *nsi, bool is_kallsyms, bool is_vdso)
> +int
> +build_id_cache__add(const char *sbuild_id, const char *name, const char *realname,
> + struct nsinfo *nsi, bool is_kallsyms, bool is_vdso)
> {
> const size_t size = PATH_MAX;
> - char *realname = NULL, *filename = NULL, *dir_name = NULL,
> - *linkname = zalloc(size), *tmp;
> + char *filename = NULL, *dir_name = NULL, *linkname = zalloc(size), *tmp;
> char *debugfile = NULL;
> int err = -1;
>
> - if (!is_kallsyms) {
> - if (!is_vdso)
> - realname = nsinfo__realpath(name, nsi);
> - else
> - realname = realpath(name, NULL);
> - if (!realname)
> - goto out_free;
> - }
> -
> dir_name = build_id_cache__cachedir(sbuild_id, name, nsi, is_kallsyms,
> is_vdso);
> if (!dir_name)
> @@ -786,8 +777,6 @@ int build_id_cache__add_s(const char *sbuild_id, const char *name,
> pr_debug4("Failed to update/scan SDT cache for %s\n", realname);
>
> out_free:
> - if (!is_kallsyms)
> - free(realname);
> free(filename);
> free(debugfile);
> free(dir_name);
> @@ -795,6 +784,29 @@ int build_id_cache__add_s(const char *sbuild_id, const char *name,
> return err;
> }
>
> +int build_id_cache__add_s(const char *sbuild_id, const char *name,
> + struct nsinfo *nsi, bool is_kallsyms, bool is_vdso)
> +{
> + char *realname = NULL;
> + int err = -1;
> +
> + if (!is_kallsyms) {
> + if (!is_vdso)
> + realname = nsinfo__realpath(name, nsi);
> + else
> + realname = realpath(name, NULL);
> + if (!realname)
> + goto out_free;
> + }
> +
> + err = build_id_cache__add(sbuild_id, name, realname, nsi, is_kallsyms, is_vdso);
> +
> +out_free:
> + if (!is_kallsyms)
> + free(realname);
> + return err;
> +}
> +
> static int build_id_cache__add_b(const u8 *build_id, size_t build_id_size,
> const char *name, struct nsinfo *nsi,
> bool is_kallsyms, bool is_vdso)
> diff --git a/tools/perf/util/build-id.h b/tools/perf/util/build-id.h
> index 2cf87b7304e2..6d1c7180047b 100644
> --- a/tools/perf/util/build-id.h
> +++ b/tools/perf/util/build-id.h
> @@ -50,6 +50,8 @@ char *build_id_cache__complement(const char *incomplete_sbuild_id);
> int build_id_cache__list_build_ids(const char *pathname, struct nsinfo *nsi,
> struct strlist **result);
> bool build_id_cache__cached(const char *sbuild_id);
> +int build_id_cache__add(const char *sbuild_id, const char *name, const char *realname,
> + struct nsinfo *nsi, bool is_kallsyms, bool is_vdso);
> int build_id_cache__add_s(const char *sbuild_id,
> const char *name, struct nsinfo *nsi,
> bool is_kallsyms, bool is_vdso);
> --
> 2.26.2
>
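The split leaves `build_id_cache__add_s()` owning `realname` and `build_id_cache__add()` merely borrowing it, so new callers can pass a path they have already resolved. A sketch of that ownership split, assuming plain `realpath()` in place of `nsinfo__realpath()`, no vdso handling, and a stub core function (`toy_cache_add`, `toy_cache_add_s` and `toy_last_realname` are hypothetical):

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Records what the core helper saw, for inspection. */
static char toy_last_realname[4096];

/* Hypothetical stand-in for build_id_cache__add(): borrows realname,
 * never frees it. */
static int toy_cache_add(const char *sbuild_id, const char *name,
			 const char *realname, bool is_kallsyms)
{
	assert(sbuild_id != NULL && name != NULL);
	snprintf(toy_last_realname, sizeof(toy_last_realname), "%s",
		 is_kallsyms ? "(none)" : realname);
	return 0;
}

/* Wrapper in the shape of build_id_cache__add_s(): resolves and owns
 * realname, then hands it to the core helper. */
static int toy_cache_add_s(const char *sbuild_id, const char *name, bool is_kallsyms)
{
	char *realname = NULL;
	int err;

	if (!is_kallsyms) {
		realname = realpath(name, NULL); /* malloc'd by libc */
		if (!realname)
			return -1;
	}

	err = toy_cache_add(sbuild_id, name, realname, is_kallsyms);
	free(realname); /* free(NULL) is a no-op for the kallsyms case */
	return err;
}
```

Because `free(NULL)` is a no-op, the sketch drops the `if (!is_kallsyms)` guard that the patch keeps around `free(realname)`; either form is correct.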

--

- Arnaldo

2020-09-14 16:38:10

by Peter Zijlstra

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On Mon, Sep 14, 2020 at 12:28:41PM -0300, Arnaldo Carvalho de Melo wrote:

> > > struct {
> > > struct perf_event_header header;
>
> > > u32 pid, tid;
> > > u64 addr;
> > > u64 len;
> > > u64 pgoff;
> > > u32 maj;
> > > u32 min;
> > > u64 ino;
> > > u64 ino_generation;
> > > u32 prot, flags;
> > > u32 reserved;
>
> What is this reserved for? It's all nicely aligned already, u64 followed by
> two u32 (prot, flags).

I suspect it is so that sizeof(reserve+buildid) is a multiple of 8. But
yes, that's a wee bit daft, since the next field is a variable length
character array.

> > > u8 buildid[20];
>
> > Do we need maj, min, ino, ino_generation for mmap3 event?
> > I think they are to compare binaries, then we can do it with
> > build-id (and I think it'd be better)..
>
> Humm, I thought MMAP2 would be a superset of MMAP and MMAP3 would be a
> superset of MMAP2.

Well, the 'funny' thing is that if you do use buildid, then
{maj,min,ino,ino_generation} are indeed superfluous, but are combined
also large enough to contain buildid.

> If we want to ditch useless stuff, then throw away pid, tid too, as we
> can select those via sample_type.

Correct.

So something like:

struct {
struct perf_event_header header;

u64 addr;
u64 len;
u64 pgoff;
union {
struct {
u32 maj;
u32 min;
u64 ino;
u64 ino_generation;
};
u8 buildid[20];
};
u32 prot, flags;
char filename[];
struct sample_id sample_id;
};

Using one of the MISC bits to resolve the union. Might actually bring
benefit to everyone. Us normal people get to have a smaller MMAP record,
while the buildid folks can have it too.

Even more extreme would be using 2 MISC bits and allowing the union to
be 0 sized for anon.
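A sketch of how a consumer might decode Peter's proposed layout, assuming a misc bit tells the reader which member of the union is valid. The name `TOY_MISC_MMAP_BUILDID` and the choice of bit 14 are hypothetical; the thread does not pick one. The static assertion captures the observation that the 20-byte build id fits in the 24 bytes of `maj`/`min`/`ino`/`ino_generation`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL misc bit: the thread does not pick one; bit 14 is arbitrary. */
#define TOY_MISC_MMAP_BUILDID (1u << 14)

/* Same layout as struct perf_event_header: 8 bytes. */
struct toy_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* Peter's sketch: no pid/tid, build id overlays the device/inode fields. */
struct toy_mmap_proposed {
	struct toy_header header;
	uint64_t addr, len, pgoff;
	union {
		struct {
			uint32_t maj, min;
			uint64_t ino, ino_generation;
		};
		uint8_t buildid[20];
	};
	uint32_t prot, flags;
	char filename[];
};

static_assert(20 <= 2 * sizeof(uint32_t) + 2 * sizeof(uint64_t),
	      "build id fits where maj/min/ino/ino_generation live");

/* Returns 1 and copies the build id if the misc bit says the union holds
 * one; returns 0 if it holds maj/min/ino/ino_generation instead. */
static int toy_mmap_get_buildid(const struct toy_mmap_proposed *ev, uint8_t out[20])
{
	if (!(ev->header.misc & TOY_MISC_MMAP_BUILDID))
		return 0;
	memcpy(out, ev->buildid, sizeof(ev->buildid));
	return 1;
}
```

With pid/tid dropped, this record has 64 bytes before `filename`, versus 72 for the current MMAP2 layout and 96 for the RFC's MMAP3. Peter's two-bit variant would shrink anonymous mappings further by letting the union be empty.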

That said; I have the nagging feeling there were unresolved issues with
mmap2, but I can't seem to find any relevant emails on it :/ My
google-fu is weak today.

2020-09-14 16:45:57

by Arnaldo Carvalho de Melo

Subject: Re: [RFC 00/26] perf: Add mmap3 support

Em Mon, Sep 14, 2020 at 02:25:25PM +0900, Namhyung Kim escreveu:
> On Mon, Sep 14, 2020 at 6:03 AM Jiri Olsa <[email protected]> wrote:
> > while playing with perf daemon support I realized I need
> > the build id data in mmap events, so we don't need to care
> > about removed/updated binaries during long perf runs.

> > This RFC patchset adds new mmap3 events that copy the mmap2
> > event and add the build id to it. It makes mmap3 the default
> > mmap event for synthesizing kernel/modules/tasks and adds
> > some tooling enhancements to enable the workflow below.

> Cool! It's nice that we can skip the final build-id collection stage
> with this while data size will be bigger.

Yeah, this is something long overdue, comes with extra cost for people
not wanting build-ids, but then they can just use MMAP2 or even MMAP if
that is enough.

More comments on the other patches.

- Arnaldo

2020-09-14 17:11:13

by Ian Rogers

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On Mon, Sep 14, 2020 at 9:35 AM <[email protected]> wrote:
>
> On Mon, Sep 14, 2020 at 12:28:41PM -0300, Arnaldo Carvalho de Melo wrote:
>
> > > > struct {
> > > > struct perf_event_header header;
> >
> > > > u32 pid, tid;
> > > > u64 addr;
> > > > u64 len;
> > > > u64 pgoff;
> > > > u32 maj;
> > > > u32 min;
> > > > u64 ino;
> > > > u64 ino_generation;
> > > > u32 prot, flags;
> > > > u32 reserved;
> >
> > What is this reserved for? It's all nicely aligned already, u64 followed by
> > two u32 (prot, flags).
>
> I suspect it is so that sizeof(reserve+buildid) is a multiple of 8. But
> yes, that's a wee bit daft, since the next field is a variable length
> character array.
>
> > > > u8 buildid[20];
> >
> > > Do we need maj, min, ino, ino_generation for mmap3 event?
> > > I think they are to compare binaries, then we can do it with
> > > build-id (and I think it'd be better)..
> >
> > Humm, I thought MMAP2 would be a superset of MMAP and MMAP3 would be a
> > superset of MMAP2.
>
> Well, the 'funny' thing is that if you do use buildid, then
> {maj,min,ino,ino_generation} are indeed superfluous, but are combined
> also large enough to contain buildid.
>
> > If we want to ditch useless stuff, then throw away pid, tid too, as we
> > can select those via sample_type.
>
> Correct.
>
> So something like:
>
> struct {
> struct perf_event_header header;
>
> u64 addr;
> u64 len;
> u64 pgoff;
> union {
> struct {
> u32 maj;
> u32 min;
> u64 ino;
> u64 ino_generation;
> };
> u8 buildid[20];
> };
> u32 prot, flags;
> char filename[];
> struct sample_id sample_id;
> };
>
> Using one of the MISC bits to resolve the union. Might actually bring
> benefit to everyone. Us normal people get to have a smaller MMAP record,
> while the buildid folks can have it too.
>
> Even more extreme would be using 2 MISC bits and allowing the union to
> be 0 sized for anon.
>
> That said; I have the nagging feeling there were unresolved issues with
> mmap2, but I can't seem to find any relevant emails on it :/ My
> google-fu is weak today.

Firstly, thanks Jiri for this really useful patch set for our
use-cases! Some thoughts:

One issue with mmap2 events at the moment is that they happen "after"
the mmap is performed. This allows the mapped address to be known but
has the unfortunate problem of causing mmap events to get "extended"
due to adjacent vmas being merged. There are some workarounds like
commit c8f6ae1fb28d (perf inject jit: Remove //anon mmap events).
Perhaps these events can switch to reporting the length the mmap
requested rather than the length of the vma?

I could imagine that changes here could be interesting to folks doing
CHERI or hypervisors, for example, they may want more address bits.

BPF has stack traces with elements of buildid + offset. Using these in
perf samples would avoid the need for mmap events, synthesis, etc. but
at the cost of larger and possibly slower stack traces. Perhaps we
should just remove the idea of mmap events altogether?

Thanks,
Ian

2020-09-14 17:37:54

by Stephane Eranian

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On Mon, Sep 14, 2020 at 2:08 AM <[email protected]> wrote:
>
> On Sun, Sep 13, 2020 at 11:41:00PM -0700, Stephane Eranian wrote:
> > On Sun, Sep 13, 2020 at 2:03 PM Jiri Olsa <[email protected]> wrote:
> > what happens if I set mmap3 and mmap2?
> >
> > I think using mmap3 for every mmap may be overkill as you add useless
> > 20 bytes to an mmap record.
> > I am not sure if your code handles the case where mmap3 is not needed
> > because there is no buildid, e.g., anonymous memory.
> > It seems to me you've written the patch in such a way that if the user
> > tool supports mmap3, then it supersedes mmap2, and thus
> > you need all the fields of mmap2. But it could be more interesting to
> > return either MMAP2 or MMAP3 depending on tool support
> > and type of mmap, that would certainly save 20 bytes on any anon mmap.
> > But maybe that logic is already in your patch and I missed it.
>
> That, and what if you don't want any of that buildid nonsense at all? I
> always kill that because it makes perf pointlessly slow and has
> absolutely no upsides for me.
>
I have seen situations where the perf tool takes a visibly significant
amount of time (many seconds) to inject the buildids at the end of the
collection of perf record (same if using perf inject -b). That is
because it needs to go through all the records in the perf.data to
find MMAP records and then read the buildids from the filesystem. This
has caused some problems in our environment. Having the kernel add the
buildid to *relevant* mmaps would avoid a lot of that penalty, by
avoiding having to parse the perf.data file and leveraging the fact
that the buildid may be in memory already. Although my concern on this
has to do with large pages and the impact they have on alignment of
sections in memory. I think Ian can comment better on this.

I think this patch series is useful if it can demonstrate a speedup
during recording (perf record or perf record | perf inject -b). But it
needs to be optimized to minimize the volume of useless info returned.
And Jiri needs to decide if MMAP3 is a replacement of MMAP2, or a
different kind of record targeted at ELF images only, in which case
some of the fields may be removed. My tendency would be to go for the
latter.

2020-09-14 19:39:56

by Jiri Olsa

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On Mon, Sep 14, 2020 at 06:35:34PM +0200, [email protected] wrote:
> On Mon, Sep 14, 2020 at 12:28:41PM -0300, Arnaldo Carvalho de Melo wrote:
>
> > > > struct {
> > > > struct perf_event_header header;
> >
> > > > u32 pid, tid;
> > > > u64 addr;
> > > > u64 len;
> > > > u64 pgoff;
> > > > u32 maj;
> > > > u32 min;
> > > > u64 ino;
> > > > u64 ino_generation;
> > > > u32 prot, flags;
> > > > u32 reserved;
> >
> > What is this reserved for? It's all nicely aligned already, u64 followed by
> > two u32 (prot, flags).
>
> I suspect it is so that sizeof(reserve+buildid) is a multiple of 8. But
> yes, that's a wee bit daft, since the next field is a variable length
> character array.
>
> > > > u8 buildid[20];
> >
> > > Do we need maj, min, ino, ino_generation for mmap3 event?
> > > I think they are to compare binaries, then we can do it with
> > > build-id (and I think it'd be better)..
> >
> > Humm, I thought MMAP2 would be a superset of MMAP and MMAP3 would be a
> > superset of MMAP2.
>
> Well, the 'funny' thing is that if you do use buildid, then
> {maj,min,ino,ino_generation} are indeed superfluous, but are combined
> also large enough to contain buildid.

yay! nice

>
> > If we want to ditch useless stuff, then throw away pid, tid too, as we
> > can select those via sample_type.
>
> Correct.

can we? I think you could disable sample_id, and then
you won't have pid/tid in the mmap event

>
> So something like:
>
> struct {
> struct perf_event_header header;
>
> u64 addr;
> u64 len;
> u64 pgoff;
> union {
> struct {
> u32 maj;
> u32 min;
> u64 ino;
> u64 ino_generation;
> };
> u8 buildid[20];
> };
> u32 prot, flags;
> char filename[];
> struct sample_id sample_id;
> };
>
> Using one of the MISC bits to resolve the union. Might actually bring
> benefit to everyone. Us normal people get to have a smaller MMAP record,
> while the buildid folks can have it too.
>
> Even more extreme would be using 2 MISC bits and allowing the union to
> be 0 sized for anon.

I like that idea, I'll check on it

thanks,
jirka

2020-09-14 19:40:08

by Jiri Olsa

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On Mon, Sep 14, 2020 at 02:38:27PM +0900, Namhyung Kim wrote:

SNIP

> > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> > index 077e7ee69e3d..facfc3c673ed 100644
> > --- a/include/uapi/linux/perf_event.h
> > +++ b/include/uapi/linux/perf_event.h
> > @@ -384,7 +384,8 @@ struct perf_event_attr {
> > aux_output : 1, /* generate AUX records instead of events */
> > cgroup : 1, /* include cgroup events */
> > text_poke : 1, /* include text poke events */
> > - __reserved_1 : 30;
> > + mmap3 : 1, /* include bpf events */
>
> ???
>
> > + __reserved_1 : 29;
> >
> > union {
> > __u32 wakeup_events; /* wakeup every n events */
> > @@ -1060,6 +1061,30 @@ enum perf_event_type {
> > */
> > PERF_RECORD_TEXT_POKE = 20,
> >
> > + /*
> > + * The MMAP3 records are an augmented version of MMAP2, they add
> > + * build id value to identify the exact binary behind map
> > + *
> > + * struct {
> > + * struct perf_event_header header;
> > + *
> > + * u32 pid, tid;
> > + * u64 addr;
> > + * u64 len;
> > + * u64 pgoff;
> > + * u32 maj;
> > + * u32 min;
> > + * u64 ino;
> > + * u64 ino_generation;
> > + * u32 prot, flags;
> > + * u32 reserved;
> > + * u8 buildid[20];
> > + * char filename[];
> > + * struct sample_id sample_id;
> > + * };
> > + */
> > + PERF_RECORD_MMAP3 = 21,
> > +
> > PERF_RECORD_MAX, /* non-ABI */
> > };
> >
> [SNIP]
> > @@ -8098,6 +8116,9 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
> > mmap_event->prot = prot;
> > mmap_event->flags = flags;
> >
> > + if (atomic_read(&nr_mmap3_events))
> > + build_id_parse(vma, mmap_event->buildid);
>
> What about if it failed? We should zero out the build-id..

it is initialized to zero in perf_event_mmap

mmap_event = (struct perf_mmap_event){
.vma = vma,
...

I'll double check build_id_parse won't leave anything half
baked there, but I don't think so
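Jiri's answer relies on a C rule: in a compound literal with designated initializers, every member that is not named is zero-initialized, so `buildid` starts out all zeros. A sketch of both halves of the exchange, assuming a cut-down event struct and a parser that writes part of the id before failing, the "half baked" case Namhyung raises (`toy_mmap_event`, `toy_build_id_parse` and `toy_fill_buildid` are hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-down perf_mmap_event with just enough fields. */
struct toy_mmap_event {
	void *vma;
	uint32_t prot, flags;
	uint8_t buildid[20];
};

/* Same pattern as perf_event_mmap(): members not named in the
 * designated initializer, buildid included, are zero-filled. */
static void toy_mmap_event_init(struct toy_mmap_event *ev, void *vma)
{
	assert(ev != NULL);
	*ev = (struct toy_mmap_event){
		.vma = vma,
	};
}

/* Hypothetical parser that writes part of the id and then fails. */
static int toy_build_id_parse(void *vma, uint8_t *buildid)
{
	(void)vma;
	memset(buildid, 0xaa, 4);
	return -1;
}

/* Defensive variant: do not trust partial writes on failure. */
static void toy_fill_buildid(struct toy_mmap_event *ev)
{
	if (toy_build_id_parse(ev->vma, ev->buildid) < 0)
		memset(ev->buildid, 0, sizeof(ev->buildid));
}
```

The zero-initialization covers a parser that fails before writing anything; the explicit `memset()` on failure is only needed if the real parser can leave partial data behind, which is what Jiri said he would double check.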

thanks,
jirka

2020-09-14 19:41:08

by Jiri Olsa

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On Mon, Sep 14, 2020 at 12:28:41PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Sep 14, 2020 at 02:38:27PM +0900, Namhyung Kim escreveu:
> > On Mon, Sep 14, 2020 at 6:03 AM Jiri Olsa <[email protected]> wrote:
> > > Add new version of mmap event. The MMAP3 record is an
> > > augmented version of MMAP2, it adds build id value to
> > > identify the exact binary object behind memory map:
>
> > > struct {
> > > struct perf_event_header header;
>
> > > u32 pid, tid;
> > > u64 addr;
> > > u64 len;
> > > u64 pgoff;
> > > u32 maj;
> > > u32 min;
> > > u64 ino;
> > > u64 ino_generation;
> > > u32 prot, flags;
> > > u32 reserved;
>
> What is this reserved for? It's all nicely aligned already, u64 followed by
> two u32 (prot, flags).
>
> > > u8 buildid[20];
>
> > Do we need maj, min, ino, ino_generation for mmap3 event?
> > I think they are to compare binaries, then we can do it with
> > build-id (and I think it'd be better)..
>
> Humm, I thought MMAP2 would be a superset of MMAP and MMAP3 would be a
> superset of MMAP2.
>
> If we want to ditch useless stuff, then throw away pid, tid too, as we
> can select those via sample_type.
>
> Having said that, at this point I don't even know if adding new
> PERF_RECORD_ that are an update for a preexisting one is the right way
> to proceed.
>
> Perhaps we should attach a BPF program to point where a mmap/munmap is
> being done (perf_event_mmap()) and allow userspace to ask for whatever
> it wants? With a kprobes there right now we can implement this MMAP3
> easily, no?

hmm, I'm always worried about solutions based on kprobes,
because once the function is moved/removed you're screwed
and need to keep up with function name changes and be
backward compatible..

jirka

2020-09-14 19:42:31

by Jiri Olsa

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On Sun, Sep 13, 2020 at 11:20:31PM -0700, Song Liu wrote:
> On Sun, Sep 13, 2020 at 10:40 PM Namhyung Kim <[email protected]> wrote:
> >
> > On Mon, Sep 14, 2020 at 6:03 AM Jiri Olsa <[email protected]> wrote:
> > >
> > > Add new version of mmap event. The MMAP3 record is an
> > > augmented version of MMAP2, it adds build id value to
> > > identify the exact binary object behind memory map:
> > >
> > > struct {
> > > struct perf_event_header header;
> > >
> > > u32 pid, tid;
> > > u64 addr;
> > > u64 len;
> > > u64 pgoff;
> > > u32 maj;
> > > u32 min;
> > > u64 ino;
> > > u64 ino_generation;
> > > u32 prot, flags;
> > > u32 reserved;
>
> I guess we need reserved _after_ buildid, no?

it's there to align the size to 8 bytes,
so the sample_id is in the proper place

but yes, perhaps after buildid would make more sense
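The alignment argument can be checked with `offsetof()`. Assuming, as for MMAP2, that the kernel pads `filename` to a multiple of 8 bytes, `sample_id` stays aligned as long as `filename` itself starts on an 8-byte boundary. The sketch below uses hypothetical `toy_*` types mirroring the RFC layout and shows that both placements of `reserved` achieve this, while dropping it leaves `filename` 4 bytes off:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as struct perf_event_header: 8 bytes. */
struct toy_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* MMAP2 part of the RFC record, header through prot/flags. */
struct toy_mmap2_part {
	struct toy_header header;
	uint32_t pid, tid;
	uint64_t addr, len, pgoff;
	uint32_t maj, min;
	uint64_t ino, ino_generation;
	uint32_t prot, flags;
};

/* As posted: reserved before buildid. */
struct toy_mmap3_rfc {
	struct toy_mmap2_part m;
	uint32_t reserved;
	uint8_t buildid[20];
	char filename[];
};

/* Song's variant: reserved after buildid. */
struct toy_mmap3_reserved_after {
	struct toy_mmap2_part m;
	uint8_t buildid[20];
	uint32_t reserved;
	char filename[];
};

/* No padding field at all. */
struct toy_mmap3_no_reserved {
	struct toy_mmap2_part m;
	uint8_t buildid[20];
	char filename[];
};

static_assert(sizeof(struct toy_mmap2_part) == 72, "MMAP2 part is 72 bytes");
static_assert(offsetof(struct toy_mmap3_rfc, filename) % 8 == 0,
	      "filename starts 8-byte aligned");
static_assert(offsetof(struct toy_mmap3_reserved_after, filename) % 8 == 0,
	      "same with reserved after buildid");
static_assert(offsetof(struct toy_mmap3_no_reserved, filename) % 8 == 4,
	      "without reserved, filename is off by 4");
```

Either placement yields `filename` at offset 96; only the position of the 4 padding bytes differs.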

>
> > > u8 buildid[20];
> >
> > Do we need maj, min, ino, ino_generation for mmap3 event?
> > I think they are to compare binaries, then we can do it with
> > build-id (and I think it'd be better)..
>
> +1 we shouldn't need maj, min, etc.

right, and as Peter already wrote, buildid could fit
in that space.. yay :)

thanks,
jirka

>
> Thanks,
> Song
>
> [...]
>

2020-09-14 19:46:54

by Jiri Olsa

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On Sun, Sep 13, 2020 at 11:41:00PM -0700, Stephane Eranian wrote:
> On Sun, Sep 13, 2020 at 2:03 PM Jiri Olsa <[email protected]> wrote:
> >
> > Add new version of mmap event. The MMAP3 record is an
> > augmented version of MMAP2, it adds build id value to
> > identify the exact binary object behind memory map:
> >
> > struct {
> > struct perf_event_header header;
> >
> > u32 pid, tid;
> > u64 addr;
> > u64 len;
> > u64 pgoff;
> > u32 maj;
> > u32 min;
> > u64 ino;
> > u64 ino_generation;
> > u32 prot, flags;
> > u32 reserved;
> > u8 buildid[20];
> > char filename[];
> > struct sample_id sample_id;
> > };
> >
> > Adding 4 bytes reserved field to align buildid data to 8 bytes,
> > so sample_id data is properly aligned.
> >
> > The mmap3 event is enabled by new mmap3 bit in perf_event_attr
> > struct. When set for an event, it enables the build id retrieval
> > and will use mmap3 format for the event.
> >
> > Keeping track of mmap3 events and calling build_id_parse
> > in perf_event_mmap_event only if we have any defined.
> >
> > Having build id attached directly to the mmap event will help
> > tools like perf to skip the final search through perf data for
> > binaries that are needed in the report time. Also it prevents
> > possible race when the binary could be removed or replaced
> > during profiling.
> >
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > include/uapi/linux/perf_event.h | 27 ++++++++++++++++++++++-
> > kernel/events/core.c | 38 +++++++++++++++++++++++++++------
> > 2 files changed, 57 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> > index 077e7ee69e3d..facfc3c673ed 100644
> > --- a/include/uapi/linux/perf_event.h
> > +++ b/include/uapi/linux/perf_event.h
> > @@ -384,7 +384,8 @@ struct perf_event_attr {
> > aux_output : 1, /* generate AUX records instead of events */
> > cgroup : 1, /* include cgroup events */
> > text_poke : 1, /* include text poke events */
> > - __reserved_1 : 30;
> > + mmap3 : 1, /* include bpf events */
> > + __reserved_1 : 29;
> >
> what happens if I set mmap3 and mmap2?

hum bad things probably ;-) I think mmap3 would overload mmap2

>
> I think using mmap3 for every mmap may be overkill as you add useless
> 20 bytes to an mmap record.
> I am not sure if your code handles the case where mmap3 is not needed
> because there is no buildid, e.g., anonymous memory.
> It seems to me you've written the patch in such a way that if the user
> tool supports mmap3, then it supersedes mmap2, and thus
> you need all the fields of mmap2. But it could be more interesting to
> return either MMAP2 or MMAP3 depending on tool support
> and type of mmap, that would certainly save 20 bytes on any anon mmap.
> But maybe that logic is already in your patch and I missed it.

I like Peter's idea of ditching mmap3 and using that maj/min..
area in mmap2 for buildid, based on misc bits

jirka

2020-09-14 19:52:21

by Jiri Olsa

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On Mon, Sep 14, 2020 at 12:31:34PM -0300, Arnaldo Carvalho de Melo wrote:

SNIP

> > > ---
> > > include/uapi/linux/perf_event.h | 27 ++++++++++++++++++++++-
> > > kernel/events/core.c | 38 +++++++++++++++++++++++++++------
> > > 2 files changed, 57 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> > > index 077e7ee69e3d..facfc3c673ed 100644
> > > --- a/include/uapi/linux/perf_event.h
> > > +++ b/include/uapi/linux/perf_event.h
> > > @@ -384,7 +384,8 @@ struct perf_event_attr {
> > > aux_output : 1, /* generate AUX records instead of events */
> > > cgroup : 1, /* include cgroup events */
> > > text_poke : 1, /* include text poke events */
> > > - __reserved_1 : 30;
> > > + mmap3 : 1, /* include bpf events */
> > > + __reserved_1 : 29;
> > >
> > what happens if I set mmap3 and mmap2?
> >
> > I think using mmap3 for every mmap may be overkill as you add useless
> > 20 bytes to an mmap record.
>
> So use just PERF_RECORD_MMAP2.
>
> I think if the user says: I need buildids, then, in kernels with support
> for getting the buildid in MMAP records, use it as it's more accurate,
> otherwise fall back to traversing all records at the end to go over lots
> of files harvesting those build-ids.

ok, so a special record option to enable this

>
> If the user says I don't want build-ids, nothing changes, no collection
> at the end, perf continues using PERF_RECORD_MMAP2.

and that's -B option in record

>
> > I am not sure if your code handles the case where mmap3 is not needed
> > because there is no buildid, e.g., anonymous memory.
> > It seems to me you've written the patch in such a way that if the user
> > tool supports mmap3, then it supersedes mmap2, and thus
> > you need all the fields of mmap2. But it could be more interesting to
> > return either MMAP2 or MMAP3 depending on tool support
> > and type of mmap, that would certainly save 20 bytes on any anon mmap.
> > But maybe that logic is already in your patch and I missed it.
>
> Right, it should take into account if the user asked for build-ids or
> not in addition to checking if the kernel supports MMAP3.

right, thanks,

jirka

2020-09-14 19:57:50

by Jiri Olsa

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On Mon, Sep 14, 2020 at 10:26:05AM -0700, Stephane Eranian wrote:
> On Mon, Sep 14, 2020 at 2:08 AM <[email protected]> wrote:
> >
> > On Sun, Sep 13, 2020 at 11:41:00PM -0700, Stephane Eranian wrote:
> > > On Sun, Sep 13, 2020 at 2:03 PM Jiri Olsa <[email protected]> wrote:
> > > what happens if I set mmap3 and mmap2?
> > >
> > > I think using mmap3 for every mmap may be overkill as you add useless
> > > 20 bytes to an mmap record.
> > > I am not sure if your code handles the case where mmap3 is not needed
> > > because there is no buildid, e.g., anonymous memory.
> > > It seems to me you've written the patch in such a way that if the user
> > > tool supports mmap3, then it supersedes mmap2, and thus
> > > you need all the fields of mmap2. But it could be more interesting to
> > > return either MMAP2 or MMAP3 depending on tool support
> > > and type of mmap, that would certainly save 20 bytes on any anon mmap.
> > > But maybe that logic is already in your patch and I missed it.
> >
> > That, and what if you don't want any of that buildid nonsense at all? I
> > always kill that because it makes perf pointlessly slow and has
> > absolutely no upsides for me.
> >
> I have seen situations where the perf tool takes a visibly significant
> amount of time (many seconds) to inject the buildids at the end of the
> collection of perf record (same if using perf inject -b). That is
> because it needs to go through all the records in the perf.data to
> find MMAP records and then read the buildids from the filesystem. This
> has caused some problems in our environment. Having the kernel add the
> buildid to *relevant* mmaps would avoid a lot of that penalty, by
> avoiding having to parse the perf.data file and leveraging the fact
> that the buildid may be in memory already. Although my concern on this
> has to do with large pages and the impact they have on alignment of
> sections in memory. I think Ian can comment better on this.
>
> I think this patch series is useful if it can demonstrate a speedup
> during recording (perf record or perf record | perf inject -b). But it

I haven't measured, but I assume skipping the perf.data search
at the end of the record will make up for reading the buildid for
each mmap event.. might be tricky in mmap-event-heavy workloads

> needs to be optimized to minimize the volume of useless info returned.
> And Jiri needs to decide if MMAP3 is a replacement of MMAP2, or a
> different kind of record targeted at ELF images only, in which case
> some of the fields may be removed. My tendency would be to go for the
> latter.

yes, I like the latter as well, let's see

thanks,
jirka

2020-09-14 20:09:47

by Jiri Olsa

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On Mon, Sep 14, 2020 at 10:08:01AM -0700, Ian Rogers wrote:

SNIP

> >
> > Using one of the MISC bits to resolve the union. Might actually bring
> > benefit to everyone. Us normal people get to have a smaller MMAP record,
> > while the buildid folks can have it too.
> >
> > Even more extreme would be using 2 MISC bits and allowing the union to
> > be 0 sized for anon.
> >
> > That said; I have the nagging feeling there were unresolved issues with
> > mmap2, but I can't seem to find any relevant emails on it :/ My
> > google-fu is weak today.
>
> Firstly, thanks Jiri for this really useful patch set for our
> use-cases! Some thoughts:
>
> One issue with mmap2 events at the moment is that they happen "after"
> the mmap is performed. This allows the mapped address to be known but
> has the unfortunate problem of causing mmap events to get "extended"
> due to adjacent vmas being merged. There are some workarounds like
> commit c8f6ae1fb28d (perf inject jit: Remove //anon mmap events).
> Perhaps these events can switch to reporting the length the mmap
> requested rather than the length of the vma?

seems like a separate feature, perhaps we could use another MISC bit for that?

>
> I could imagine that changes here could be interesting to folks doing
> CHERI or hypervisors, for example, they may want more address bits.
>
> BPF has stack traces with elements of buildid + offset. Using these in
> perf samples would avoid the need for mmap events, synthesis, etc. but
> at the cost of larger and possibly slower stack traces. Perhaps we
> should just remove the idea of mmap events altogether?

hm, we also need mmap events to resolve PERF_SAMPLE_IP, right?
also not sure how we would do that for dwarf unwind, and perhaps
there are some other places.. c2c data address resolving?

thanks,
jirka
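To illustrate why mmap events remain necessary for PERF_SAMPLE_IP: a tool turns a sample IP into an object plus file offset using the addr, len and pgoff fields of the mmap record that covers it. The following is a minimal C sketch assuming a flat array of maps; `struct map_sketch` and `resolve_ip()` are hypothetical names, not perf's internal API, and perf's own map handling is considerably more involved.

```c
#include <stddef.h>
#include <stdint.h>

/* The subset of an MMAP/MMAP2 record a tool keeps per map (sketch). */
struct map_sketch {
	uint64_t start;		/* mapped address (addr field) */
	uint64_t len;		/* length of the mapping */
	uint64_t pgoff;		/* file offset corresponding to 'start' */
	const char *filename;
};

/*
 * Return the map containing 'ip' and store the matching file offset in
 * '*file_off', or return NULL if no map covers 'ip'.
 */
static const struct map_sketch *resolve_ip(const struct map_sketch *maps,
					   size_t n, uint64_t ip,
					   uint64_t *file_off)
{
	for (size_t i = 0; i < n; i++) {
		const struct map_sketch *m = &maps[i];

		/* 'ip - m->start < m->len' avoids overflow in start + len. */
		if (ip >= m->start && ip - m->start < m->len) {
			*file_off = ip - m->start + m->pgoff;
			return m;
		}
	}
	return NULL;
}
```

The same lookup is what branch stacks, callchains and hardware traces depend on, which is why build id + offset stack traces alone would not cover every consumer.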

2020-09-14 20:21:14

by Jiri Olsa

Subject: Re: [PATCH 07/26] perf tools: Add check for existing link in buildid dir

On Mon, Sep 14, 2020 at 02:54:36PM +0900, Namhyung Kim wrote:
> On Mon, Sep 14, 2020 at 6:05 AM Jiri Olsa <[email protected]> wrote:
> >
> > When adding new build id link we fail if the link is already
> > there. Adding check for existing link and warn/replace the
> > link with new target.
> >
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > tools/perf/util/build-id.c | 20 +++++++++++++++++++-
> > 1 file changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> > index bdee4e08e60d..ecdc167aa1a0 100644
> > --- a/tools/perf/util/build-id.c
> > +++ b/tools/perf/util/build-id.c
> > @@ -751,8 +751,26 @@ int build_id_cache__add_s(const char *sbuild_id, const char *name,
> > tmp = dir_name + strlen(buildid_dir) - 5;
> > memcpy(tmp, "../..", 5);
> >
> > - if (symlink(tmp, linkname) == 0)
> > + if (symlink(tmp, linkname) == 0) {
> > err = 0;
> > + } else if (errno == EEXIST) {
> > + char path[PATH_MAX];
> > +
> > + if (readlink(linkname, path, sizeof(path)) == -1) {
> > + pr_err("Cant read link: %s\n", linkname);
>
> typo

ok

>
> > + goto out_free;
> > + }
> > + if (strcmp(tmp, path)) {
> > + pr_err("Inconsistent .debug record, updating [%s]\n",
> > + linkname);
>
> But isn't it ok to copy a binary to another location?
> There can be multiple binaries with the same build-id..

ah true.. perhaps just a debug message would be good here

the previous code failed in this case, but I think we do not check
the return value in the upper layer

thanks,
jirka
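A sketch of the EEXIST path with both review points folded in: a link that points elsewhere is reported as a debug-level note rather than an error, since several binaries can share one build id. One further detail for a revised version: readlink() does not NUL-terminate its output, so the buffer must be terminated explicitly before strcmp(); the quoted hunk passes sizeof(path) and compares without terminating. `add_buildid_link()` is a hypothetical name, and fprintf() stands in for perf's pr_debug()/pr_err().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Returns 0 if 'linkname' now points to 'target' (created or already
 * correct), 1 if an existing link points elsewhere (kept as-is),
 * -1 on error.
 */
static int add_buildid_link(const char *target, const char *linkname)
{
	char path[PATH_MAX];
	ssize_t len;

	if (symlink(target, linkname) == 0)
		return 0;
	if (errno != EEXIST)
		return -1;

	/* Leave room for the terminator; readlink() does not add one. */
	len = readlink(linkname, path, sizeof(path) - 1);
	if (len < 0) {
		fprintf(stderr, "Can't read link: %s\n", linkname);
		return -1;
	}
	path[len] = '\0';

	if (strcmp(target, path) != 0) {
		/* Same build id at another location is legitimate. */
		fprintf(stderr, "debug: %s already points to %s\n",
			linkname, path);
		return 1;
	}
	return 0;
}
```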

2020-09-14 20:31:03

by Jiri Olsa

Subject: Re: [PATCH 14/26] perf tools: Add mmap3 events to --show-mmap-events option

On Mon, Sep 14, 2020 at 03:30:53PM +0900, Namhyung Kim wrote:
> On Mon, Sep 14, 2020 at 6:04 AM Jiri Olsa <[email protected]> wrote:
> >
> > Displaying mmap3 events for --show-mmap-events option,
> > the build id is displayed within <> braces:
> >
> > $ perf script --show-mmap-events
> > kill 12148 13893.519014: PERF_RECORD_MMAP3 12148/12148: <43938d0803c5e3130ea679cd569aaf44b98d9ae8> [0x560e7d7f..
> > kill 12148 13893.519420: PERF_RECORD_MMAP3 12148/12148: <1805c738c8f3ec0f47b7ea09080c28f34d18a82b> [0x7f9e7dfc..
> >
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > tools/perf/builtin-script.c | 33 +++++++++++++++++++++++++++++++++
> > 1 file changed, 33 insertions(+)
> >
> > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> > index d839983cfb88..9c09581d5cb0 100644
> > --- a/tools/perf/builtin-script.c
> > +++ b/tools/perf/builtin-script.c
> > @@ -2342,6 +2342,38 @@ static int process_mmap2_event(struct perf_tool *tool,
> > event->mmap2.tid);
> > }
> >
> > +static int process_mmap3_event(struct perf_tool *tool,
> > + union perf_event *event,
> > + struct perf_sample *sample,
> > + struct machine *machine)
> > +{
> > + struct thread *thread;
> > + struct perf_script *script = container_of(tool, struct perf_script, tool);
> > + struct perf_session *session = script->session;
> > + struct evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
> > +
> > + if (perf_event__process_mmap3(tool, event, sample, machine) < 0)
> > + return -1;
> > +
> > + thread = machine__findnew_thread(machine, event->mmap3.pid, event->mmap3.tid);
> > + if (thread == NULL) {
> > + pr_debug("problem processing MMAP2 event, skipping it.\n");
>
> MMAP3 ?

yes, thanks,
jirka

2020-09-14 20:32:46

by Jiri Olsa

Subject: Re: [PATCH 09/26] perf tools: Try load vmlinux from buildid database

On Mon, Sep 14, 2020 at 03:25:39PM +0900, Namhyung Kim wrote:
> On Mon, Sep 14, 2020 at 6:04 AM Jiri Olsa <[email protected]> wrote:
> >
> > Currently we don't check on kernel's vmlinux the same way as
> > we do for normal binaries, but we either look for kallsyms
> > file in build id database or check on known vmlinux locations
> > (plus some other optional paths).
> >
> > This patch adds the check for standard build id binary location,
> > so we are ready once we start to store it there from debuginfod
> > in following changes.
>
> But dso__load_vmlinux_path() already has the logic.
> Also you should check symbol_conf.ignore_vmlinux_buildid.

I wanted it not to depend on !symbol_conf.ignore_vmlinux,
which is required to call dso__load_vmlinux_path

also the idea was to try the build id vmlinux before the configured
vmlinux locations, because they found the vmlinux in my setup ;-)

I'll double check the logic again

thanks,
jirka

2020-09-14 20:44:17

by Jiri Olsa

Subject: Re: [PATCH 24/26] perf tools: Add buildid-list --store option

On Mon, Sep 14, 2020 at 12:14:13PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Sep 14, 2020 at 03:42:55PM +0900, Namhyung Kim escreveu:
> > On Mon, Sep 14, 2020 at 6:05 AM Jiri Olsa <[email protected]> wrote:
> > >
> > > Adding buildid-list --store option to populate
> > > .debug data with build id files.
> >
> > Hmm.. isn't it better to add it to the buildid-cache command?
>
> Yes, that is the right place. 'buildid-list' is about perf.data files,
> buildid-cache is about .debug cache.

I put it in buildid-list because it works over perf.data
by default, and a --store option seemed obvious there

but I guess we could have the same, like:

$ perf buildid-cache --store[=path]

with path being perf.data by default

thanks,
jirka

2020-09-14 20:46:32

by Jiri Olsa

Subject: Re: [PATCH 24/26] perf tools: Add buildid-list --store option

On Mon, Sep 14, 2020 at 03:42:55PM +0900, Namhyung Kim wrote:
> On Mon, Sep 14, 2020 at 6:05 AM Jiri Olsa <[email protected]> wrote:
> >
> > Adding buildid-list --store option to populate
> > .debug data with build id files.
>
> Hmm.. isn't it better to add it to the buildid-cache command?
> > + * - store binary to build id database
> > + */
> > + is_kallsyms = !strcmp(machine->mmap_name, dso->short_name);
> > + build_id__sprintf(dso->build_id, sizeof(dso->build_id), sbuild_id);
> > +
> > + if (is_kallsyms) {
> > + /*
> > + * Find out if we are on the same kernel as perf.data
> > + * and keel kallsyms in that case.
> > + */
> > + path = strdup(dso->long_name);
> > + if (!path)
> > + goto out_err;
> > +
> > + err = sysfs__read_build_id("/sys/kernel/notes", &bid, sizeof(bid));
> > + if (err < 0)
> > + goto out_err;
> > + } else {
> > + struct stat st;
> > +
> > + /*
> > + * Does the file exists in the first place, if it does,
> > + * resolve path and read the build id.
> > + */
> > + if (stat(dso->long_name, &st)) {
> > + zfree(&path);
> > + goto try_download;
> > + }
> > +
> > + path = nsinfo__realpath(dso->long_name, dso->nsinfo);
> > + if (!path)
> > + goto out_err;
> > +
> > + err = filename__read_build_id(path, &bid, sizeof(bid));
>
> Is it ok to read the file out of the namespace?

right, I need to enclose the whole part in nsinfo__mountns_* calls

thanks,
jirka
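For the kallsyms branch in the hunk above, sysfs__read_build_id() reads the running kernel's build id from /sys/kernel/notes, which holds a sequence of ELF notes: namesz, descsz and type words, followed by the name and the descriptor, each padded to 4 bytes. Below is a minimal sketch of scanning such a buffer for the GNU build-id note (type 3, name "GNU"). It assumes host byte order, as for notes of the running kernel, and is not perf's implementation; `find_gnu_build_id()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef NT_GNU_BUILD_ID
#define NT_GNU_BUILD_ID 3	/* same value as in <elf.h> */
#endif

/* ELF note name and descriptor fields are padded to 4 bytes. */
static size_t note_align4(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/*
 * Copy up to 'size' bytes of the GNU build-id descriptor into 'bid' and
 * return the number of bytes copied, or -1 if no such note is found.
 */
static int find_gnu_build_id(const void *buf, size_t buf_len,
			     void *bid, size_t size)
{
	const unsigned char *p = buf, *end = p + buf_len;

	while ((size_t)(end - p) >= 12) {
		uint32_t namesz, descsz, type;

		memcpy(&namesz, p, 4);
		memcpy(&descsz, p + 4, 4);
		memcpy(&type, p + 8, 4);

		const unsigned char *name = p + 12;
		size_t name_pad = note_align4(namesz);
		size_t desc_pad = note_align4(descsz);

		/* Stop on a truncated note instead of reading past 'end'. */
		if (name_pad > (size_t)(end - name) ||
		    desc_pad > (size_t)(end - name) - name_pad)
			break;

		const unsigned char *desc = name + name_pad;

		if (type == NT_GNU_BUILD_ID && namesz == 4 &&
		    memcmp(name, "GNU", 4) == 0) {
			size_t n = descsz < size ? descsz : size;

			memcpy(bid, desc, n);
			return (int)n;
		}
		p = desc + desc_pad;
	}
	return -1;
}
```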

2020-09-14 20:47:38

by Jiri Olsa

Subject: Re: [PATCH 04/26] perf tools: Add filename__decompress function

On Mon, Sep 14, 2020 at 12:35:54PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Sun, Sep 13, 2020 at 11:02:51PM +0200, Jiri Olsa escreveu:
> > Factor filename__decompress from decompress_kmodule function.
> > It can decompress files with compressions supported in perf -
> > xz and gz, the support needs to be compiled in.
> >
> > It will be used in following changes to get build id out of
> > compressed elf objects.
>
> This is prep work, can be applied now, done.

thanks,
jirka
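filename__decompress handles xz and gz when that support is compiled in. Before decompressing, a tool has to recognize which format a file uses. The sketch below does this by magic bytes (gzip starts with 1f 8b, xz with fd '7' 'z' 'X' 'Z' 00). It is an illustration only; perf's own detection code may be structured differently, and `detect_compression()` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

enum comp_kind_sketch { COMP_NONE, COMP_GZIP, COMP_XZ };

/* Classify a file header by its leading magic bytes. */
static enum comp_kind_sketch detect_compression(const unsigned char *buf,
						size_t len)
{
	static const unsigned char gz_magic[] = { 0x1f, 0x8b };
	static const unsigned char xz_magic[] = { 0xfd, '7', 'z', 'X', 'Z', 0x00 };

	if (len >= sizeof(gz_magic) &&
	    memcmp(buf, gz_magic, sizeof(gz_magic)) == 0)
		return COMP_GZIP;
	if (len >= sizeof(xz_magic) &&
	    memcmp(buf, xz_magic, sizeof(xz_magic)) == 0)
		return COMP_XZ;
	return COMP_NONE;	/* e.g. a plain ELF file (7f 'E' 'L' 'F') */
}
```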

2020-09-14 20:50:16

by Jiri Olsa

Subject: Re: [PATCH 05/26] perf tools: Add build_id__is_defined function

On Mon, Sep 14, 2020 at 02:44:35PM +0900, Namhyung Kim wrote:
> On Mon, Sep 14, 2020 at 6:05 AM Jiri Olsa <[email protected]> wrote:
> >
> > Adding build_id__is_defined helper to check build id
> > is defined and is != zero build id.
> >
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > tools/perf/util/build-id.c | 11 +++++++++++
> > tools/perf/util/build-id.h | 1 +
> > 2 files changed, 12 insertions(+)
> >
> > diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> > index 31207b6e2066..bdee4e08e60d 100644
> > --- a/tools/perf/util/build-id.c
> > +++ b/tools/perf/util/build-id.c
> > @@ -902,3 +902,14 @@ bool perf_session__read_build_ids(struct perf_session *session, bool with_hits)
> >
> > return ret;
> > }
> > +
> > +bool build_id__is_defined(const u8 *build_id)
> > +{
> > + static u8 zero[BUILD_ID_SIZE];
> > + int err = 0;
> > +
> > + if (build_id)
> > + err = memcmp(build_id, &zero, BUILD_ID_SIZE);
> > +
> > + return err ? true : false;
> > +}
>
> I think this is a bit confusing.. How about this?
>
> bool ret = false;
> if (build_id)
> ret = memcmp(...);
> return ret;

ok

>
> Or, it can be a oneliner..

everything can be oneliner ;-)

thanks,
jirka
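Namhyung's one-liner suggestion could look like the sketch below. `u8` and `BUILD_ID_SIZE` are local stand-ins for the tools' definitions (20 bytes, the size of a SHA-1 build id); passing `zero` rather than `&zero` is the idiomatic form, although both point at the same bytes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t u8;		/* stand-in for the tools' u8 */
#define BUILD_ID_SIZE 20	/* stand-in: SHA-1 build id size */

/* True if build_id is non-NULL and not all zeroes. */
static bool build_id__is_defined(const u8 *build_id)
{
	static const u8 zero[BUILD_ID_SIZE];

	return build_id && memcmp(build_id, zero, BUILD_ID_SIZE);
}
```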

2020-09-14 20:51:13

by Jiri Olsa

Subject: Re: [PATCH 06/26] perf tools: Add support to read build id from compressed elf

On Mon, Sep 14, 2020 at 01:04:34PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Sun, Sep 13, 2020 at 11:02:53PM +0200, Jiri Olsa escreveu:
> > Adding support to decompress file before reading build id.
> >
> > Adding filename__read_build_id and change its current
> > versions to read_build_id.
>
> Also a standalone, generally useful, prep patch, applied.

thanks,
jirka

2020-09-14 20:52:33

by Jiri Olsa

Subject: Re: [PATCH 05/26] perf tools: Add build_id__is_defined function

On Mon, Sep 14, 2020 at 01:03:18PM -0300, Arnaldo Carvalho de Melo wrote:

SNIP

> + if (!ns) {
> + status = NVME_SC_INVALID_NS | NVME_SC_DNR;
> + goto out;
> + }
> +
> + if (memchr_inv(&ns->uuid, 0, sizeof(ns->uuid))) {
> + status = nvmet_copy_ns_identifier(req, NVME_NIDT_UUID,
> + NVME_NIDT_UUID_LEN,
> + &ns->uuid, &off);
> + if (status)
> + goto out_put_ns;
> + }
>
> More:
>
> [acme@five perf]$ find arch/ -type f | xargs grep memchr_inv
> arch/x86/kernel/fpu/xstate.c: if (memchr_inv(hdr->reserved, 0, sizeof(hdr->reserved)))
> arch/x86/mm/init_64.c: if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
> arch/x86/mm/init_64.c: if (!memchr_inv(page_addr, PAGE_INUSE,
> arch/x86/mm/init_64.c: if (!memchr_inv(page_addr, PAGE_INUSE,
> arch/s390/mm/vmem.c: return !memchr_inv(page, PAGE_UNUSED, PMD_SIZE);
> arch/powerpc/platforms/powermac/nvram.c: if (memchr_inv(base, 0xff, NVRAM_SIZE)) {
> arch/powerpc/platforms/powermac/nvram.c: if (memchr_inv(base, 0xff, NVRAM_SIZE)) {
> [acme@five perf]$

nice, another string.c candidate ;-)

I can add the is_zeroed function and we can speed it up
for the above archs later

thanks,
jirka
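A portable userspace stand-in for the kernel's memchr_inv() (return a pointer to the first byte that differs from `c`, or NULL), with the is_zeroed() helper built on top of it. This is a plain byte loop, assuming no arch-specific speedups; `memchr_inv_sketch()` is a hypothetical name so it does not clash with any existing definition.

```c
#include <stdbool.h>
#include <stddef.h>

/* First byte in [start, start + bytes) that is not 'c', or NULL. */
static void *memchr_inv_sketch(const void *start, int c, size_t bytes)
{
	const unsigned char *p = start;
	unsigned char value = (unsigned char)c;

	for (size_t i = 0; i < bytes; i++)
		if (p[i] != value)
			return (void *)(p + i);
	return NULL;
}

/* True if every byte of 'buf' is zero. */
static bool is_zeroed(const void *buf, size_t len)
{
	return memchr_inv_sketch(buf, 0, len) == NULL;
}
```

With this helper, build_id__is_defined() reduces to `build_id && !is_zeroed(build_id, BUILD_ID_SIZE)`.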

2020-09-14 20:53:54

by Jiri Olsa

Subject: Re: [PATCH 15/26] perf tools: Synthesize proc tasks with mmap3

On Mon, Sep 14, 2020 at 01:07:38PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Sun, Sep 13, 2020 at 11:03:02PM +0200, Jiri Olsa escreveu:
> > Synthesizing proc tasks with mmap3 events so we can
> > get build id data for synthesized maps as well.
> >
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > tools/perf/util/mmap.c | 2 +-
> > tools/perf/util/synthetic-events.c | 72 +++++++++++++++++-------------
> > 2 files changed, 43 insertions(+), 31 deletions(-)
> >
> > diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
> > index ab7108d22428..51f6f86580a9 100644
> > --- a/tools/perf/util/mmap.c
> > +++ b/tools/perf/util/mmap.c
> > @@ -33,7 +33,7 @@ void mmap_cpu_mask__scnprintf(struct mmap_cpu_mask *mask, const char *tag)
> >
> > len = bitmap_scnprintf(mask->bits, mask->nbits, buf, MASK_SIZE);
> > buf[len] = '\0';
> > - pr_debug("%p: %s mask[%zd]: %s\n", mask, tag, mask->nbits, buf);
> > + pr_debug2("%p: %s mask[%zd]: %s\n", mask, tag, mask->nbits, buf);
> > }
>
> Can this be in a separate patch?

ok

>
> > size_t mmap__mmap_len(struct mmap *map)
> > diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> > index 89b390623b63..bd6e7b84283d 100644
> > --- a/tools/perf/util/synthetic-events.c
> > +++ b/tools/perf/util/synthetic-events.c
> > @@ -379,7 +379,7 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
> > }
> > io__init(&io, io.fd, bf, sizeof(bf));
> >
> > - event->header.type = PERF_RECORD_MMAP2;
> > + event->header.type = PERF_RECORD_MMAP3;
>
> This also needs to check if the user is interested in build-id records.
> If it is disabled, then using this new tool with mmap3 support will
> generate perf.data files that will not be grokked by older tools,
> introducing an annoyance for people not interested in build-ids.

ok, this should go away if we stay with mmap2
and do the union stuff

thanks,
jirka
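For reference, a sketch of the "union stuff", following the MISC-bit suggestion quoted earlier in the thread: keep MMAP2 and let a header misc bit select whether the maj/min/ino/ino_generation block holds the inode identity or a build id. The bit value and all names here are hypothetical, chosen for illustration only. Both variants occupy 24 bytes (4 + 4 + 8 + 8 and 1 + 1 + 2 + 20), so records carrying a build id would not grow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical misc bit for illustration; not an ABI constant. */
#define MMAP_MISC_BUILD_ID_SKETCH (1u << 14)

/* The part of an MMAP2 record that would become a union (C11). */
struct mmap2_ident_sketch {
	union {
		struct {			/* bit clear: inode identity */
			uint32_t maj;
			uint32_t min;
			uint64_t ino;
			uint64_t ino_generation;
		};
		struct {			/* bit set: build id */
			uint8_t  build_id_size;	/* valid bytes in build_id */
			uint8_t  reserved1;
			uint16_t reserved2;
			uint8_t  build_id[20];
		};
	};
};

/* A reader picks the union member from header.misc. */
static bool mmap2_has_build_id(uint16_t misc)
{
	return misc & MMAP_MISC_BUILD_ID_SKETCH;
}
```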

2020-09-14 21:54:12

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 05/26] perf tools: Add build_id__is_defined function

Em Mon, Sep 14, 2020 at 10:47:01PM +0200, Jiri Olsa escreveu:
> On Mon, Sep 14, 2020 at 02:44:35PM +0900, Namhyung Kim wrote:
> > On Mon, Sep 14, 2020 at 6:05 AM Jiri Olsa <[email protected]> wrote:
> > >
> > > Adding build_id__is_defined helper to check build id
> > > is defined and is != zero build id.
> > >
> > > Signed-off-by: Jiri Olsa <[email protected]>
> > > ---
> > > tools/perf/util/build-id.c | 11 +++++++++++
> > > tools/perf/util/build-id.h | 1 +
> > > 2 files changed, 12 insertions(+)
> > >
> > > diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> > > index 31207b6e2066..bdee4e08e60d 100644
> > > --- a/tools/perf/util/build-id.c
> > > +++ b/tools/perf/util/build-id.c
> > > @@ -902,3 +902,14 @@ bool perf_session__read_build_ids(struct perf_session *session, bool with_hits)
> > >
> > > return ret;
> > > }
> > > +
> > > +bool build_id__is_defined(const u8 *build_id)
> > > +{
> > > + static u8 zero[BUILD_ID_SIZE];
> > > + int err = 0;
> > > +
> > > + if (build_id)
> > > + err = memcmp(build_id, &zero, BUILD_ID_SIZE);
> > > +
> > > + return err ? true : false;
> > > +}
> >
> > I think this is a bit confusing.. How about this?
> >
> > bool ret = false;
> > if (build_id)
> > ret = memcmp(...);
> > return ret;
>
> ok
>
> >
> > Or, it can be a oneliner..
>
> everything can be oneliner ;-)

But has to pass checkpatch.pl, so no more than 80 chars.

;-)

- Arnaldo

2020-09-14 22:01:16

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

Em Mon, Sep 14, 2020 at 09:39:07PM +0200, Jiri Olsa escreveu:
> On Mon, Sep 14, 2020 at 12:28:41PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Mon, Sep 14, 2020 at 02:38:27PM +0900, Namhyung Kim escreveu:
> > > On Mon, Sep 14, 2020 at 6:03 AM Jiri Olsa <[email protected]> wrote:
> > > > Add new version of mmap event. The MMAP3 record is an
> > > > augmented version of MMAP2, it adds build id value to
> > > > identify the exact binary object behind memory map:

> > > > struct {
> > > > struct perf_event_header header;

> > > > u32 pid, tid;
> > > > u64 addr;
> > > > u64 len;
> > > > u64 pgoff;
> > > > u32 maj;
> > > > u32 min;
> > > > u64 ino;
> > > > u64 ino_generation;
> > > > u32 prot, flags;
> > > > u32 reserved;

> > What is this reserved field for? It's all nicely aligned already, a u64
> > followed by two u32s (prot, flags).

> > > > u8 buildid[20];

> > > Do we need maj, min, ino, ino_generation for mmap3 event?
> > > I think they are to compare binaries, then we can do it with
> > > build-id (and I think it'd be better)..

> > Humm, I thought MMAP2 would be a superset of MMAP and MMAP3 would be a
> > superset of MMAP2.

> > If we want to ditch useless stuff, then throw away pid, tid too, as we
> > can select those via sample_type.

> > Having said that, at this point I don't even know if adding new
> > PERF_RECORD_ that are an update for a preexisting one is the right way
> > to proceed.

> > Perhaps we should attach a BPF program to point where a mmap/munmap is
> > being done (perf_event_mmap()) and allow userspace to ask for whatever
> > it wants? With a kprobes there right now we can implement this MMAP3
> > easily, no?

> hmm, I'm always worried about solutions based on kprobes,
> because once the function is moved/removed you're screwed
> and need to keep up with function name changes and be
> backward compatible..

Well, I'm not advocating to have it as kprobes permanently, but we can
implement it now using kprobes, i.e. systems wouldn't have to have their
kernel updated to get this feature, and once they need, for some other
reason, to upgrade their kernel, perf would notice that there is a
tracepoint for that and would happily use it.

So we would be able to use that tracepoint with things like ftrace,
bpftrace, everything that knows about tracepoints, and perf would get
build-ids and whatever else is needed to have a mmap record; in the
future we could even ask for some more (or less) according to what
is needed for some new feature.

I.e. the point wasn't about kprobes, it was about using BPF to state what
we want to collect when a mmap is being put in place.

- Arnaldo
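On the question of what the reserved field is for, working out the offsets of the proposed fixed-size part helps. prot and flags sit at offset 64 either way, so they do not need it; what reserved changes is where the variable-length filename starts: at byte 96 (8-byte aligned) with it, at byte 92 without it. Whether that alignment was the intent is not stated in the thread. The structs below are sketches that mirror the quoted layout with fixed-width types.

```c
#include <stddef.h>
#include <stdint.h>

struct header_sketch {			/* mirrors struct perf_event_header */
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

struct mmap3_sketch {			/* proposed layout, as quoted */
	struct header_sketch header;	/* offset 0 */
	uint32_t pid, tid;		/* 8, 12 */
	uint64_t addr, len, pgoff;	/* 16, 24, 32 */
	uint32_t maj, min;		/* 40, 44 */
	uint64_t ino, ino_generation;	/* 48, 56 */
	uint32_t prot, flags;		/* 64, 68 */
	uint32_t reserved;		/* 72 */
	uint8_t  buildid[20];		/* 76 .. 95 */
	char     filename[];		/* 96: 8-byte aligned */
};

struct mmap3_no_reserved_sketch {	/* same, without 'reserved' */
	struct header_sketch header;
	uint32_t pid, tid;
	uint64_t addr, len, pgoff;
	uint32_t maj, min;
	uint64_t ino, ino_generation;
	uint32_t prot, flags;		/* still 64, 68 */
	uint8_t  buildid[20];		/* 72 .. 91 */
	char     filename[];		/* 92: not 8-byte aligned */
};
```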

2020-09-14 22:40:07

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

Em Mon, Sep 14, 2020 at 09:50:45PM +0200, Jiri Olsa escreveu:
> On Mon, Sep 14, 2020 at 12:31:34PM -0300, Arnaldo Carvalho de Melo wrote:
>
> SNIP
>
> > > > ---
> > > > include/uapi/linux/perf_event.h | 27 ++++++++++++++++++++++-
> > > > kernel/events/core.c | 38 +++++++++++++++++++++++++++------
> > > > 2 files changed, 57 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> > > > index 077e7ee69e3d..facfc3c673ed 100644
> > > > --- a/include/uapi/linux/perf_event.h
> > > > +++ b/include/uapi/linux/perf_event.h
> > > > @@ -384,7 +384,8 @@ struct perf_event_attr {
> > > > aux_output : 1, /* generate AUX records instead of events */
> > > > cgroup : 1, /* include cgroup events */
> > > > text_poke : 1, /* include text poke events */
> > > > - __reserved_1 : 30;
> > > > + mmap3 : 1, /* include bpf events */
> > > > + __reserved_1 : 29;
> > > >
> > > what happens if I set mmap3 and mmap2?
> > >
> > > I think using mmap3 for every mmap may be overkill as you add useless
> > > 20 bytes to an mmap record.
> >
> > So use just PERF_RECORD_MMAP2.
> >
> > I think if the user says: I need buildids, then, in kernels with support
> > for getting the buildid in MMAP records, use it as it's more accurate,
> > otherwise fall back to traversing all records at the end to go over lots
> > of files harvesting those build-ids.
>
> ok, so special record option to enable this
>
> >
> > If the user says I don't want build-ids, nothing changes, no collection
> > at the end, perf continues using PERF_RECORD_MMAP2.
>
> and that's -B option in record

Yeah, so if -B is used, MMAP2; otherwise, the best available option,
which is MMAP3, which by now means more about how you tweak the misc bits
and what you collect: buildids or just the maj/min/ino :)

- Arnaldo

> >
> > > I am not sure if your code handles the case where mmap3 is not needed
> > > because there is no buildid, e.g, anonymous memory.
> > > It seems to me you've written the patch in such a way that if the user
> > > tool supports mmap3, then it supersedes mmap2, and thus
> > > you need all the fields of mmap2. But it could be more interesting to
> > > return either MMAP2 or MMAP3 depending on tool support
> > > and type of mmap, that would certainly save 20 bytes on any anon mmap.
> > > But maybe that logic is already in your patch and I missed it.
> >
> > Right, it should take into account if the user asked for build-ids or
> > not in addition to checking if the kernel supports MMAP3.
>
> right, thanks,
>
> jirka
>

--

- Arnaldo
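The selection described above can be expressed roughly as follows. The enum and function names are hypothetical, and `kernel_has_buildid_mmap` stands for whatever feature probe the tool would use to detect kernel support; this is a sketch of the decision, not perf's option handling.

```c
#include <stdbool.h>

/* Hypothetical names for the three outcomes discussed in the thread. */
enum buildid_strategy {
	BUILDID_NONE,		/* -B: plain MMAP2, no build-id work at all */
	BUILDID_IN_RECORD,	/* kernel puts build ids into mmap records */
	BUILDID_POSTPROCESS,	/* MMAP2, then scan perf.data at the end */
};

static enum buildid_strategy choose_buildid_strategy(bool no_buildid,
						     bool kernel_has_buildid_mmap)
{
	if (no_buildid)
		return BUILDID_NONE;
	return kernel_has_buildid_mmap ? BUILDID_IN_RECORD
				       : BUILDID_POSTPROCESS;
}
```

Keeping -B as the first check means users who never want build ids see no change in record format or end-of-run cost.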

2020-09-15 00:05:20

by Ian Rogers

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On Mon, Sep 14, 2020 at 10:26 AM Stephane Eranian <[email protected]> wrote:
>
> On Mon, Sep 14, 2020 at 2:08 AM <[email protected]> wrote:
> >
> > On Sun, Sep 13, 2020 at 11:41:00PM -0700, Stephane Eranian wrote:
> > > On Sun, Sep 13, 2020 at 2:03 PM Jiri Olsa <[email protected]> wrote:
> > > what happens if I set mmap3 and mmap2?
> > >
> > > I think using mmap3 for every mmap may be overkill as you add useless
> > > 20 bytes to an mmap record.
> > > I am not sure if your code handles the case where mmap3 is not needed
> > > because there is no buildid, e.g, anonymous memory.
> > > It seems to me you've written the patch in such a way that if the user
> > > tool supports mmap3, then it supersedes mmap2, and thus
> > > you need all the fields of mmap2. But if could be more interesting to
> > > return either MMAP2 or MMAP3 depending on tool support
> > > and type of mmap, that would certainly save 20 bytes on any anon mmap.
> > > But maybe that logic is already in your patch and I missed it.
> >
> > That, and what if you don't want any of that buildid nonsense at all? I
> > always kill that because it makes perf pointlessly slow and has
> > absolutely no upsides for me.
> >
> I have seen situations where the perf tool takes a visibly significant
> amount of time (many seconds) to inject the buildids at the end of the
> collection of perf record (same if using perf inject -b). That is
> because it needs to go through all the records in the perf.data to find
> MMAP records and then read the buildids from the filesystem. This has
> caused some problems in our environment. Having the kernel add the
> buildid to *relevant* mmaps would avoid a lot of that penalty, by
> avoiding having to parse the perf.data file and leveraging the fact
> that the buildid may be in memory already. My concern with this,
> though, has to do with large pages and the impact they have on
> alignment of sections in memory. I think Ian can comment better on this.

I believe this is a problem we have and that is going away. For
context, we map huge pages and move executable code to them, not from
a file, but using anonymous memory or other sources of huge pages. By
definition we will fail to find build ids for such anonymous memory,
but we may also break the non-file-backed hugepage case if the
alignment is such that the ELF header is on the hugepage and for some
reason not in the page cache. File-backed huge pages solve this
problem.

Thanks,
Ian

> I think this patch series is useful if it can demonstrate a speedup
> during recording (perf record or perf record | perf inject -b). But it
> needs to be optimized to minimize the volume of useless info returned.
> And Jiri needs to decide if MMAP3 is a replacement of MMAP2, or a
> different kind of record targeted at ELF images only, in which case
> some of the fields may be removed. My tendency would be to go for the
> latter.

2020-09-15 02:56:05

by Namhyung Kim

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

Hi Jiri,

On Tue, Sep 15, 2020 at 4:38 AM Jiri Olsa <[email protected]> wrote:
>
> On Mon, Sep 14, 2020 at 02:38:27PM +0900, Namhyung Kim wrote:
>
> SNIP
>
> > > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> > > index 077e7ee69e3d..facfc3c673ed 100644
> > > --- a/include/uapi/linux/perf_event.h
> > > +++ b/include/uapi/linux/perf_event.h
> > > @@ -384,7 +384,8 @@ struct perf_event_attr {
> > > aux_output : 1, /* generate AUX records instead of events */
> > > cgroup : 1, /* include cgroup events */
> > > text_poke : 1, /* include text poke events */
> > > - __reserved_1 : 30;
> > > + mmap3 : 1, /* include bpf events */
> >
> > ???
> >
> > > + __reserved_1 : 29;
> > >
> > > union {
> > > __u32 wakeup_events; /* wakeup every n events */
> > > @@ -1060,6 +1061,30 @@ enum perf_event_type {
> > > */
> > > PERF_RECORD_TEXT_POKE = 20,
> > >
> > > + /*
> > > + * The MMAP3 records are an augmented version of MMAP2, they add
> > > + * build id value to identify the exact binary behind map
> > > + *
> > > + * struct {
> > > + * struct perf_event_header header;
> > > + *
> > > + * u32 pid, tid;
> > > + * u64 addr;
> > > + * u64 len;
> > > + * u64 pgoff;
> > > + * u32 maj;
> > > + * u32 min;
> > > + * u64 ino;
> > > + * u64 ino_generation;
> > > + * u32 prot, flags;
> > > + * u32 reserved;
> > > + * u8 buildid[20];
> > > + * char filename[];
> > > + * struct sample_id sample_id;
> > > + * };
> > > + */
> > > + PERF_RECORD_MMAP3 = 21,
> > > +
> > > PERF_RECORD_MAX, /* non-ABI */
> > > };
> > >
> > [SNIP]
> > > @@ -8098,6 +8116,9 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
> > > mmap_event->prot = prot;
> > > mmap_event->flags = flags;
> > >
> > > + if (atomic_read(&nr_mmap3_events))
> > > + build_id_parse(vma, mmap_event->buildid);
> >
> > What about if it failed? We should zero out the build-id..
>
> it is initialized to zero in perf_event_mmap
>
> mmap_event = (struct perf_mmap_event){
> .vma = vma,
> ...
>
> I'll double check build_id_parse won't leave anything half
> baked there, but I don't think so

Oh, you're right. I missed that..

Thanks
Namhyung
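The zeroing Jiri points to follows from C's initialization rules: in a compound literal with designated initializers, every member not named is zero-initialized, so the buildid array starts as all zeroes. The remaining question, whether the parser could leave a half-written id behind, is avoided by validating first and writing the output only on success. The sketch below shows that pattern; `struct mmap_event_sketch` and `parse_build_id_sketch()` are hypothetical and do not describe build_id_parse()'s actual implementation.

```c
#include <stdint.h>
#include <string.h>

#define BUILD_ID_LEN 20

struct mmap_event_sketch {	/* stand-in for struct perf_mmap_event */
	const void *vma;
	uint32_t prot, flags;
	uint8_t buildid[BUILD_ID_LEN];
};

/*
 * Hypothetical parser: validates the input first and writes 'out' only
 * on success, so a failure leaves the zero-initialized id untouched.
 */
static int parse_build_id_sketch(const uint8_t *note, size_t len,
				 uint8_t out[BUILD_ID_LEN])
{
	if (!note || len != BUILD_ID_LEN)
		return -1;
	memcpy(out, note, BUILD_ID_LEN);
	return 0;
}
```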

2020-09-15 03:00:43

by Namhyung Kim

Subject: Re: [PATCH 09/26] perf tools: Try load vmlinux from buildid database

On Tue, Sep 15, 2020 at 5:29 AM Jiri Olsa <[email protected]> wrote:
>
> On Mon, Sep 14, 2020 at 03:25:39PM +0900, Namhyung Kim wrote:
> > On Mon, Sep 14, 2020 at 6:04 AM Jiri Olsa <[email protected]> wrote:
> > >
> > > Currently we don't check on kernel's vmlinux the same way as
> > > we do for normal binaries, but we either look for kallsyms
> > > file in build id database or check on known vmlinux locations
> > > (plus some other optional paths).
> > >
> > > This patch adds the check for standard build id binary location,
> > > so we are ready once we start to store it there from debuginfod
> > > in following changes.
> >
> > But dso__load_vmlinux_path() already has the logic.
> > Also you should check symbol_conf.ignore_vmlinux_buildid.
>
> I wanted to have it not dependent on !symbol_conf.ignore_vmlinux
> which is needed to call dso__load_vmlinux_path

Note that it's a different config variable to suppress using build-id
when loading kernel DSO from perf record.

Thanks
Namhyung

>
> also the idea was to try the build id vmlinux before the configured
> vmlinux locations, because they found the vmlinux in my setup ;-)
>
> I'll double check the logic again
>
> thanks,
> jirka
>

2020-09-15 05:44:15

by Adrian Hunter

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On 15/09/20 1:00 am, Arnaldo Carvalho de Melo wrote:
> Em Mon, Sep 14, 2020 at 09:39:07PM +0200, Jiri Olsa escreveu:
>> On Mon, Sep 14, 2020 at 12:28:41PM -0300, Arnaldo Carvalho de Melo wrote:
>>> Em Mon, Sep 14, 2020 at 02:38:27PM +0900, Namhyung Kim escreveu:
>>>> On Mon, Sep 14, 2020 at 6:03 AM Jiri Olsa <[email protected]> wrote:
>>>>> Add new version of mmap event. The MMAP3 record is an
>>>>> augmented version of MMAP2, it adds build id value to
>>>>> identify the exact binary object behind memory map:
>
>>>>> struct {
>>>>> struct perf_event_header header;
>
>>>>> u32 pid, tid;
>>>>> u64 addr;
>>>>> u64 len;
>>>>> u64 pgoff;
>>>>> u32 maj;
>>>>> u32 min;
>>>>> u64 ino;
>>>>> u64 ino_generation;
>>>>> u32 prot, flags;
>>>>> u32 reserved;
>
>>> What is this reserved field for? It's all nicely aligned already, a u64
>>> followed by two u32s (prot, flags).
>
>>>>> u8 buildid[20];
>
>>>> Do we need maj, min, ino, ino_generation for mmap3 event?
>>>> I think they are to compare binaries, then we can do it with
>>>> build-id (and I think it'd be better)..
>
>>> Humm, I thought MMAP2 would be a superset of MMAP and MMAP3 would be a
>>> superset of MMAP2.
>
>>> If we want to ditch useless stuff, then throw away pid, tid too, as we
>>> can select those via sample_type.
>
>>> Having said that, at this point I don't even know if adding new
>>> PERF_RECORD_ that are an update for a preexisting one is the right way
>>> to proceed.
>
>>> Perhaps we should attach a BPF program to point where a mmap/munmap is
>>> being done (perf_event_mmap()) and allow userspace to ask for whatever
>>> it wants? With a kprobes there right now we can implement this MMAP3
>>> easily, no?
>
>> hmm, I'm always worried about solutions based on kprobes,
>> because once the function is moved/removed you're screwed
>> and need to keep up with function name changes and be
>> backward compatible..
>
> Well, I'm not advocating to have it as kprobes permanently, but we can
> implement it now using kprobes, i.e. systems wouldn't have to have their
> kernel updated to get this feature, and once they need, for some other
> reason, to upgrade their kernel, perf would notice that there is a
> tracepoint for that and would happily use it.
>
> So we would be able to use that tracepoint with things like ftrace,
> bpftrace, everything that knows about tracepoints, and perf would get
> build-ids and whatever else is needed to have a mmap record; in the
> future we could even ask for some more (or less) according to what
> is needed for some new feature.
>
> I.e. the point wasn't about kprobes, it was about using BPF to state what
> we want to collect when a mmap is being put in place.

Isn't the problem with kprobes / tracepoints etc. that non-privileged users
can't use them?

2020-09-15 05:53:01

by Adrian Hunter

Subject: Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

On 14/09/20 11:07 pm, Jiri Olsa wrote:
> On Mon, Sep 14, 2020 at 10:08:01AM -0700, Ian Rogers wrote:
>
> SNIP
>
>>>
>>> Using one of the MISC bits to resolve the union. Might actually bring
>>> benefit to everyone. Us normal people get to have a smaller MMAP record,
>>> while the buildid folks can have it too.
>>>
>>> Even more extreme would be using 2 MISC bits and allowing the union to
>>> be 0 sized for anon.
>>>
>>> That said; I have the nagging feeling there were unresolved issues with
>>> mmap2, but I can't seem to find any relevant emails on it :/ My
>>> google-fu is weak today.
>>
>> Firstly, thanks Jiri for this really useful patch set for our
>> use-cases! Some thoughts:
>>
>> One issue with mmap2 events at the moment is that they happen "after"
>> the mmap is performed. This allows the mapped address to be known but
>> has the unfortunate problem of causing mmap events to get "extended"
>> due to adjacent vmas being merged. There are some workarounds like
>> commit c8f6ae1fb28d (perf inject jit: Remove //anon mmap events).
>> Perhaps these events can switch to reporting the length the mmap
>> requested rather than the length of the vma?
>
> seems like a separate feature, perhaps we could use another MISC bit for that?
>
>>
>> I could imagine that changes here could be interesting to folks doing
>> CHERI or hypervisors, for example, they may want more address bits.
>>
>> BPF has stack traces with elements of buildid + offset. Using these in
>> perf samples would avoid the need for mmap events, synthesis, etc. but
>> at the cost of larger and possibly slower stack traces. Perhaps we
>> should just remove the idea of mmap events altogether?
>
> hm, we also need mmap events to resolve PERF_SAMPLE_IP, right?
> also not sure how we would do that for dwarf unwind, and perhaps
> there are some other places.. c2c data address resolving?

Not to mention Intel PT and any other hw trace that puts IPs into the AUX area.

And branch stacks, call chains.
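To make the point about resolving sample IPs concrete, here is a minimal C sketch of looking an address up in a set of mmap records and turning it into a file offset, i.e. the same (build id, offset) pair that BPF stack traces carry. The `struct map_rec` type and the `resolve_ip()` helper are hypothetical simplifications for illustration, not perf's own map code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mmap record; not perf's struct map. */
struct map_rec {
	uint64_t start;              /* first mapped virtual address */
	uint64_t len;                /* length of the mapping in bytes */
	uint64_t pgoff;              /* file offset that 'start' maps */
	unsigned char build_id[20];  /* build id of the mapped object */
};

/*
 * Find the record that covers 'ip' and translate the address into an
 * offset within the mapped file. Returns NULL if no mapping covers it.
 */
static const struct map_rec *resolve_ip(const struct map_rec *maps,
					size_t nr, uint64_t ip,
					uint64_t *file_off)
{
	assert(file_off != NULL);

	for (size_t i = 0; i < nr; i++) {
		const struct map_rec *m = &maps[i];

		/* written as a subtraction so start + len cannot overflow */
		if (ip >= m->start && ip - m->start < m->len) {
			*file_off = ip - m->start + m->pgoff;
			return m;
		}
	}
	return NULL;
}
```

The lookup is linear only for clarity; a real implementation would likely keep the maps in a sorted structure. Either way, the same lookup has to be done for every sampled IP, branch-stack entry and call-chain entry, which is why those consumers still need the mmap records.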

2020-09-15 22:11:01

by Ian Rogers

Subject: Re: [PATCH 16/26] perf tools: Synthesize modules with mmap3

On Mon, Sep 14, 2020 at 9:08 AM Arnaldo Carvalho de Melo
<[email protected]> wrote:
>
> Em Sun, Sep 13, 2020 at 11:03:03PM +0200, Jiri Olsa escreveu:
> > Synthesizing modules with mmap3 events so we can
> > get build id data for module's maps as well.
>
> Ditto as for 15/26
>
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > tools/perf/util/synthetic-events.c | 37 +++++++++++++++++++-----------
> > 1 file changed, 24 insertions(+), 13 deletions(-)
> >
> > diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> > index bd6e7b84283d..6bd2423ce2f3 100644
> > --- a/tools/perf/util/synthetic-events.c
> > +++ b/tools/perf/util/synthetic-events.c
> > @@ -605,7 +605,7 @@ int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t
> > int rc = 0;
> > struct map *pos;
> > struct maps *maps = machine__kernel_maps(machine);
> > - union perf_event *event = zalloc((sizeof(event->mmap) +
> > + union perf_event *event = zalloc((sizeof(event->mmap3) +
> > machine->id_hdr_size));
> > if (event == NULL) {
> > pr_debug("Not enough memory synthesizing mmap event "
> > @@ -613,8 +613,6 @@ int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t
> > return -1;
> > }
> >
> > - event->header.type = PERF_RECORD_MMAP;
> > -
> > /*
> > * kernel uses 0 for user space maps, see kernel/perf_event.c
> > * __perf_event_mmap
> > @@ -631,17 +629,30 @@ int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t
> > continue;
> >
> > size = PERF_ALIGN(pos->dso->long_name_len + 1, sizeof(u64));
> > - event->mmap.header.type = PERF_RECORD_MMAP;
> > - event->mmap.header.size = (sizeof(event->mmap) -
> > - (sizeof(event->mmap.filename) - size));
> > - memset(event->mmap.filename + size, 0, machine->id_hdr_size);
> > - event->mmap.header.size += machine->id_hdr_size;
> > - event->mmap.start = pos->start;
> > - event->mmap.len = pos->end - pos->start;
> > - event->mmap.pid = machine->pid;
> > -
> > - memcpy(event->mmap.filename, pos->dso->long_name,
> > + event->mmap3.header.type = PERF_RECORD_MMAP3;
> > + event->mmap3.header.size = (sizeof(event->mmap3) -
> > + (sizeof(event->mmap3.filename) - size));
> > + memset(event->mmap3.filename + size, 0, machine->id_hdr_size);
> > + event->mmap3.header.size += machine->id_hdr_size;
> > + event->mmap3.start = pos->start;
> > + event->mmap3.len = pos->end - pos->start;
> > + event->mmap3.pid = machine->pid;
> > +
> > + memcpy(event->mmap3.filename, pos->dso->long_name,
> > pos->dso->long_name_len + 1);
> > +
> > + rc = filename__read_build_id(event->mmap3.filename, event->mmap3.buildid,
> > + BUILD_ID_SIZE);
> > + if (rc != BUILD_ID_SIZE) {

IIRC BUILD_ID_SIZE is 20 bytes which is the correct size for SHA-1. A
build ID may be 128-bits (16 bytes) if md5 or uuid hashes are used.
Should this test just be "> 0" ?

Thanks,
Ian

> > + if (event->mmap3.filename[0] == '/') {
> > + pr_debug2("Failed to read build ID for %s\n",
> > + event->mmap3.filename);
> > + }
> > + memset(event->mmap3.buildid, 0x0, sizeof(event->mmap3.buildid));
> > + }
> > +
> > + rc = 0;
> > +
> > if (perf_tool__process_synth_event(tool, event, machine, process) != 0) {
> > rc = -1;
> > break;
> > --
> > 2.26.2
> >
>
> --
>
> - Arnaldo
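The `header.size` arithmetic in the quoted hunk is easy to misread, so here is a hedged C sketch of the same computation. `struct rec` is a hypothetical stand-in for a record that ends in a fixed-size `filename[]` buffer, and `ALIGN_UP` plays the role of `PERF_ALIGN`; the point is that the emitted record keeps only the NUL-terminated name rounded up to 8 bytes and then appends `id_hdr_size` bytes of sample-ID trailer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round x up to a multiple of a (a power of two), like PERF_ALIGN. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Hypothetical record layout with a trailing filename buffer. */
struct rec {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
	uint32_t pid, tid;
	uint64_t start, len, pgoff;
	unsigned char id_union[24];  /* ino/dev fields or build id */
	char filename[4096];
};

/*
 * Same shape as the hunk: drop the unused tail of filename[], keep the
 * name plus NUL padded to 8 bytes, then add the id header trailer.
 */
static size_t rec_size(const char *name, size_t id_hdr_size)
{
	size_t name_sz = ALIGN_UP(strlen(name) + 1, sizeof(uint64_t));

	assert(name_sz <= sizeof(((struct rec *)0)->filename));
	return sizeof(struct rec) -
	       (sizeof(((struct rec *)0)->filename) - name_sz) + id_hdr_size;
}
```

For example, a 7-character name needs exactly 8 bytes with its NUL, while an 8-character name is padded to 16.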

2020-09-16 08:21:36

by Jiri Olsa

Subject: Re: [PATCH 16/26] perf tools: Synthesize modules with mmap3

On Tue, Sep 15, 2020 at 01:17:44PM -0700, Ian Rogers wrote:

SNIP

> > > /*
> > > * kernel uses 0 for user space maps, see kernel/perf_event.c
> > > * __perf_event_mmap
> > > @@ -631,17 +629,30 @@ int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t
> > > continue;
> > >
> > > size = PERF_ALIGN(pos->dso->long_name_len + 1, sizeof(u64));
> > > - event->mmap.header.type = PERF_RECORD_MMAP;
> > > - event->mmap.header.size = (sizeof(event->mmap) -
> > > - (sizeof(event->mmap.filename) - size));
> > > - memset(event->mmap.filename + size, 0, machine->id_hdr_size);
> > > - event->mmap.header.size += machine->id_hdr_size;
> > > - event->mmap.start = pos->start;
> > > - event->mmap.len = pos->end - pos->start;
> > > - event->mmap.pid = machine->pid;
> > > -
> > > - memcpy(event->mmap.filename, pos->dso->long_name,
> > > + event->mmap3.header.type = PERF_RECORD_MMAP3;
> > > + event->mmap3.header.size = (sizeof(event->mmap3) -
> > > + (sizeof(event->mmap3.filename) - size));
> > > + memset(event->mmap3.filename + size, 0, machine->id_hdr_size);
> > > + event->mmap3.header.size += machine->id_hdr_size;
> > > + event->mmap3.start = pos->start;
> > > + event->mmap3.len = pos->end - pos->start;
> > > + event->mmap3.pid = machine->pid;
> > > +
> > > + memcpy(event->mmap3.filename, pos->dso->long_name,
> > > pos->dso->long_name_len + 1);
> > > +
> > > + rc = filename__read_build_id(event->mmap3.filename, event->mmap3.buildid,
> > > + BUILD_ID_SIZE);
> > > + if (rc != BUILD_ID_SIZE) {
>
> IIRC BUILD_ID_SIZE is 20 bytes which is the correct size for SHA-1. A
> build ID may be 128-bits (16 bytes) if md5 or uuid hashes are used.
> Should this test just be "> 0" ?

ah right, will check on that

thanks,
jirka
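A sketch of the check Ian suggests, assuming (as the quoted hunk does) that the build-id reader returns the number of bytes it read, or a negative value on error. `check_build_id()` and `BUILD_ID_MAX` are hypothetical names; the idea is simply to accept any positive length up to the 20-byte SHA-1 size so that 16-byte md5/uuid ids are not thrown away.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BUILD_ID_MAX 20  /* SHA-1 size, the largest id the buffer holds */

/*
 * 'rc' is the (assumed) return value of the build-id reader. Keep any
 * positive length up to BUILD_ID_MAX and zero the unused tail;
 * otherwise clear the whole buffer and report 0.
 */
static int check_build_id(int rc, unsigned char *bid, const char *filename)
{
	assert(bid != NULL && filename != NULL);

	if (rc > 0 && rc <= BUILD_ID_MAX) {
		memset(bid + rc, 0, BUILD_ID_MAX - rc);
		return rc;
	}
	/* only real paths are worth a message, not [kernel.kallsyms] etc. */
	if (filename[0] == '/')
		fprintf(stderr, "Failed to read build ID for %s\n", filename);
	memset(bid, 0, BUILD_ID_MAX);
	return 0;
}
```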

2020-09-16 17:10:39

by Peter Zijlstra

Subject: Re: [PATCH 16/26] perf tools: Synthesize modules with mmap3

On Wed, Sep 16, 2020 at 12:10:21PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Sep 16, 2020 at 04:17:00PM +0200, [email protected] escreveu:
> > On Wed, Sep 16, 2020 at 11:07:44AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Wed, Sep 16, 2020 at 10:20:18AM +0200, Jiri Olsa escreveu:
> >
> > > > > IIRC BUILD_ID_SIZE is 20 bytes which is the correct size for SHA-1. A
> > > > > build ID may be 128-bits (16 bytes) if md5 or uuid hashes are used.
> > > > > Should this test just be "> 0" ?
> > > >
> > > > ah right, will check on that
> > >
> > > And how do you deal with this in the kernel? I.e. to inform userspace,
> > > via the PERF_RECORD_MMAP3 (or MMAP2 with that misc bit trick) the size
> > > of the build-id?
> >
> > The union size is 24 bytes, so there's plenty of space to store a length
> > field with the buildid.
>
> So, I think we should instead use a bit in the misc field, stating the
> kind of build-id, so that we don't waste a byte for that, I think.

There's no wastage:

u32 min, maj;
u64 ino;
u64 ino_generation;

is 24 bytes, the buildid is 20 bytes, which leaves us 4 bytes to encode the
buildid type without growing anything.
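Peter's arithmetic can be checked at compile time. The C11 sketch below overlays a hypothetical build-id view (one length byte, three reserved bytes, 20 id bytes) on the 24-byte dev/ino fields; the type, field and helper names are illustrative assumptions, not the layout of any posted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical union: dev/ino view or length-prefixed build id view. */
union mmap_id {
	struct {
		uint32_t min, maj;
		uint64_t ino;
		uint64_t ino_generation;
	};
	struct {
		uint8_t build_id_size;  /* e.g. 16 (md5/uuid) or 20 (sha1) */
		uint8_t reserved[3];    /* keeps build_id at offset 4 */
		uint8_t build_id[20];
	};
};

static_assert(sizeof(union mmap_id) == 24, "union must not grow");
static_assert(offsetof(union mmap_id, build_id) == 4, "4 + 20 == 24");

/* Store a build id of any length up to 20 bytes, recording its size. */
static int mmap_id_set_build_id(union mmap_id *id, const uint8_t *bid,
				size_t len)
{
	if (len == 0 || len > sizeof(id->build_id))
		return -1;
	memset(id, 0, sizeof(*id));
	id->build_id_size = (uint8_t)len;
	memcpy(id->build_id, bid, len);
	return 0;
}
```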

2020-09-16 17:47:47

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 16/26] perf tools: Synthesize modules with mmap3

Em Wed, Sep 16, 2020 at 10:20:18AM +0200, Jiri Olsa escreveu:
> On Tue, Sep 15, 2020 at 01:17:44PM -0700, Ian Rogers wrote:
>
> SNIP
>
> > > > /*
> > > > * kernel uses 0 for user space maps, see kernel/perf_event.c
> > > > * __perf_event_mmap
> > > > @@ -631,17 +629,30 @@ int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t
> > > > continue;
> > > >
> > > > size = PERF_ALIGN(pos->dso->long_name_len + 1, sizeof(u64));
> > > > - event->mmap.header.type = PERF_RECORD_MMAP;
> > > > - event->mmap.header.size = (sizeof(event->mmap) -
> > > > - (sizeof(event->mmap.filename) - size));
> > > > - memset(event->mmap.filename + size, 0, machine->id_hdr_size);
> > > > - event->mmap.header.size += machine->id_hdr_size;
> > > > - event->mmap.start = pos->start;
> > > > - event->mmap.len = pos->end - pos->start;
> > > > - event->mmap.pid = machine->pid;
> > > > -
> > > > - memcpy(event->mmap.filename, pos->dso->long_name,
> > > > + event->mmap3.header.type = PERF_RECORD_MMAP3;
> > > > + event->mmap3.header.size = (sizeof(event->mmap3) -
> > > > + (sizeof(event->mmap3.filename) - size));
> > > > + memset(event->mmap3.filename + size, 0, machine->id_hdr_size);
> > > > + event->mmap3.header.size += machine->id_hdr_size;
> > > > + event->mmap3.start = pos->start;
> > > > + event->mmap3.len = pos->end - pos->start;
> > > > + event->mmap3.pid = machine->pid;
> > > > +
> > > > + memcpy(event->mmap3.filename, pos->dso->long_name,
> > > > pos->dso->long_name_len + 1);
> > > > +
> > > > + rc = filename__read_build_id(event->mmap3.filename, event->mmap3.buildid,
> > > > + BUILD_ID_SIZE);
> > > > + if (rc != BUILD_ID_SIZE) {
> >
> > IIRC BUILD_ID_SIZE is 20 bytes which is the correct size for SHA-1. A
> > build ID may be 128-bits (16 bytes) if md5 or uuid hashes are used.
> > Should this test just be "> 0" ?
>
> ah right, will check on that

And how do you deal with this in the kernel? I.e. to inform userspace,
via the PERF_RECORD_MMAP3 (or MMAP2 with that misc bit trick) the size
of the build-id?

- Arnaldo

2020-09-16 19:03:25

by Peter Zijlstra

Subject: Re: [PATCH 16/26] perf tools: Synthesize modules with mmap3

On Wed, Sep 16, 2020 at 11:07:44AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Sep 16, 2020 at 10:20:18AM +0200, Jiri Olsa escreveu:

> > > IIRC BUILD_ID_SIZE is 20 bytes which is the correct size for SHA-1. A
> > > build ID may be 128-bits (16 bytes) if md5 or uuid hashes are used.
> > > Should this test just be "> 0" ?
> >
> > ah right, will check on that
>
> And how do you deal with this in the kernel? I.e. to inform userspace,
> via the PERF_RECORD_MMAP3 (or MMAP2 with that misc bit trick) the size
> of the build-id?

The union size is 24 bytes, so there's plenty of space to store a length
field with the buildid.

2020-09-16 19:06:45

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 16/26] perf tools: Synthesize modules with mmap3

Em Wed, Sep 16, 2020 at 04:17:00PM +0200, [email protected] escreveu:
> On Wed, Sep 16, 2020 at 11:07:44AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Sep 16, 2020 at 10:20:18AM +0200, Jiri Olsa escreveu:
>
> > > > IIRC BUILD_ID_SIZE is 20 bytes which is the correct size for SHA-1. A
> > > > build ID may be 128-bits (16 bytes) if md5 or uuid hashes are used.
> > > > Should this test just be "> 0" ?
> > >
> > > ah right, will check on that
> >
> > And how do you deal with this in the kernel? I.e. to inform userspace,
> > via the PERF_RECORD_MMAP3 (or MMAP2 with that misc bit trick) the size
> > of the build-id?
>
> The union size is 24 bytes, so there's plenty of space to store a length
> field with the buildid.

So, I think we should instead use a bit in the misc field, stating the
kind of build-id, so that we don't waste a byte for that, I think.

- Arnaldo

2020-09-16 20:11:58

by Jiri Olsa

Subject: Re: [PATCH 16/26] perf tools: Synthesize modules with mmap3

On Wed, Sep 16, 2020 at 12:10:21PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Sep 16, 2020 at 04:17:00PM +0200, [email protected] escreveu:
> > On Wed, Sep 16, 2020 at 11:07:44AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Wed, Sep 16, 2020 at 10:20:18AM +0200, Jiri Olsa escreveu:
> >
> > > > > IIRC BUILD_ID_SIZE is 20 bytes which is the correct size for SHA-1. A
> > > > > build ID may be 128-bits (16 bytes) if md5 or uuid hashes are used.
> > > > > Should this test just be "> 0" ?
> > > >
> > > > ah right, will check on that
> > >
> > > And how do you deal with this in the kernel? I.e. to inform userspace,
> > > via the PERF_RECORD_MMAP3 (or MMAP2 with that misc bit trick) the size
> > > of the build-id?
> >
> > The union size is 24 bytes, so there's plenty of space to store a length
> > field with the buildid.
>
> So, I think we should instead use a bit in the misc field, stating the
> kind of build-id, so that we don't waste a byte for that, I think.

not sure there are enough misc bits left if there would be more
build id kinds

jirka

2020-09-16 21:02:20

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 16/26] perf tools: Synthesize modules with mmap3

Em Wed, Sep 16, 2020 at 05:21:23PM +0200, Jiri Olsa escreveu:
> On Wed, Sep 16, 2020 at 12:10:21PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Sep 16, 2020 at 04:17:00PM +0200, [email protected] escreveu:
> > > On Wed, Sep 16, 2020 at 11:07:44AM -0300, Arnaldo Carvalho de Melo wrote:
> > > > Em Wed, Sep 16, 2020 at 10:20:18AM +0200, Jiri Olsa escreveu:

> > > > > > IIRC BUILD_ID_SIZE is 20 bytes which is the correct size for SHA-1. A
> > > > > > build ID may be 128-bits (16 bytes) if md5 or uuid hashes are used.
> > > > > > Should this test just be "> 0" ?

> > > > > ah right, will check on that

> > > > And how do you deal with this in the kernel? I.e. to inform userspace,
> > > > via the PERF_RECORD_MMAP3 (or MMAP2 with that misc bit trick) the size
> > > > of the build-id?

> > > The union size is 24 bytes, so there's plenty of space to store a length
> > > field with the buildid.

> > So, I think we should instead use a bit in the misc field, stating the
> > kind of build-id, so that we don't waste a byte for that, I think.

> not sure there are enough misc bits left if there would be more
> build id kinds

So, Ian mentioned a few types of build ids. If there are not that many
misc bits left for PERF_RECORD_MMAP2, then yeah, we can use one byte
as PERF_RECORD_MMAP2-specific misc bits; we may want to have something
else in the future, just like we're now reusing those ino, etc. fields
to store a build-id instead.

But we're closing in on a final solution, and that is good. :-)

- Arnaldo
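For comparison with the length-field approach, here is a hedged sketch of how a reader might pick the union view from a misc bit, the alternative Arnaldo raises. `MISC_MMAP_BUILD_ID` and its bit position are assumptions made up for this example, not an allocated PERF_RECORD_MISC_* value, and `struct mmap_tail` is a simplified stand-in for the tail of a PERF_RECORD_MMAP2 record.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical misc bit, chosen only for this illustration. */
#define MISC_MMAP_BUILD_ID (1u << 14)

struct mmap_tail {
	uint16_t misc;  /* copy of perf_event_header.misc */
	union {
		struct { uint32_t min, maj; uint64_t ino, ino_generation; };
		struct { uint8_t build_id_size, reserved[3]; uint8_t build_id[20]; };
	};
};

/* Describe the mapped object: hex build id if flagged, else dev/inode. */
static int describe_mapping(const struct mmap_tail *t, char *buf, size_t sz)
{
	assert(t != NULL && buf != NULL);

	if (t->misc & MISC_MMAP_BUILD_ID) {
		size_t n = t->build_id_size, off = 0;

		if (n > sizeof(t->build_id) || sz < 2 * n + 1)
			return -1;
		buf[0] = '\0';
		for (size_t i = 0; i < n; i++)
			off += (size_t)snprintf(buf + off, sz - off, "%02x",
						t->build_id[i]);
		return (int)off;
	}
	return snprintf(buf, sz, "%u:%u ino %llu", (unsigned)t->maj,
			(unsigned)t->min, (unsigned long long)t->ino);
}
```

The trade-off discussed in the thread shows up directly: the misc bit costs one of the scarce header bits, while the length byte lives inside the union and costs nothing extra.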

2020-09-17 19:00:19

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 04/26] perf tools: Add filename__decompress function

Em Mon, Sep 14, 2020 at 10:43:26PM +0200, Jiri Olsa escreveu:
> On Mon, Sep 14, 2020 at 12:35:54PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Sun, Sep 13, 2020 at 11:02:51PM +0200, Jiri Olsa escreveu:
> > > Factor filename__decompress from decompress_kmodule function.
> > > It can decompress files with compressions supported in perf -
> > > xz and gz, the support needs to be compiled in.
> > >
> > > It will be used in the following changes to get build id out of
> > > compressed elf objects.
> >
> > This is prep work, can be applied now, done.
>
> thanks,
> jirka

So, I take that back, one of these decompress patches is causing this:

[root@five ~]# perf list syscalls:sys_enter_open |& tail
lzma: fopen failed on /usr/lib/modules/5.6.19-200.fc31.x86_64/kernel/drivers/acpi/video.ko.xz: 'No such file or directory'
lzma: fopen failed on /usr/lib/modules/5.5.8-200.fc31.x86_64/kernel/net/ipv4/netfilter/nf_reject_ipv4.ko.xz: 'No such file or directory'
lzma: fopen failed on /usr/lib/modules/5.5.9-200.fc31.x86_64/kernel/net/ipv6/netfilter/ip6_tables.ko.xz: 'No such file or directory'
lzma: fopen failed on /usr/lib/modules/5.5.5-200.fc31.x86_64/kernel/drivers/crypto/ccp/ccp.ko.xz: 'No such file or directory'
lzma: fopen failed on /usr/lib/modules/5.4.20-200.fc31.x86_64/kernel/sound/pci/hda/snd-hda-codec.ko.xz: 'No such file or directory'
lzma: fopen failed on /usr/lib/modules/5.5.15-200.fc31.x86_64/kernel/drivers/target/target_core_mod.ko.xz: 'No such file or directory'
lzma: fopen failed on /usr/lib/modules/5.3.7-301.fc31.x86_64/kernel/drivers/iommu/amd_iommu_v2.ko.xz: 'No such file or directory'
lzma: fopen failed on /usr/lib/modules/5.3.7-301.fc31.x86_64/kernel/drivers/media/v4l2-core/videodev.ko.xz: 'No such file or directory'
lzma: fopen failed on /usr/lib/modules/5.6.19-200.fc31.x86_64/kernel/net/ipv4/netfilter/iptable_filter.ko.xz: 'No such file or directory'
syscalls:sys_enter_open [Tracepoint event]
[root@five ~]# perf test 78
78: Check open filename arg using perf trace + vfs_getname : FAILED!
[root@five ~]#
[root@five ~]# uname -a
Linux five 5.9.0-rc3 #1 SMP Mon Aug 31 08:38:27 -03 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@five ~]#

So I removed them from my local branch. I'll rerun the build tests and
then push perf/core when all tests pass.

The test uses 'perf probe', and I noticed it while processing Masami's
debuginfod patches to make 'perf probe' use debuginfod. Looking only at
the 'perf test 78' output I thought it was his patches, but it ended up
being the decompress ones.

- Arnaldo
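As background for the decompress helpers, here is a minimal sketch of telling gzip and xz files apart by their magic bytes (1f 8b for gzip, fd 37 7a 58 5a 00 for xz). The function names are hypothetical, and this is not a diagnosis of the regression above; it only illustrates one way to leave it to the caller whether a missing file (ENOENT) deserves a message at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

enum comp_kind { COMP_ERR = -1, COMP_NONE = 0, COMP_GZIP, COMP_XZ };

/* Classify a buffer by its leading magic bytes. */
static enum comp_kind comp_from_magic(const unsigned char *buf, size_t len)
{
	static const unsigned char gz[] = { 0x1f, 0x8b };
	static const unsigned char xz[] = { 0xfd, '7', 'z', 'X', 'Z', 0x00 };

	assert(buf != NULL || len == 0);
	if (len >= sizeof(xz) && memcmp(buf, xz, sizeof(xz)) == 0)
		return COMP_XZ;
	if (len >= sizeof(gz) && memcmp(buf, gz, sizeof(gz)) == 0)
		return COMP_GZIP;
	return COMP_NONE;
}

/*
 * Open 'path' and classify it. On failure nothing is printed: COMP_ERR
 * is returned with errno as set by fopen (e.g. ENOENT), so the caller
 * can decide whether the failure is worth reporting.
 */
static enum comp_kind comp_detect_file(const char *path)
{
	unsigned char buf[6];
	size_t n;
	FILE *f = fopen(path, "rb");

	if (!f)
		return COMP_ERR;
	n = fread(buf, 1, sizeof(buf), f);
	fclose(f);
	return comp_from_magic(buf, n);
}
```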

2020-09-18 10:52:39

by Jiri Olsa

Subject: Re: [PATCH 04/26] perf tools: Add filename__decompress function

On Thu, Sep 17, 2020 at 03:54:55PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Sep 14, 2020 at 10:43:26PM +0200, Jiri Olsa escreveu:
> > On Mon, Sep 14, 2020 at 12:35:54PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Sun, Sep 13, 2020 at 11:02:51PM +0200, Jiri Olsa escreveu:
> > > > Factor filename__decompress from decompress_kmodule function.
> > > > It can decompress files with compressions supported in perf -
> > > > xz and gz, the support needs to be compiled in.
> > > >
> > > > It will be used in the following changes to get build id out of
> > > > compressed elf objects.
> > >
> > > This is prep work, can be applied now, done.
> >
> > thanks,
> > jirka
>
> So, I take that back, one of these decompress patches is causing this:
>
> [root@five ~]# perf list syscalls:sys_enter_open |& tail
> lzma: fopen failed on /usr/lib/modules/5.6.19-200.fc31.x86_64/kernel/drivers/acpi/video.ko.xz: 'No such file or directory'
> lzma: fopen failed on /usr/lib/modules/5.5.8-200.fc31.x86_64/kernel/net/ipv4/netfilter/nf_reject_ipv4.ko.xz: 'No such file or directory'
> lzma: fopen failed on /usr/lib/modules/5.5.9-200.fc31.x86_64/kernel/net/ipv6/netfilter/ip6_tables.ko.xz: 'No such file or directory'
> lzma: fopen failed on /usr/lib/modules/5.5.5-200.fc31.x86_64/kernel/drivers/crypto/ccp/ccp.ko.xz: 'No such file or directory'
> lzma: fopen failed on /usr/lib/modules/5.4.20-200.fc31.x86_64/kernel/sound/pci/hda/snd-hda-codec.ko.xz: 'No such file or directory'
> lzma: fopen failed on /usr/lib/modules/5.5.15-200.fc31.x86_64/kernel/drivers/target/target_core_mod.ko.xz: 'No such file or directory'
> lzma: fopen failed on /usr/lib/modules/5.3.7-301.fc31.x86_64/kernel/drivers/iommu/amd_iommu_v2.ko.xz: 'No such file or directory'
> lzma: fopen failed on /usr/lib/modules/5.3.7-301.fc31.x86_64/kernel/drivers/media/v4l2-core/videodev.ko.xz: 'No such file or directory'
> lzma: fopen failed on /usr/lib/modules/5.6.19-200.fc31.x86_64/kernel/net/ipv4/netfilter/iptable_filter.ko.xz: 'No such file or directory'
> syscalls:sys_enter_open [Tracepoint event]
> [root@five ~]# perf test 78
> 78: Check open filename arg using perf trace + vfs_getname : FAILED!
> [root@five ~]#
> [root@five ~]# uname -a
> Linux five 5.9.0-rc3 #1 SMP Mon Aug 31 08:38:27 -03 2020 x86_64 x86_64 x86_64 GNU/Linux
> [root@five ~]#
>
> So I removed them from my local branch. I'll rerun the build tests and
> then push perf/core when all tests pass.
>
> The test uses 'perf probe', and I noticed it while processing Masami's
> debuginfod patches to make 'perf probe' use debuginfod. Looking only at
> the 'perf test 78' output I thought it was his patches, but it ended up
> being the decompress ones.

ok, will check on that

thanks,
jirka