2023-10-24 22:24:33

by Ian Rogers

Subject: [PATCH v3 00/50] Improvements to memory use

Fix memory leaks detected by address/leak sanitizer affecting LBR
call-graphs, perf mem and BPF offcpu.

Make branch_type_stat in callchain_list optional as it is large and
not always necessary - in particular it isn't used by perf top.

Make the allocations of zstd streams, kernel symbols and event copies
lazier in order to save memory in cases like perf record.

Handle the thread exit event and have it remove the thread from the
threads set in machine. Don't do this for perf report, as it causes a
regression for task lists, which assume threads are never removed from
the machine's set, and for offcpu events, which may synthesize samples
for threads that have exited.

Avoid using 8kb buffers in filename__read_str, which is excessive for
reading CPU maps. Add io_dir as an allocation-free readdir
replacement; opendir allocates 32kb by default and the code uses it
recursively.

Shrink perf map by using a two-value byte to replace two function
pointers. Modify the implementation of maps so that the container for
maps is a lazily sorted array rather than an rbtree (a sketch of the
idea follows this paragraph). Improve locking and reference counting
issues. Similarly, separate out and reimplement threads to use a
hashmap for lower memory consumption and faster look up. This fixes a
regression in memory usage from when reference count checking switched
to using non-invasive tree nodes.
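
As an illustration of the sorted-array idea (a minimal standalone
sketch with hypothetical names, not the code in the series - see the
maps patches for the real implementation): inserts append and clear a
sorted flag, lookups sort once and then binary search.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef unsigned long long u64;

struct map_sketch { u64 start, end; };

struct maps_sketch {
        struct map_sketch *entries;
        unsigned int nr;
        bool sorted;
};

static int map_sketch__cmp_start(const void *a, const void *b)
{
        const struct map_sketch *ma = a, *mb = b;

        return ma->start < mb->start ? -1 : ma->start > mb->start;
}

/* Find the map containing ip: sort lazily after inserts, then binary
 * search for the first (non-overlapping) map whose end is above ip. */
static struct map_sketch *maps_sketch__find(struct maps_sketch *maps, u64 ip)
{
        unsigned int first = 0, last = maps->nr;

        if (!maps->sorted) {
                qsort(maps->entries, maps->nr, sizeof(maps->entries[0]),
                      map_sketch__cmp_start);
                maps->sorted = true;
        }
        while (first < last) {
                unsigned int mid = first + (last - first) / 2;

                if (maps->entries[mid].end <= ip)
                        first = mid + 1;
                else
                        last = mid;
        }
        return (first < maps->nr && maps->entries[first].start <= ip)
                ? &maps->entries[first] : NULL;
}
```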

The overall effect is to reduce memory consumption significantly for
perf top - with call graphs enabled it runs longer before 1GB of
memory is consumed. For a perf record of 'true', the memory
consumption goes from 39912kb max resident to 20820kb max resident -
nearly halved. perf inject with -b of a system wide perf record of
'true' reduces the max resident by roughly 4.5%.

v3: Additional memory/speed improvements, in particular for maps and
threads. Address review comments from [email protected] and
[email protected].
v2: Add additional memory fixes on top of initial LBR and rc check
fixes.

Ian Rogers (50):
perf rwsem: Add debug mode that uses a mutex
perf machine: Avoid out of bounds LBR memory read
libperf rc_check: Make implicit enabling work for GCC
libperf rc_check: Add RC_CHK_EQUAL
perf hist: Add missing puts to hist__account_cycles
perf threads: Remove unused dead thread list
perf offcpu: Add missed btf_free
perf callchain: Make display use of branch_type_stat const
perf callchain: Make brtype_stat in callchain_list optional
perf callchain: Minor layout changes to callchain_list
perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit
perf record: Lazy load kernel symbols
libperf: Lazily allocate mmap event copy
perf mmap: Lazily initialize zstd streams
perf machine thread: Remove exited threads by default
tools api fs: Switch filename__read_str to use io.h
tools api fs: Avoid reading whole file for a 1 byte bool
tools lib api: Add io_dir an allocation free readdir alternative
perf maps: Switch modules tree walk to io_dir__readdir
perf record: Be lazier in allocating lost samples buffer
perf pmu: Switch to io_dir__readdir
perf bpf: Don't synthesize BPF events when disabled
perf header: Switch mem topology to io_dir__readdir
perf events: Remove scandir in thread synthesis
perf map: Simplify map_ip/unmap_ip and make map size smaller
perf maps: Move symbol maps functions to maps.c
perf thread: Add missing RC_CHK_ACCESS
perf maps: Add maps__for_each_map to call a function on each entry
perf maps: Add remove maps function to remove a map based on callback
perf debug: Expose debug file
perf maps: Refactor maps__fixup_overlappings
perf maps: Do simple merge if given map doesn't overlap
perf maps: Rename clone to copy from
perf maps: Add maps__load_first
perf maps: Add find next entry to give entry after the given map
perf maps: Reduce scope of map_rb_node and maps internals
perf maps: Fix up overlaps during fixup_end
perf maps: Switch from rbtree to lazily sorted array for addresses
perf maps: Get map before returning in maps__find
perf maps: Get map before returning in maps__find_by_name
perf maps: Get map before returning in maps__find_next_entry
perf maps: Hide maps internals
perf maps: Locking tidy up of nr_maps
perf dso: Reorder variables to save space in struct dso
perf report: Sort child tasks by tid
perf trace: Ignore thread hashing in summary
perf machine: Move fprintf to for_each loop and a callback
perf threads: Move threads to its own files
perf threads: Switch from rbtree to hashmap
perf threads: Reduce table size from 256 to 8

tools/lib/api/Makefile | 2 +-
tools/lib/api/fs/fs.c | 74 +-
tools/lib/api/io.h | 9 +-
tools/lib/api/io_dir.h | 72 ++
tools/lib/perf/include/internal/mmap.h | 2 +-
tools/lib/perf/include/internal/rc_check.h | 13 +-
tools/lib/perf/mmap.c | 9 +
tools/perf/arch/x86/tests/dwarf-unwind.c | 1 +
tools/perf/arch/x86/util/event.c | 103 +-
tools/perf/builtin-inject.c | 6 +
tools/perf/builtin-record.c | 57 +-
tools/perf/builtin-report.c | 246 ++--
tools/perf/builtin-sched.c | 2 +-
tools/perf/builtin-trace.c | 41 +-
tools/perf/tests/hists_link.c | 4 +-
tools/perf/tests/maps.c | 64 +-
tools/perf/tests/thread-maps-share.c | 9 +-
tools/perf/tests/vmlinux-kallsyms.c | 181 +--
tools/perf/util/Build | 2 +
tools/perf/util/bpf-event.c | 4 +
tools/perf/util/bpf_lock_contention.c | 10 +-
tools/perf/util/bpf_off_cpu.c | 10 +-
tools/perf/util/branch.c | 4 +-
tools/perf/util/branch.h | 4 +-
tools/perf/util/callchain.c | 76 +-
tools/perf/util/callchain.h | 18 +-
tools/perf/util/compress.h | 6 +-
tools/perf/util/debug.c | 22 +-
tools/perf/util/debug.h | 1 +
tools/perf/util/dso.h | 84 +-
tools/perf/util/event.c | 8 +-
tools/perf/util/header.c | 31 +-
tools/perf/util/hist.c | 32 +-
tools/perf/util/machine.c | 501 +++-----
tools/perf/util/machine.h | 31 +-
tools/perf/util/map.c | 21 +-
tools/perf/util/map.h | 83 +-
tools/perf/util/map_symbol.c | 15 +
tools/perf/util/map_symbol.h | 4 +
tools/perf/util/maps.c | 1239 +++++++++++++++-----
tools/perf/util/maps.h | 95 +-
tools/perf/util/mmap.c | 5 +-
tools/perf/util/mmap.h | 1 -
tools/perf/util/pmu.c | 48 +-
tools/perf/util/pmus.c | 30 +-
tools/perf/util/probe-event.c | 41 +-
tools/perf/util/rb_resort.h | 5 -
tools/perf/util/rwsem.c | 34 +
tools/perf/util/rwsem.h | 11 +
tools/perf/util/session.c | 5 +
tools/perf/util/sort.c | 2 +-
tools/perf/util/symbol-elf.c | 10 +-
tools/perf/util/symbol.c | 346 +-----
tools/perf/util/symbol.h | 1 -
tools/perf/util/symbol_conf.h | 4 +-
tools/perf/util/synthetic-events.c | 140 ++-
tools/perf/util/thread.c | 44 +-
tools/perf/util/thread.h | 20 +-
tools/perf/util/threads.c | 186 +++
tools/perf/util/threads.h | 35 +
tools/perf/util/unwind-libunwind-local.c | 36 +-
tools/perf/util/unwind-libunwind.c | 7 +-
tools/perf/util/vdso.c | 35 +-
tools/perf/util/zstd.c | 63 +-
64 files changed, 2549 insertions(+), 1756 deletions(-)
create mode 100644 tools/lib/api/io_dir.h
create mode 100644 tools/perf/util/map_symbol.c
create mode 100644 tools/perf/util/threads.c
create mode 100644 tools/perf/util/threads.h

--
2.42.0.758.gaed0368e0e-goog


2023-10-24 22:24:36

by Ian Rogers

Subject: [PATCH v3 01/50] perf rwsem: Add debug mode that uses a mutex

An error-checking mutex will catch attempts to take the lock
recursively and other problems that an rwlock won't. At the expense of
concurrency, add a debug mode that uses a mutex in place of a rwsem.
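
For context, a standalone sketch (not part of the patch) of the
underlying pthread mechanism: with PTHREAD_MUTEX_ERRORCHECK, the
attribute perf's struct mutex uses for its error checking, a recursive
lock attempt returns EDEADLK at the call site instead of deadlocking.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

int main(void)
{
        pthread_mutexattr_t attr;
        pthread_mutex_t mtx;

        pthread_mutexattr_init(&attr);
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
        pthread_mutex_init(&mtx, &attr);

        pthread_mutex_lock(&mtx);
        /* The recursive lock is reported rather than deadlocking. */
        if (pthread_mutex_lock(&mtx) == EDEADLK)
                fprintf(stderr, "recursive lock detected\n");

        pthread_mutex_unlock(&mtx);
        pthread_mutex_destroy(&mtx);
        pthread_mutexattr_destroy(&attr);
        return 0;
}
```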

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/rwsem.c | 34 ++++++++++++++++++++++++++++++++++
tools/perf/util/rwsem.h | 11 +++++++++++
2 files changed, 45 insertions(+)

diff --git a/tools/perf/util/rwsem.c b/tools/perf/util/rwsem.c
index f3d29d8ddc99..5109167f27f7 100644
--- a/tools/perf/util/rwsem.c
+++ b/tools/perf/util/rwsem.c
@@ -2,32 +2,66 @@
#include "util.h"
#include "rwsem.h"

+#if RWS_ERRORCHECK
+#include "mutex.h"
+#endif
+
int init_rwsem(struct rw_semaphore *sem)
{
+#if RWS_ERRORCHECK
+ mutex_init(&sem->mtx);
+ return 0;
+#else
return pthread_rwlock_init(&sem->lock, NULL);
+#endif
}

int exit_rwsem(struct rw_semaphore *sem)
{
+#if RWS_ERRORCHECK
+ mutex_destroy(&sem->mtx);
+ return 0;
+#else
return pthread_rwlock_destroy(&sem->lock);
+#endif
}

int down_read(struct rw_semaphore *sem)
{
+#if RWS_ERRORCHECK
+ mutex_lock(&sem->mtx);
+ return 0;
+#else
return perf_singlethreaded ? 0 : pthread_rwlock_rdlock(&sem->lock);
+#endif
}

int up_read(struct rw_semaphore *sem)
{
+#if RWS_ERRORCHECK
+ mutex_unlock(&sem->mtx);
+ return 0;
+#else
return perf_singlethreaded ? 0 : pthread_rwlock_unlock(&sem->lock);
+#endif
}

int down_write(struct rw_semaphore *sem)
{
+#if RWS_ERRORCHECK
+ mutex_lock(&sem->mtx);
+ return 0;
+#else
return perf_singlethreaded ? 0 : pthread_rwlock_wrlock(&sem->lock);
+#endif
}

int up_write(struct rw_semaphore *sem)
{
+#if RWS_ERRORCHECK
+ mutex_unlock(&sem->mtx);
+ return 0;
+#else
return perf_singlethreaded ? 0 : pthread_rwlock_unlock(&sem->lock);
+#endif
}
diff --git a/tools/perf/util/rwsem.h b/tools/perf/util/rwsem.h
index 94565ad4d494..ef5cbc31d967 100644
--- a/tools/perf/util/rwsem.h
+++ b/tools/perf/util/rwsem.h
@@ -2,9 +2,20 @@
#define _PERF_RWSEM_H

#include <pthread.h>
+#include "mutex.h"
+
+/*
+ * Mutexes have additional error checking. Enable to use a mutex rather than a
+ * rwlock for debugging.
+ */
+#define RWS_ERRORCHECK 0

struct rw_semaphore {
+#if RWS_ERRORCHECK
+ struct mutex mtx;
+#else
pthread_rwlock_t lock;
+#endif
};

int init_rwsem(struct rw_semaphore *sem);
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:24:51

by Ian Rogers

Subject: [PATCH v3 02/50] perf machine: Avoid out of bounds LBR memory read

Running perf top with address sanitizer and "--call-graph=lbr" fails
due to reading sample 0 when no samples exist. Add a guard to prevent
this.

Fixes: e2b23483eb1d ("perf machine: Factor out lbr_callchain_add_lbr_ip()")
Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/machine.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index addfae2f63ef..e0e2c4a943e4 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2622,16 +2622,18 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
save_lbr_cursor_node(thread, cursor, i);
}

- /* Add LBR ip from first entries.to */
- ip = entries[0].to;
- flags = &entries[0].flags;
- *branch_from = entries[0].from;
- err = add_callchain_ip(thread, cursor, parent,
- root_al, &cpumode, ip,
- true, flags, NULL,
- *branch_from);
- if (err)
- return err;
+ if (lbr_nr > 0) {
+ /* Add LBR ip from first entries.to */
+ ip = entries[0].to;
+ flags = &entries[0].flags;
+ *branch_from = entries[0].from;
+ err = add_callchain_ip(thread, cursor, parent,
+ root_al, &cpumode, ip,
+ true, flags, NULL,
+ *branch_from);
+ if (err)
+ return err;
+ }

return 0;
}
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:24:52

by Ian Rogers

Subject: [PATCH v3 05/50] perf hist: Add missing puts to hist__account_cycles

Caught using reference count checking on perf top with
"--call-graph=lbr". After this no memory leaks were detected.

Fixes: 57849998e2cd ("perf report: Add processing for cycle histograms")
Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/hist.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index cde0078e6c90..0aa7e231172a 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -2676,8 +2676,6 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,

/* If we have branch cycles always annotate them. */
if (bs && bs->nr && entries[0].flags.cycles) {
- int i;
-
bi = sample__resolve_bstack(sample, al);
if (bi) {
struct addr_map_symbol *prev = NULL;
@@ -2692,7 +2690,7 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
* Note that perf stores branches reversed from
* program order!
*/
- for (i = bs->nr - 1; i >= 0; i--) {
+ for (int i = bs->nr - 1; i >= 0; i--) {
addr_map_symbol__account_cycles(&bi[i].from,
nonany_branch_mode ? NULL : prev,
bi[i].flags.cycles);
@@ -2701,6 +2699,12 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
if (total_cycles)
*total_cycles += bi[i].flags.cycles;
}
+ for (unsigned int i = 0; i < bs->nr; i++) {
+ map__put(bi[i].to.ms.map);
+ maps__put(bi[i].to.ms.maps);
+ map__put(bi[i].from.ms.map);
+ maps__put(bi[i].from.ms.maps);
+ }
free(bi);
}
}
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:24:55

by Ian Rogers

Subject: [PATCH v3 03/50] libperf rc_check: Make implicit enabling work for GCC

Make the implicit enabling of REFCNT_CHECKING robust when building
with GCC, which defines __SANITIZE_ADDRESS__ for sanitizer builds
rather than supporting clang's __has_feature check.
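
The detection pattern is a common idiom (illustrative sketch, not the
patch itself; BUILT_WITH_ASAN is a hypothetical name): __has_feature
must be guarded, because compilers that lack it reject its use inside
an #if expression.

```c
/* Provide a fallback so __has_feature can be used unconditionally,
 * then combine the GCC and clang detection paths. */
#ifndef __has_feature
# define __has_feature(x) 0
#endif

#if defined(__SANITIZE_ADDRESS__) || __has_feature(address_sanitizer)
# define BUILT_WITH_ASAN 1
#endif
```

The patch achieves the same with an #elif chain instead of defining a
fallback for a compiler-provided name.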

Fixes: 9be6ab181b7b ("libperf rc_check: Enable implicitly with sanitizers")
Signed-off-by: Ian Rogers <[email protected]>
---
tools/lib/perf/include/internal/rc_check.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/lib/perf/include/internal/rc_check.h b/tools/lib/perf/include/internal/rc_check.h
index d5d771ccdc7b..e88a6d8a0b0f 100644
--- a/tools/lib/perf/include/internal/rc_check.h
+++ b/tools/lib/perf/include/internal/rc_check.h
@@ -9,8 +9,12 @@
* Enable reference count checking implicitly with leak checking, which is
* integrated into address sanitizer.
*/
-#if defined(LEAK_SANITIZER) || defined(ADDRESS_SANITIZER)
+#if defined(__SANITIZE_ADDRESS__) || defined(LEAK_SANITIZER) || defined(ADDRESS_SANITIZER)
#define REFCNT_CHECKING 1
+#elif defined(__has_feature)
+#if __has_feature(address_sanitizer) || __has_feature(leak_sanitizer)
+#define REFCNT_CHECKING 1
+#endif
#endif

/*
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:24:57

by Ian Rogers

Subject: [PATCH v3 04/50] libperf rc_check: Add RC_CHK_EQUAL

Comparing pointers when reference count checking is enabled is tricky:
the wrapper pointers must be unwrapped to compare the underlying
objects, and unwrapping a NULL pointer SEGVs. Add a convenience macro
to simplify the comparison and use it.
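
The before/after at a call site (taken from the builtin-sched.c hunk
below; RC_CHK_ACCESS unwraps to the underlying object when checking is
enabled, so it can't safely be applied to a possibly-NULL pointer):

```c
/* Before: unwrapping both sides SEGVs if either side is NULL. */
if (RC_CHK_ACCESS(l->thread) == RC_CHK_ACCESS(r->thread))
        return 0;

/* After: RC_CHK_EQUAL checks for NULL before unwrapping. */
if (RC_CHK_EQUAL(l->thread, r->thread))
        return 0;
```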

Signed-off-by: Ian Rogers <[email protected]>
---
tools/lib/perf/include/internal/rc_check.h | 7 +++++++
tools/perf/builtin-sched.c | 2 +-
tools/perf/tests/hists_link.c | 4 ++--
tools/perf/tests/thread-maps-share.c | 9 ++++-----
tools/perf/util/callchain.c | 2 +-
tools/perf/util/hist.c | 2 +-
tools/perf/util/machine.c | 4 ++--
tools/perf/util/sort.c | 2 +-
tools/perf/util/symbol.c | 4 ++--
9 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/tools/lib/perf/include/internal/rc_check.h b/tools/lib/perf/include/internal/rc_check.h
index e88a6d8a0b0f..f80ddfc80129 100644
--- a/tools/lib/perf/include/internal/rc_check.h
+++ b/tools/lib/perf/include/internal/rc_check.h
@@ -54,6 +54,9 @@
/* A put operation removing the indirection layer. */
#define RC_CHK_PUT(object) {}

+/* Pointer equality when the indirection may or may not be there. */
+#define RC_CHK_EQUAL(object1, object2) (object1 == object2)
+
#else

/* Replaces "struct foo" so that the pointer may be interposed. */
@@ -101,6 +104,10 @@
} \
} while(0)

+/* Pointer equality when the indirection may or may not be there. */
+#define RC_CHK_EQUAL(object1, object2) (object1 == object2 || \
+ (object1 && object2 && object1->orig == object2->orig))
+
#endif

#endif /* __LIBPERF_INTERNAL_RC_CHECK_H */
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 9ab300b6f131..dd6065afbbaf 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -1385,7 +1385,7 @@ static int pid_cmp(struct work_atoms *l, struct work_atoms *r)
{
pid_t l_tid, r_tid;

- if (RC_CHK_ACCESS(l->thread) == RC_CHK_ACCESS(r->thread))
+ if (RC_CHK_EQUAL(l->thread, r->thread))
return 0;
l_tid = thread__tid(l->thread);
r_tid = thread__tid(r->thread);
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index 2d19657ab5e0..5b6f1e883466 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -148,8 +148,8 @@ static int find_sample(struct sample *samples, size_t nr_samples,
struct thread *t, struct map *m, struct symbol *s)
{
while (nr_samples--) {
- if (RC_CHK_ACCESS(samples->thread) == RC_CHK_ACCESS(t) &&
- RC_CHK_ACCESS(samples->map) == RC_CHK_ACCESS(m) &&
+ if (RC_CHK_EQUAL(samples->thread, t) &&
+ RC_CHK_EQUAL(samples->map, m) &&
samples->sym == s)
return 1;
samples++;
diff --git a/tools/perf/tests/thread-maps-share.c b/tools/perf/tests/thread-maps-share.c
index faf980b26252..7fa6f7c568e2 100644
--- a/tools/perf/tests/thread-maps-share.c
+++ b/tools/perf/tests/thread-maps-share.c
@@ -46,9 +46,9 @@ static int test__thread_maps_share(struct test_suite *test __maybe_unused, int s
TEST_ASSERT_EQUAL("wrong refcnt", refcount_read(maps__refcnt(maps)), 4);

/* test the maps pointer is shared */
- TEST_ASSERT_VAL("maps don't match", RC_CHK_ACCESS(maps) == RC_CHK_ACCESS(thread__maps(t1)));
- TEST_ASSERT_VAL("maps don't match", RC_CHK_ACCESS(maps) == RC_CHK_ACCESS(thread__maps(t2)));
- TEST_ASSERT_VAL("maps don't match", RC_CHK_ACCESS(maps) == RC_CHK_ACCESS(thread__maps(t3)));
+ TEST_ASSERT_VAL("maps don't match", RC_CHK_EQUAL(maps, thread__maps(t1)));
+ TEST_ASSERT_VAL("maps don't match", RC_CHK_EQUAL(maps, thread__maps(t2)));
+ TEST_ASSERT_VAL("maps don't match", RC_CHK_EQUAL(maps, thread__maps(t3)));

/*
* Verify the other leader was created by previous call.
@@ -73,8 +73,7 @@ static int test__thread_maps_share(struct test_suite *test __maybe_unused, int s
other_maps = thread__maps(other);
TEST_ASSERT_EQUAL("wrong refcnt", refcount_read(maps__refcnt(other_maps)), 2);

- TEST_ASSERT_VAL("maps don't match", RC_CHK_ACCESS(other_maps) ==
- RC_CHK_ACCESS(thread__maps(other_leader)));
+ TEST_ASSERT_VAL("maps don't match", RC_CHK_EQUAL(other_maps, thread__maps(other_leader)));

/* release thread group */
thread__put(t3);
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index aee937d14fbb..18d545c0629e 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1142,7 +1142,7 @@ int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *
if (al->map == NULL)
goto out;
}
- if (RC_CHK_ACCESS(al->maps) == RC_CHK_ACCESS(machine__kernel_maps(machine))) {
+ if (RC_CHK_EQUAL(al->maps, machine__kernel_maps(machine))) {
if (machine__is_host(machine)) {
al->cpumode = PERF_RECORD_MISC_KERNEL;
al->level = 'k';
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 3dc8a4968beb..cde0078e6c90 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -2142,7 +2142,7 @@ static bool hists__filter_entry_by_thread(struct hists *hists,
struct hist_entry *he)
{
if (hists->thread_filter != NULL &&
- RC_CHK_ACCESS(he->thread) != RC_CHK_ACCESS(hists->thread_filter)) {
+ !RC_CHK_EQUAL(he->thread, hists->thread_filter)) {
he->filtered |= (1 << HIST_FILTER__THREAD);
return true;
}
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index e0e2c4a943e4..098600d983c5 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -969,7 +969,7 @@ static int machine__process_ksymbol_unregister(struct machine *machine,
if (!map)
return 0;

- if (RC_CHK_ACCESS(map) != RC_CHK_ACCESS(machine->vmlinux_map))
+ if (!RC_CHK_EQUAL(map, machine->vmlinux_map))
maps__remove(machine__kernel_maps(machine), map);
else {
struct dso *dso = map__dso(map);
@@ -2058,7 +2058,7 @@ static void __machine__remove_thread(struct machine *machine, struct thread_rb_n
if (!nd)
nd = thread_rb_node__find(th, &threads->entries.rb_root);

- if (threads->last_match && RC_CHK_ACCESS(threads->last_match) == RC_CHK_ACCESS(th))
+ if (threads->last_match && RC_CHK_EQUAL(threads->last_match, th))
threads__set_last_match(threads, NULL);

if (lock)
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 6aa1c7f2b444..80e4f6132740 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -128,7 +128,7 @@ static int hist_entry__thread_filter(struct hist_entry *he, int type, const void
if (type != HIST_FILTER__THREAD)
return -1;

- return th && RC_CHK_ACCESS(he->thread) != RC_CHK_ACCESS(th);
+ return th && !RC_CHK_EQUAL(he->thread, th);
}

struct sort_entry sort_thread = {
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 96587fd7a5a2..822f4dcebfe6 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -877,7 +877,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
*module++ = '\0';
curr_map_dso = map__dso(curr_map);
if (strcmp(curr_map_dso->short_name, module)) {
- if (RC_CHK_ACCESS(curr_map) != RC_CHK_ACCESS(initial_map) &&
+ if (!RC_CHK_EQUAL(curr_map, initial_map) &&
dso->kernel == DSO_SPACE__KERNEL_GUEST &&
machine__is_default_guest(machine)) {
/*
@@ -1469,7 +1469,7 @@ static int dso__load_kcore(struct dso *dso, struct map *map,

list_del_init(&new_node->node);

- if (RC_CHK_ACCESS(new_map) == RC_CHK_ACCESS(replacement_map)) {
+ if (RC_CHK_EQUAL(new_map, replacement_map)) {
struct map *map_ref;

map__set_start(map, map__start(new_map));
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:25:07

by Ian Rogers

Subject: [PATCH v3 06/50] perf threads: Remove unused dead thread list

Commit 40826c45eb0b ("perf thread: Remove notion of dead threads")
removed dead threads but the list head wasn't removed. Remove it here.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/machine.c | 1 -
tools/perf/util/machine.h | 1 -
2 files changed, 2 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 098600d983c5..31fdbb0f27af 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -67,7 +67,6 @@ static void machine__threads_init(struct machine *machine)
threads->entries = RB_ROOT_CACHED;
init_rwsem(&threads->lock);
threads->nr = 0;
- INIT_LIST_HEAD(&threads->dead);
threads->last_match = NULL;
}
}
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index d034ecaf89c1..1279acda6a8a 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -35,7 +35,6 @@ struct threads {
struct rb_root_cached entries;
struct rw_semaphore lock;
unsigned int nr;
- struct list_head dead;
struct thread *last_match;
};

--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:25:08

by Ian Rogers

Subject: [PATCH v3 08/50] perf callchain: Make display use of branch_type_stat const

Display code doesn't modify the branch_type_stat, so switch uses to
const. This aids refactoring struct callchain_list, where the
branch_type_stat is currently embedded even when unused.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/branch.c | 4 ++--
tools/perf/util/branch.h | 4 ++--
tools/perf/util/callchain.c | 6 +++---
3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/branch.c b/tools/perf/util/branch.c
index 378f16a24751..ab760e267d41 100644
--- a/tools/perf/util/branch.c
+++ b/tools/perf/util/branch.c
@@ -109,7 +109,7 @@ const char *get_branch_type(struct branch_entry *e)
return branch_type_name(e->flags.type);
}

-void branch_type_stat_display(FILE *fp, struct branch_type_stat *st)
+void branch_type_stat_display(FILE *fp, const struct branch_type_stat *st)
{
u64 total = 0;
int i;
@@ -171,7 +171,7 @@ static int count_str_scnprintf(int idx, const char *str, char *bf, int size)
return scnprintf(bf, size, "%s%s", (idx) ? " " : " (", str);
}

-int branch_type_str(struct branch_type_stat *st, char *bf, int size)
+int branch_type_str(const struct branch_type_stat *st, char *bf, int size)
{
int i, j = 0, printed = 0;
u64 total = 0;
diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h
index e41bfffe2217..87704d713ff6 100644
--- a/tools/perf/util/branch.h
+++ b/tools/perf/util/branch.h
@@ -86,8 +86,8 @@ void branch_type_count(struct branch_type_stat *st, struct branch_flags *flags,
const char *branch_type_name(int type);
const char *branch_new_type_name(int new_type);
const char *get_branch_type(struct branch_entry *e);
-void branch_type_stat_display(FILE *fp, struct branch_type_stat *st);
-int branch_type_str(struct branch_type_stat *st, char *bf, int bfsize);
+void branch_type_stat_display(FILE *fp, const struct branch_type_stat *st);
+int branch_type_str(const struct branch_type_stat *st, char *bf, int bfsize);

const char *branch_spec_desc(int spec);

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 18d545c0629e..cde4860e6f28 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1339,7 +1339,7 @@ static int count_float_printf(int idx, const char *str, float value,
static int branch_to_str(char *bf, int bfsize,
u64 branch_count, u64 predicted_count,
u64 abort_count,
- struct branch_type_stat *brtype_stat)
+ const struct branch_type_stat *brtype_stat)
{
int printed, i = 0;

@@ -1403,7 +1403,7 @@ static int counts_str_build(char *bf, int bfsize,
u64 abort_count, u64 cycles_count,
u64 iter_count, u64 iter_cycles,
u64 from_count,
- struct branch_type_stat *brtype_stat)
+ const struct branch_type_stat *brtype_stat)
{
int printed;

@@ -1430,7 +1430,7 @@ static int callchain_counts_printf(FILE *fp, char *bf, int bfsize,
u64 abort_count, u64 cycles_count,
u64 iter_count, u64 iter_cycles,
u64 from_count,
- struct branch_type_stat *brtype_stat)
+ const struct branch_type_stat *brtype_stat)
{
char str[256];

--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:25:24

by Ian Rogers

Subject: [PATCH v3 10/50] perf callchain: Minor layout changes to callchain_list

Avoid a 6-byte padding hole. Place more frequently used fields first
in an attempt to use just 1 cacheline in the common case.

Before:
```
struct callchain_list {
u64 ip; /* 0 8 */
struct map_symbol ms; /* 8 24 */
struct {
_Bool unfolded; /* 32 1 */
_Bool has_children; /* 33 1 */
}; /* 32 2 */

/* XXX 6 bytes hole, try to pack */

u64 branch_count; /* 40 8 */
u64 from_count; /* 48 8 */
u64 predicted_count; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u64 abort_count; /* 64 8 */
u64 cycles_count; /* 72 8 */
u64 iter_count; /* 80 8 */
u64 iter_cycles; /* 88 8 */
struct branch_type_stat * brtype_stat; /* 96 8 */
const char * srcline; /* 104 8 */
struct list_head list; /* 112 16 */

/* size: 128, cachelines: 2, members: 13 */
/* sum members: 122, holes: 1, sum holes: 6 */
};
```

After:
```
struct callchain_list {
struct list_head list; /* 0 16 */
u64 ip; /* 16 8 */
struct map_symbol ms; /* 24 24 */
const char * srcline; /* 48 8 */
u64 branch_count; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u64 from_count; /* 64 8 */
u64 cycles_count; /* 72 8 */
u64 iter_count; /* 80 8 */
u64 iter_cycles; /* 88 8 */
struct branch_type_stat * brtype_stat; /* 96 8 */
u64 predicted_count; /* 104 8 */
u64 abort_count; /* 112 8 */
struct {
_Bool unfolded; /* 120 1 */
_Bool has_children; /* 121 1 */
}; /* 120 2 */

/* size: 128, cachelines: 2, members: 13 */
/* padding: 6 */
};
```

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/callchain.h | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 86e8a9e81456..d5c66345ae31 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -116,22 +116,22 @@ extern struct callchain_param callchain_param;
extern struct callchain_param callchain_param_default;

struct callchain_list {
+ struct list_head list;
u64 ip;
struct map_symbol ms;
- struct /* for TUI */ {
- bool unfolded;
- bool has_children;
- };
+ const char *srcline;
u64 branch_count;
u64 from_count;
- u64 predicted_count;
- u64 abort_count;
u64 cycles_count;
u64 iter_count;
u64 iter_cycles;
struct branch_type_stat *brtype_stat;
- const char *srcline;
- struct list_head list;
+ u64 predicted_count;
+ u64 abort_count;
+ struct /* for TUI */ {
+ bool unfolded;
+ bool has_children;
+ };
};

/*
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:25:28

by Ian Rogers

Subject: [PATCH v3 12/50] perf record: Lazy load kernel symbols

Commit 5b7ba82a7591 ("perf symbols: Load kernel maps before using")
changed it so that loading a kernel dso would cause the symbols for
the dso to be eagerly loaded. For perf record this is overhead as the
symbols won't be used. Add a symbol_conf to control the behavior and
disable it for perf record and perf inject.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/builtin-inject.c | 6 ++++++
tools/perf/builtin-record.c | 2 ++
tools/perf/util/event.c | 4 ++--
tools/perf/util/symbol_conf.h | 3 ++-
4 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index c8cf2fdd9cff..eb3ef5c24b66 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -2265,6 +2265,12 @@ int cmd_inject(int argc, const char **argv)
"perf inject [<options>]",
NULL
};
+
+ if (!inject.itrace_synth_opts.set) {
+ /* Disable eager loading of kernel symbols that adds overhead to perf inject. */
+ symbol_conf.lazy_load_kernel_maps = true;
+ }
+
#ifndef HAVE_JITDUMP
set_option_nobuild(options, 'j', "jit", "NO_LIBELF=1", true);
#endif
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index dcf288a4fb9a..8ec818568662 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -3989,6 +3989,8 @@ int cmd_record(int argc, const char **argv)
# undef set_nobuild
#endif

+ /* Disable eager loading of kernel symbols that adds overhead to perf record. */
+ symbol_conf.lazy_load_kernel_maps = true;
rec->opts.affinity = PERF_AFFINITY_SYS;

rec->evlist = evlist__new();
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 923c0fb15122..68f45e9e63b6 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -617,13 +617,13 @@ struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr,
if (cpumode == PERF_RECORD_MISC_KERNEL && perf_host) {
al->level = 'k';
maps = machine__kernel_maps(machine);
- load_map = true;
+ load_map = !symbol_conf.lazy_load_kernel_maps;
} else if (cpumode == PERF_RECORD_MISC_USER && perf_host) {
al->level = '.';
} else if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL && perf_guest) {
al->level = 'g';
maps = machine__kernel_maps(machine);
- load_map = true;
+ load_map = !symbol_conf.lazy_load_kernel_maps;
} else if (cpumode == PERF_RECORD_MISC_GUEST_USER && perf_guest) {
al->level = 'u';
} else {
diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
index 0b589570d1d0..2b2fb9e224b0 100644
--- a/tools/perf/util/symbol_conf.h
+++ b/tools/perf/util/symbol_conf.h
@@ -42,7 +42,8 @@ struct symbol_conf {
inline_name,
disable_add2line_warn,
buildid_mmap2,
- guest_code;
+ guest_code,
+ lazy_load_kernel_maps;
const char *vmlinux_name,
*kallsyms_name,
*source_prefix,
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:25:36

by Ian Rogers

Subject: [PATCH v3 09/50] perf callchain: Make brtype_stat in callchain_list optional

struct callchain_list is 352 bytes in size, 232 of which are the
embedded brtype_stat. brtype_stat is only used for certain
callchain_list items, so make it optional, allocating it only when
necessary. So that printing doesn't need to deal with an optional
brtype_stat, pass an empty/zero version.

Before:
```
struct callchain_list {
u64 ip; /* 0 8 */
struct map_symbol ms; /* 8 24 */
struct {
_Bool unfolded; /* 32 1 */
_Bool has_children; /* 33 1 */
}; /* 32 2 */

/* XXX 6 bytes hole, try to pack */

u64 branch_count; /* 40 8 */
u64 from_count; /* 48 8 */
u64 predicted_count; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u64 abort_count; /* 64 8 */
u64 cycles_count; /* 72 8 */
u64 iter_count; /* 80 8 */
u64 iter_cycles; /* 88 8 */
struct branch_type_stat brtype_stat; /* 96 232 */
/* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */
const char * srcline; /* 328 8 */
struct list_head list; /* 336 16 */

/* size: 352, cachelines: 6, members: 13 */
/* sum members: 346, holes: 1, sum holes: 6 */
/* last cacheline: 32 bytes */
};
```

After:
```
struct callchain_list {
u64 ip; /* 0 8 */
struct map_symbol ms; /* 8 24 */
struct {
_Bool unfolded; /* 32 1 */
_Bool has_children; /* 33 1 */
}; /* 32 2 */

/* XXX 6 bytes hole, try to pack */

u64 branch_count; /* 40 8 */
u64 from_count; /* 48 8 */
u64 predicted_count; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u64 abort_count; /* 64 8 */
u64 cycles_count; /* 72 8 */
u64 iter_count; /* 80 8 */
u64 iter_cycles; /* 88 8 */
struct branch_type_stat * brtype_stat; /* 96 8 */
const char * srcline; /* 104 8 */
struct list_head list; /* 112 16 */

/* size: 128, cachelines: 2, members: 13 */
/* sum members: 122, holes: 1, sum holes: 6 */
};
```

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/callchain.c | 41 +++++++++++++++++++++++++++++--------
tools/perf/util/callchain.h | 2 +-
2 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index cde4860e6f28..5349c6a21849 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -586,7 +586,7 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
call = zalloc(sizeof(*call));
if (!call) {
perror("not enough memory for the code path tree");
- return -1;
+ return -ENOMEM;
}
call->ip = cursor_node->ip;
call->ms = cursor_node->ms;
@@ -602,7 +602,15 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
* branch_from is set with value somewhere else
* to imply it's "to" of a branch.
*/
- call->brtype_stat.branch_to = true;
+ if (!call->brtype_stat) {
+ call->brtype_stat = zalloc(sizeof(*call->brtype_stat));
+ if (!call->brtype_stat) {
+ perror("not enough memory for the code path branch statisitcs");
+ free(call->brtype_stat);
+ return -ENOMEM;
+ }
+ }
+ call->brtype_stat->branch_to = true;

if (cursor_node->branch_flags.predicted)
call->predicted_count = 1;
@@ -610,7 +618,7 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
if (cursor_node->branch_flags.abort)
call->abort_count = 1;

- branch_type_count(&call->brtype_stat,
+ branch_type_count(call->brtype_stat,
&cursor_node->branch_flags,
cursor_node->branch_from,
cursor_node->ip);
@@ -618,7 +626,8 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
/*
* It's "from" of a branch
*/
- call->brtype_stat.branch_to = false;
+ if (call->brtype_stat && call->brtype_stat->branch_to)
+ call->brtype_stat->branch_to = false;
call->cycles_count =
cursor_node->branch_flags.cycles;
call->iter_count = cursor_node->nr_loop_iter;
@@ -652,6 +661,7 @@ add_child(struct callchain_node *parent,
list_del_init(&call->list);
map__zput(call->ms.map);
maps__zput(call->ms.maps);
+ zfree(&call->brtype_stat);
free(call);
}
free(new);
@@ -762,7 +772,14 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
/*
* It's "to" of a branch
*/
- cnode->brtype_stat.branch_to = true;
+ if (!cnode->brtype_stat) {
+ cnode->brtype_stat = zalloc(sizeof(*cnode->brtype_stat));
+ if (!cnode->brtype_stat) {
+ perror("not enough memory for the code path branch statisitcs");
+ return MATCH_ERROR;
+ }
+ }
+ cnode->brtype_stat->branch_to = true;

if (node->branch_flags.predicted)
cnode->predicted_count++;
@@ -770,7 +787,7 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
if (node->branch_flags.abort)
cnode->abort_count++;

- branch_type_count(&cnode->brtype_stat,
+ branch_type_count(cnode->brtype_stat,
&node->branch_flags,
node->branch_from,
node->ip);
@@ -778,7 +795,8 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
/*
* It's "from" of a branch
*/
- cnode->brtype_stat.branch_to = false;
+ if (cnode->brtype_stat && cnode->brtype_stat->branch_to)
+ cnode->brtype_stat->branch_to = false;
cnode->cycles_count += node->branch_flags.cycles;
cnode->iter_count += node->nr_loop_iter;
cnode->iter_cycles += node->iter_cycles;
@@ -1026,6 +1044,7 @@ merge_chain_branch(struct callchain_cursor *cursor,
maps__zput(ms.maps);
map__zput(list->ms.map);
maps__zput(list->ms.maps);
+ zfree(&list->brtype_stat);
free(list);
}

@@ -1447,11 +1466,14 @@ static int callchain_counts_printf(FILE *fp, char *bf, int bfsize,
int callchain_list_counts__printf_value(struct callchain_list *clist,
FILE *fp, char *bf, int bfsize)
{
+ static const struct branch_type_stat empty_brtype_stat = {};
+ const struct branch_type_stat *brtype_stat;
u64 branch_count, predicted_count;
u64 abort_count, cycles_count;
u64 iter_count, iter_cycles;
u64 from_count;

+ brtype_stat = clist->brtype_stat ?: &empty_brtype_stat;
branch_count = clist->branch_count;
predicted_count = clist->predicted_count;
abort_count = clist->abort_count;
@@ -1463,7 +1485,7 @@ int callchain_list_counts__printf_value(struct callchain_list *clist,
return callchain_counts_printf(fp, bf, bfsize, branch_count,
predicted_count, abort_count,
cycles_count, iter_count, iter_cycles,
- from_count, &clist->brtype_stat);
+ from_count, brtype_stat);
}

static void free_callchain_node(struct callchain_node *node)
@@ -1476,6 +1498,7 @@ static void free_callchain_node(struct callchain_node *node)
list_del_init(&list->list);
map__zput(list->ms.map);
maps__zput(list->ms.maps);
+ zfree(&list->brtype_stat);
free(list);
}

@@ -1483,6 +1506,7 @@ static void free_callchain_node(struct callchain_node *node)
list_del_init(&list->list);
map__zput(list->ms.map);
maps__zput(list->ms.maps);
+ zfree(&list->brtype_stat);
free(list);
}

@@ -1569,6 +1593,7 @@ int callchain_node__make_parent_list(struct callchain_node *node)
list_del_init(&chain->list);
map__zput(chain->ms.map);
maps__zput(chain->ms.maps);
+ zfree(&chain->brtype_stat);
free(chain);
}
return -ENOMEM;
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index d2618a47deca..86e8a9e81456 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -129,7 +129,7 @@ struct callchain_list {
u64 cycles_count;
u64 iter_count;
u64 iter_cycles;
- struct branch_type_stat brtype_stat;
+ struct branch_type_stat *brtype_stat;
const char *srcline;
struct list_head list;
};
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:25:37

by Ian Rogers

Subject: [PATCH v3 07/50] perf offcpu: Add missed btf_free

Caught by address/leak sanitizer.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/bpf_off_cpu.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index 21f4d9ba023d..6af36142dc5a 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -98,22 +98,22 @@ static void off_cpu_finish(void *arg __maybe_unused)
/* v5.18 kernel added prev_state arg, so it needs to check the signature */
static void check_sched_switch_args(void)
{
- const struct btf *btf = btf__load_vmlinux_btf();
+ struct btf *btf = btf__load_vmlinux_btf();
const struct btf_type *t1, *t2, *t3;
u32 type_id;

type_id = btf__find_by_name_kind(btf, "btf_trace_sched_switch",
BTF_KIND_TYPEDEF);
if ((s32)type_id < 0)
- return;
+ goto cleanup;

t1 = btf__type_by_id(btf, type_id);
if (t1 == NULL)
- return;
+ goto cleanup;

t2 = btf__type_by_id(btf, t1->type);
if (t2 == NULL || !btf_is_ptr(t2))
- return;
+ goto cleanup;

t3 = btf__type_by_id(btf, t2->type);
/* btf_trace func proto has one more argument for the context */
@@ -121,6 +121,8 @@ static void check_sched_switch_args(void)
/* new format: pass prev_state as 4th arg */
skel->rodata->has_prev_state = true;
}
+cleanup:
+ btf__free(btf);
}

int off_cpu_prepare(struct evlist *evlist, struct target *target,
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:25:46

by Ian Rogers

Subject: [PATCH v3 13/50] libperf: Lazily allocate mmap event copy

The event copy in the mmap provides storage for a read event. Not all
users of mmaps read the events - perf record, for example, doesn't -
so allocate the copy on first read rather than embedding it within the
perf_mmap.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/lib/perf/include/internal/mmap.h | 2 +-
tools/lib/perf/mmap.c | 11 +++++++++++
2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/tools/lib/perf/include/internal/mmap.h b/tools/lib/perf/include/internal/mmap.h
index 5a062af8e9d8..b11aaf5ed645 100644
--- a/tools/lib/perf/include/internal/mmap.h
+++ b/tools/lib/perf/include/internal/mmap.h
@@ -33,7 +33,7 @@ struct perf_mmap {
bool overwrite;
u64 flush;
libperf_unmap_cb_t unmap_cb;
- char event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
+ void *event_copy;
struct perf_mmap *next;
};

diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
index 2184814b37dd..91ae46aac378 100644
--- a/tools/lib/perf/mmap.c
+++ b/tools/lib/perf/mmap.c
@@ -51,6 +51,10 @@ int perf_mmap__mmap(struct perf_mmap *map, struct perf_mmap_param *mp,

void perf_mmap__munmap(struct perf_mmap *map)
{
+ if (map == NULL)
+ return;
+ free(map->event_copy);
+ map->event_copy = NULL;
if (map && map->base != NULL) {
munmap(map->base, perf_mmap__mmap_len(map));
map->base = NULL;
@@ -226,6 +228,13 @@ static union perf_event *perf_mmap__read(struct perf_mmap *map,
unsigned int len = min(sizeof(*event), size), cpy;
void *dst = map->event_copy;

+ if (!dst) {
+ dst = malloc(PERF_SAMPLE_MAX_SIZE);
+ if (!dst)
+ return NULL;
+ map->event_copy = dst;
+ }
+
do {
cpy = min(map->mask + 1 - (offset & map->mask), len);
memcpy(dst, &data[offset & map->mask], cpy);
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:26:04

by Ian Rogers

Subject: [PATCH v3 20/50] perf record: Be lazier in allocating lost samples buffer

Wait until a lost sample occurs to allocate the lost samples buffer;
often the buffer isn't necessary. This saves a 64kb allocation and
5.3kb of peak memory consumption.
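
The allocate-and-stamp-the-header step is duplicated at the two call
sites in the diff; a hedged sketch of how it could be factored into a
helper (lost_samples__get is hypothetical, not part of the patch):

```c
/* Allocate the lost-samples buffer on first use, stamping the header
 * type; returns NULL if allocation fails. */
static struct perf_record_lost_samples *
lost_samples__get(struct perf_record_lost_samples **lost)
{
        if (*lost == NULL) {
                *lost = zalloc(PERF_SAMPLE_MAX_SIZE);
                if (*lost == NULL) {
                        pr_debug("Memory allocation failed\n");
                        return NULL;
                }
                (*lost)->header.type = PERF_RECORD_LOST_SAMPLES;
        }
        return *lost;
}
```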

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/builtin-record.c | 29 +++++++++++++++++++----------
1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 9b4f3805ca92..b6c8c1371b39 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1924,21 +1924,13 @@ static void __record__save_lost_samples(struct record *rec, struct evsel *evsel,
static void record__read_lost_samples(struct record *rec)
{
struct perf_session *session = rec->session;
- struct perf_record_lost_samples *lost;
+ struct perf_record_lost_samples *lost = NULL;
struct evsel *evsel;

/* there was an error during record__open */
if (session->evlist == NULL)
return;

- lost = zalloc(PERF_SAMPLE_MAX_SIZE);
- if (lost == NULL) {
- pr_debug("Memory allocation failed\n");
- return;
- }
-
- lost->header.type = PERF_RECORD_LOST_SAMPLES;
-
evlist__for_each_entry(session->evlist, evsel) {
struct xyarray *xy = evsel->core.sample_id;
u64 lost_count;
@@ -1961,6 +1953,14 @@ static void record__read_lost_samples(struct record *rec)
}

if (count.lost) {
+ if (!lost) {
+ lost = zalloc(PERF_SAMPLE_MAX_SIZE);
+ if (!lost) {
+ pr_debug("Memory allocation failed\n");
+ return;
+ }
+ lost->header.type = PERF_RECORD_LOST_SAMPLES;
+ }
__record__save_lost_samples(rec, evsel, lost,
x, y, count.lost, 0);
}
@@ -1968,9 +1968,18 @@ static void record__read_lost_samples(struct record *rec)
}

lost_count = perf_bpf_filter__lost_count(evsel);
- if (lost_count)
+ if (lost_count) {
+ if (!lost) {
+ lost = zalloc(PERF_SAMPLE_MAX_SIZE);
+ if (!lost) {
+ pr_debug("Memory allocation failed\n");
+ return;
+ }
+ lost->header.type = PERF_RECORD_LOST_SAMPLES;
+ }
__record__save_lost_samples(rec, evsel, lost, 0, 0, lost_count,
PERF_RECORD_MISC_LOST_SAMPLES_BPF);
+ }
}
out:
free(lost);
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:26:05

by Ian Rogers

Subject: [PATCH v3 14/50] perf mmap: Lazily initialize zstd streams

Zstd streams create dictionaries that can require significant RAM,
especially when there is one per CPU. Tools like perf record won't use
the streams without the -z option, so creating them up front is pure
overhead. Switch to creating the streams on first use.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/builtin-record.c | 26 ++++++++++-----
tools/perf/util/compress.h | 6 ++--
tools/perf/util/mmap.c | 5 ++-
tools/perf/util/mmap.h | 1 -
tools/perf/util/zstd.c | 63 +++++++++++++++++++------------------
5 files changed, 58 insertions(+), 43 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8ec818568662..9b4f3805ca92 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -270,7 +270,7 @@ static int record__write(struct record *rec, struct mmap *map __maybe_unused,

static int record__aio_enabled(struct record *rec);
static int record__comp_enabled(struct record *rec);
-static size_t zstd_compress(struct perf_session *session, struct mmap *map,
+static ssize_t zstd_compress(struct perf_session *session, struct mmap *map,
void *dst, size_t dst_size, void *src, size_t src_size);

#ifdef HAVE_AIO_SUPPORT
@@ -405,9 +405,13 @@ static int record__aio_pushfn(struct mmap *map, void *to, void *buf, size_t size
*/

if (record__comp_enabled(aio->rec)) {
- size = zstd_compress(aio->rec->session, NULL, aio->data + aio->size,
- mmap__mmap_len(map) - aio->size,
- buf, size);
+ ssize_t compressed = zstd_compress(aio->rec->session, NULL, aio->data + aio->size,
+ mmap__mmap_len(map) - aio->size,
+ buf, size);
+ if (compressed < 0)
+ return (int)compressed;
+
+ size = compressed;
} else {
memcpy(aio->data + aio->size, buf, size);
}
@@ -633,7 +637,13 @@ static int record__pushfn(struct mmap *map, void *to, void *bf, size_t size)
struct record *rec = to;

if (record__comp_enabled(rec)) {
- size = zstd_compress(rec->session, map, map->data, mmap__mmap_len(map), bf, size);
+ ssize_t compressed = zstd_compress(rec->session, map, map->data,
+ mmap__mmap_len(map), bf, size);
+
+ if (compressed < 0)
+ return (int)compressed;
+
+ size = compressed;
bf = map->data;
}

@@ -1527,10 +1537,10 @@ static size_t process_comp_header(void *record, size_t increment)
return size;
}

-static size_t zstd_compress(struct perf_session *session, struct mmap *map,
+static ssize_t zstd_compress(struct perf_session *session, struct mmap *map,
void *dst, size_t dst_size, void *src, size_t src_size)
{
- size_t compressed;
+ ssize_t compressed;
size_t max_record_size = PERF_SAMPLE_MAX_SIZE - sizeof(struct perf_record_compressed) - 1;
struct zstd_data *zstd_data = &session->zstd_data;

@@ -1539,6 +1549,8 @@ static size_t zstd_compress(struct perf_session *session, struct mmap *map,

compressed = zstd_compress_stream_to_records(zstd_data, dst, dst_size, src, src_size,
max_record_size, process_comp_header);
+ if (compressed < 0)
+ return compressed;

if (map && map->file) {
thread->bytes_transferred += src_size;
diff --git a/tools/perf/util/compress.h b/tools/perf/util/compress.h
index 0cd3369af2a4..9eb6eb5bf038 100644
--- a/tools/perf/util/compress.h
+++ b/tools/perf/util/compress.h
@@ -3,6 +3,7 @@
#define PERF_COMPRESS_H

#include <stdbool.h>
+#include <stdlib.h>
#ifdef HAVE_ZSTD_SUPPORT
#include <zstd.h>
#endif
@@ -21,6 +22,7 @@ struct zstd_data {
#ifdef HAVE_ZSTD_SUPPORT
ZSTD_CStream *cstream;
ZSTD_DStream *dstream;
+ int comp_level;
#endif
};

@@ -29,7 +31,7 @@ struct zstd_data {
int zstd_init(struct zstd_data *data, int level);
int zstd_fini(struct zstd_data *data);

-size_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
+ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
void *src, size_t src_size, size_t max_record_size,
size_t process_header(void *record, size_t increment));

@@ -48,7 +50,7 @@ static inline int zstd_fini(struct zstd_data *data __maybe_unused)
}

static inline
-size_t zstd_compress_stream_to_records(struct zstd_data *data __maybe_unused,
+ssize_t zstd_compress_stream_to_records(struct zstd_data *data __maybe_unused,
void *dst __maybe_unused, size_t dst_size __maybe_unused,
void *src __maybe_unused, size_t src_size __maybe_unused,
size_t max_record_size __maybe_unused,
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c
index 49093b21ee2d..122ee198a86e 100644
--- a/tools/perf/util/mmap.c
+++ b/tools/perf/util/mmap.c
@@ -295,15 +295,14 @@ int mmap__mmap(struct mmap *map, struct mmap_params *mp, int fd, struct perf_cpu

map->core.flush = mp->flush;

- map->comp_level = mp->comp_level;
#ifndef PYTHON_PERF
- if (zstd_init(&map->zstd_data, map->comp_level)) {
+ if (zstd_init(&map->zstd_data, mp->comp_level)) {
pr_debug2("failed to init mmap compressor, error %d\n", errno);
return -1;
}
#endif

- if (map->comp_level && !perf_mmap__aio_enabled(map)) {
+ if (mp->comp_level && !perf_mmap__aio_enabled(map)) {
map->data = mmap(NULL, mmap__mmap_len(map), PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
if (map->data == MAP_FAILED) {
diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h
index f944c3cd5efa..0df6e1621c7e 100644
--- a/tools/perf/util/mmap.h
+++ b/tools/perf/util/mmap.h
@@ -39,7 +39,6 @@ struct mmap {
#endif
struct mmap_cpu_mask affinity_mask;
void *data;
- int comp_level;
struct perf_data_file *file;
struct zstd_data zstd_data;
};
diff --git a/tools/perf/util/zstd.c b/tools/perf/util/zstd.c
index 48dd2b018c47..57027e0ac7b6 100644
--- a/tools/perf/util/zstd.c
+++ b/tools/perf/util/zstd.c
@@ -7,35 +7,9 @@

int zstd_init(struct zstd_data *data, int level)
{
- size_t ret;
-
- data->dstream = ZSTD_createDStream();
- if (data->dstream == NULL) {
- pr_err("Couldn't create decompression stream.\n");
- return -1;
- }
-
- ret = ZSTD_initDStream(data->dstream);
- if (ZSTD_isError(ret)) {
- pr_err("Failed to initialize decompression stream: %s\n", ZSTD_getErrorName(ret));
- return -1;
- }
-
- if (!level)
- return 0;
-
- data->cstream = ZSTD_createCStream();
- if (data->cstream == NULL) {
- pr_err("Couldn't create compression stream.\n");
- return -1;
- }
-
- ret = ZSTD_initCStream(data->cstream, level);
- if (ZSTD_isError(ret)) {
- pr_err("Failed to initialize compression stream: %s\n", ZSTD_getErrorName(ret));
- return -1;
- }
-
+ data->comp_level = level;
+ data->dstream = NULL;
+ data->cstream = NULL;
return 0;
}

@@ -54,7 +28,7 @@ int zstd_fini(struct zstd_data *data)
return 0;
}

-size_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
+ssize_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t dst_size,
void *src, size_t src_size, size_t max_record_size,
size_t process_header(void *record, size_t increment))
{
@@ -63,6 +37,21 @@ size_t zstd_compress_stream_to_records(struct zstd_data *data, void *dst, size_t
ZSTD_outBuffer output;
void *record;

+ if (!data->cstream) {
+ data->cstream = ZSTD_createCStream();
+ if (data->cstream == NULL) {
+ pr_err("Couldn't create compression stream.\n");
+ return -1;
+ }
+
+ ret = ZSTD_initCStream(data->cstream, data->comp_level);
+ if (ZSTD_isError(ret)) {
+ pr_err("Failed to initialize compression stream: %s\n",
+ ZSTD_getErrorName(ret));
+ return -1;
+ }
+ }
+
while (input.pos < input.size) {
record = dst;
size = process_header(record, 0);
@@ -96,6 +85,20 @@ size_t zstd_decompress_stream(struct zstd_data *data, void *src, size_t src_size
ZSTD_inBuffer input = { src, src_size, 0 };
ZSTD_outBuffer output = { dst, dst_size, 0 };

+ if (!data->dstream) {
+ data->dstream = ZSTD_createDStream();
+ if (data->dstream == NULL) {
+ pr_err("Couldn't create decompression stream.\n");
+ return 0;
+ }
+
+ ret = ZSTD_initDStream(data->dstream);
+ if (ZSTD_isError(ret)) {
+ pr_err("Failed to initialize decompression stream: %s\n",
+ ZSTD_getErrorName(ret));
+ return 0;
+ }
+ }
while (input.pos < input.size) {
ret = ZSTD_decompressStream(data->dstream, &output, &input);
if (ZSTD_isError(ret)) {
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:26:09

by Ian Rogers

Subject: [PATCH v3 11/50] perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit

Fix a leak where mem_info__put wouldn't release the maps/map used by
perf mem. Add exit functions and use them in the other places where
the maps and map are released.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/Build | 1 +
tools/perf/util/callchain.c | 27 +++++++++------------------
tools/perf/util/hist.c | 28 ++++++++++++----------------
tools/perf/util/machine.c | 6 ++----
tools/perf/util/map_symbol.c | 15 +++++++++++++++
tools/perf/util/map_symbol.h | 4 ++++
tools/perf/util/symbol.c | 5 ++++-
7 files changed, 47 insertions(+), 39 deletions(-)
create mode 100644 tools/perf/util/map_symbol.c

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 0ea5a9d368d4..96058f949ec9 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -49,6 +49,7 @@ perf-y += dso.o
perf-y += dsos.o
perf-y += symbol.o
perf-y += symbol_fprintf.o
+perf-y += map_symbol.o
perf-y += color.o
perf-y += color_config.o
perf-y += metricgroup.o
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 5349c6a21849..99cce43ba152 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -659,8 +659,7 @@ add_child(struct callchain_node *parent,

list_for_each_entry_safe(call, tmp, &new->val, list) {
list_del_init(&call->list);
- map__zput(call->ms.map);
- maps__zput(call->ms.maps);
+ map_symbol__exit(&call->ms);
zfree(&call->brtype_stat);
free(call);
}
@@ -1040,10 +1039,8 @@ merge_chain_branch(struct callchain_cursor *cursor,
};
callchain_cursor_append(cursor, list->ip, &ms, false, NULL, 0, 0, 0, list->srcline);
list_del_init(&list->list);
- map__zput(ms.map);
- maps__zput(ms.maps);
- map__zput(list->ms.map);
- maps__zput(list->ms.maps);
+ map_symbol__exit(&ms);
+ map_symbol__exit(&list->ms);
zfree(&list->brtype_stat);
free(list);
}
@@ -1096,8 +1093,7 @@ int callchain_cursor_append(struct callchain_cursor *cursor,
}

node->ip = ip;
- maps__zput(node->ms.maps);
- map__zput(node->ms.map);
+ map_symbol__exit(&node->ms);
node->ms = *ms;
node->ms.maps = maps__get(ms->maps);
node->ms.map = map__get(ms->map);
@@ -1496,16 +1492,14 @@ static void free_callchain_node(struct callchain_node *node)

list_for_each_entry_safe(list, tmp, &node->parent_val, list) {
list_del_init(&list->list);
- map__zput(list->ms.map);
- maps__zput(list->ms.maps);
+ map_symbol__exit(&list->ms);
zfree(&list->brtype_stat);
free(list);
}

list_for_each_entry_safe(list, tmp, &node->val, list) {
list_del_init(&list->list);
- map__zput(list->ms.map);
- maps__zput(list->ms.maps);
+ map_symbol__exit(&list->ms);
zfree(&list->brtype_stat);
free(list);
}
@@ -1591,8 +1585,7 @@ int callchain_node__make_parent_list(struct callchain_node *node)
out:
list_for_each_entry_safe(chain, new, &head, list) {
list_del_init(&chain->list);
- map__zput(chain->ms.map);
- maps__zput(chain->ms.maps);
+ map_symbol__exit(&chain->ms);
zfree(&chain->brtype_stat);
free(chain);
}
@@ -1676,10 +1669,8 @@ void callchain_cursor_reset(struct callchain_cursor *cursor)
cursor->nr = 0;
cursor->last = &cursor->first;

- for (node = cursor->first; node != NULL; node = node->next) {
- map__zput(node->ms.map);
- maps__zput(node->ms.maps);
- }
+ for (node = cursor->first; node != NULL; node = node->next)
+ map_symbol__exit(&node->ms);
}

void callchain_param_setup(u64 sample_type, const char *arch)
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 0aa7e231172a..0888b7163b7c 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -515,17 +515,16 @@ static int hist_entry__init(struct hist_entry *he,

err_infos:
if (he->branch_info) {
- map__put(he->branch_info->from.ms.map);
- map__put(he->branch_info->to.ms.map);
+ map_symbol__exit(&he->branch_info->from.ms);
+ map_symbol__exit(&he->branch_info->to.ms);
zfree(&he->branch_info);
}
if (he->mem_info) {
- map__put(he->mem_info->iaddr.ms.map);
- map__put(he->mem_info->daddr.ms.map);
+ map_symbol__exit(&he->mem_info->iaddr.ms);
+ map_symbol__exit(&he->mem_info->daddr.ms);
}
err:
- maps__zput(he->ms.maps);
- map__zput(he->ms.map);
+ map_symbol__exit(&he->ms);
zfree(&he->stat_acc);
return -ENOMEM;
}
@@ -1317,20 +1316,19 @@ void hist_entry__delete(struct hist_entry *he)
struct hist_entry_ops *ops = he->ops;

thread__zput(he->thread);
- maps__zput(he->ms.maps);
- map__zput(he->ms.map);
+ map_symbol__exit(&he->ms);

if (he->branch_info) {
- map__zput(he->branch_info->from.ms.map);
- map__zput(he->branch_info->to.ms.map);
+ map_symbol__exit(&he->branch_info->from.ms);
+ map_symbol__exit(&he->branch_info->to.ms);
zfree_srcline(&he->branch_info->srcline_from);
zfree_srcline(&he->branch_info->srcline_to);
zfree(&he->branch_info);
}

if (he->mem_info) {
- map__zput(he->mem_info->iaddr.ms.map);
- map__zput(he->mem_info->daddr.ms.map);
+ map_symbol__exit(&he->mem_info->iaddr.ms);
+ map_symbol__exit(&he->mem_info->daddr.ms);
mem_info__zput(he->mem_info);
}

@@ -2700,10 +2698,8 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
*total_cycles += bi[i].flags.cycles;
}
for (unsigned int i = 0; i < bs->nr; i++) {
- map__put(bi[i].to.ms.map);
- maps__put(bi[i].to.ms.maps);
- map__put(bi[i].from.ms.map);
- maps__put(bi[i].from.ms.maps);
+ map_symbol__exit(&bi[i].to.ms);
+ map_symbol__exit(&bi[i].from.ms);
}
free(bi);
}
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 31fdbb0f27af..90c750150b19 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2389,8 +2389,7 @@ static int add_callchain_ip(struct thread *thread,
iter_cycles, branch_from, srcline);
out:
addr_location__exit(&al);
- maps__put(ms.maps);
- map__put(ms.map);
+ map_symbol__exit(&ms);
return err;
}

@@ -3116,8 +3115,7 @@ static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms
if (ret != 0)
return ret;
}
- map__put(ilist_ms.map);
- maps__put(ilist_ms.maps);
+ map_symbol__exit(&ilist_ms);

return ret;
}
diff --git a/tools/perf/util/map_symbol.c b/tools/perf/util/map_symbol.c
new file mode 100644
index 000000000000..bef5079f2403
--- /dev/null
+++ b/tools/perf/util/map_symbol.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "map_symbol.h"
+#include "maps.h"
+#include "map.h"
+
+void map_symbol__exit(struct map_symbol *ms)
+{
+ maps__zput(ms->maps);
+ map__zput(ms->map);
+}
+
+void addr_map_symbol__exit(struct addr_map_symbol *ams)
+{
+ map_symbol__exit(&ams->ms);
+}
diff --git a/tools/perf/util/map_symbol.h b/tools/perf/util/map_symbol.h
index e08817b0c30f..72d5ed938ed6 100644
--- a/tools/perf/util/map_symbol.h
+++ b/tools/perf/util/map_symbol.h
@@ -22,4 +22,8 @@ struct addr_map_symbol {
u64 phys_addr;
u64 data_page_size;
};
+
+void map_symbol__exit(struct map_symbol *ms);
+void addr_map_symbol__exit(struct addr_map_symbol *ams);
+
#endif // __PERF_MAP_SYMBOL
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 822f4dcebfe6..82cc74b9358e 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -2791,8 +2791,11 @@ struct mem_info *mem_info__get(struct mem_info *mi)

void mem_info__put(struct mem_info *mi)
{
- if (mi && refcount_dec_and_test(&mi->refcnt))
+ if (mi && refcount_dec_and_test(&mi->refcnt)) {
+ addr_map_symbol__exit(&mi->iaddr);
+ addr_map_symbol__exit(&mi->daddr);
free(mi);
+ }
}

struct mem_info *mem_info__new(void)
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:26:18

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 18/50] tools lib api: Add io_dir an allocation free readdir alternative

glibc's opendir allocates a minimum of 32kb; when called recursively
for a directory tree, the memory consumption can add up to nearly 300kb
during perf start-up when processing modules. Add a stack-allocated
variant of readdir sized at a little more than 1kb.
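
For illustration, a minimal usage sketch of the new API; the /proc
path and the printing are illustrative, not part of the patch:

```
#include <api/io_dir.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Print every entry name in /proc without any heap allocation. */
static int print_proc_entries(void)
{
	struct io_dir iod;
	struct io_dirent64 *dent;

	io_dir__init(&iod, open("/proc", O_CLOEXEC | O_DIRECTORY | O_RDONLY));
	if (iod.dirfd < 0)
		return -1;

	while ((dent = io_dir__readdir(&iod)) != NULL)
		puts(dent->d_name);

	close(iod.dirfd);
	return 0;
}
```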

Signed-off-by: Ian Rogers <[email protected]>
---
tools/lib/api/Makefile | 2 +-
tools/lib/api/io_dir.h | 72 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 73 insertions(+), 1 deletion(-)
create mode 100644 tools/lib/api/io_dir.h

diff --git a/tools/lib/api/Makefile b/tools/lib/api/Makefile
index 044860ac1ed1..186aa407de8c 100644
--- a/tools/lib/api/Makefile
+++ b/tools/lib/api/Makefile
@@ -99,7 +99,7 @@ install_lib: $(LIBFILE)
$(call do_install_mkdir,$(libdir_SQ)); \
cp -fpR $(LIBFILE) $(DESTDIR)$(libdir_SQ)

-HDRS := cpu.h debug.h io.h
+HDRS := cpu.h debug.h io.h io_dir.h
FD_HDRS := fd/array.h
FS_HDRS := fs/fs.h fs/tracing_path.h
INSTALL_HDRS_PFX := $(DESTDIR)$(prefix)/include/api
diff --git a/tools/lib/api/io_dir.h b/tools/lib/api/io_dir.h
new file mode 100644
index 000000000000..8a70c0718df5
--- /dev/null
+++ b/tools/lib/api/io_dir.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+/*
+ * Lightweight directory reading library.
+ */
+#ifndef __API_IO_DIR__
+#define __API_IO_DIR__
+
+#include <dirent.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/stat.h>
+
+struct io_dirent64 {
+ ino64_t d_ino; /* 64-bit inode number */
+ off64_t d_off; /* 64-bit offset to next structure */
+ unsigned short d_reclen; /* Size of this dirent */
+ unsigned char d_type; /* File type */
+ char d_name[NAME_MAX + 1]; /* Filename (null-terminated) */
+};
+
+struct io_dir {
+ int dirfd;
+ ssize_t available_bytes;
+ struct io_dirent64 *next;
+ struct io_dirent64 buff[4];
+};
+
+static inline void io_dir__init(struct io_dir *iod, int dirfd)
+{
+ iod->dirfd = dirfd;
+ iod->available_bytes = 0;
+}
+
+static inline void io_dir__rewinddir(struct io_dir *iod)
+{
+ lseek(iod->dirfd, 0, SEEK_SET);
+ iod->available_bytes = 0;
+}
+
+static inline struct io_dirent64 *io_dir__readdir(struct io_dir *iod)
+{
+ struct io_dirent64 *entry;
+
+ if (iod->available_bytes <= 0) {
+ ssize_t rc = getdents64(iod->dirfd, iod->buff, sizeof(iod->buff));
+
+ if (rc <= 0)
+ return NULL;
+ iod->available_bytes = rc;
+ iod->next = iod->buff;
+ }
+ entry = iod->next;
+ iod->next = (struct io_dirent64 *)((char *)entry + entry->d_reclen);
+ iod->available_bytes -= entry->d_reclen;
+ return entry;
+}
+
+static inline bool io_dir__is_dir(const struct io_dir *iod, const struct io_dirent64 *dent)
+{
+ if (dent->d_type == DT_UNKNOWN) {
+ struct stat st;
+
+ if (fstatat(iod->dirfd, dent->d_name, &st, /*flags=*/0))
+ return false;
+
+ return S_ISDIR(st.st_mode);
+ }
+ return dent->d_type == DT_DIR;
+}
+
+#endif
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:26:29

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 25/50] perf map: Simplify map_ip/unmap_ip and make map size smaller

When mapping an IP it is either an identity mapping or a DSO-relative
mapping, so a single bit is required in the struct to identify
this. The current code uses function pointers, adding 2 pointers per
map and also pushing the size of a map beyond 1 cache line. Switch to
using a byte to identify the mapping type (as well as priv and
erange_warned) to avoid any masking. Change struct map's layout to
avoid holes.

Before:
```
struct map {
u64 start; /* 0 8 */
u64 end; /* 8 8 */
_Bool erange_warned:1; /* 16: 0 1 */
_Bool priv:1; /* 16: 1 1 */

/* XXX 6 bits hole, try to pack */
/* XXX 3 bytes hole, try to pack */

u32 prot; /* 20 4 */
u64 pgoff; /* 24 8 */
u64 reloc; /* 32 8 */
u64 (*map_ip)(const struct map *, u64); /* 40 8 */
u64 (*unmap_ip)(const struct map *, u64); /* 48 8 */
struct dso * dso; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
refcount_t refcnt; /* 64 4 */
u32 flags; /* 68 4 */

/* size: 72, cachelines: 2, members: 12 */
/* sum members: 68, holes: 1, sum holes: 3 */
/* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */
/* last cacheline: 8 bytes */
};
```

After:
```
struct map {
u64 start; /* 0 8 */
u64 end; /* 8 8 */
u64 pgoff; /* 16 8 */
u64 reloc; /* 24 8 */
struct dso * dso; /* 32 8 */
refcount_t refcnt; /* 40 4 */
u32 prot; /* 44 4 */
u32 flags; /* 48 4 */
enum mapping_type mapping_type:8; /* 52: 0 4 */

/* Bitfield combined with next fields */

_Bool erange_warned; /* 53 1 */
_Bool priv; /* 54 1 */

/* size: 56, cachelines: 1, members: 11 */
/* padding: 1 */
/* last cacheline: 56 bytes */
};
```
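
As a concrete sketch of the two mapping types, with made-up addresses
and hypothetical maps (dso_map and identity_map are assumed to have
been created with the respective mapping types):

```
static void mapping_examples(struct map *dso_map, struct map *identity_map)
{
	/* DSO-relative: rip = ip - start + pgoff. With start =
	 * 0x7f0000000000 and pgoff = 0x2000, ip 0x7f0000001234 yields
	 * dso rip 0x3234.
	 */
	u64 rip = map__map_ip(dso_map, 0x7f0000001234ULL);

	/* Identity: the ip comes back unchanged. */
	u64 same = map__map_ip(identity_map, 0xffffffff81001234ULL);
}
```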

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/machine.c | 3 +-
tools/perf/util/map.c | 20 +--------
tools/perf/util/map.h | 83 +++++++++++++++++++-----------------
tools/perf/util/symbol-elf.c | 6 +--
tools/perf/util/symbol.c | 6 +--
5 files changed, 50 insertions(+), 68 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index be3dab9d5253..b6831a1f909d 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1360,8 +1360,7 @@ __machine__create_kernel_maps(struct machine *machine, struct dso *kernel)
if (machine->vmlinux_map == NULL)
return -ENOMEM;

- map__set_map_ip(machine->vmlinux_map, identity__map_ip);
- map__set_unmap_ip(machine->vmlinux_map, identity__map_ip);
+ map__set_mapping_type(machine->vmlinux_map, MAPPING_TYPE__IDENTITY);
return maps__insert(machine__kernel_maps(machine), machine->vmlinux_map);
}

diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index f64b83004421..54c67cb7ecef 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -109,8 +109,7 @@ void map__init(struct map *map, u64 start, u64 end, u64 pgoff, struct dso *dso)
map__set_pgoff(map, pgoff);
map__set_reloc(map, 0);
map__set_dso(map, dso__get(dso));
- map__set_map_ip(map, map__dso_map_ip);
- map__set_unmap_ip(map, map__dso_unmap_ip);
+ map__set_mapping_type(map, MAPPING_TYPE__DSO);
map__set_erange_warned(map, false);
refcount_set(map__refcnt(map), 1);
}
@@ -172,7 +171,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
map__init(result, start, start + len, pgoff, dso);

if (anon || no_dso) {
- map->map_ip = map->unmap_ip = identity__map_ip;
+ map->mapping_type = MAPPING_TYPE__IDENTITY;

/*
* Set memory without DSO as loaded. All map__find_*
@@ -630,18 +629,3 @@ struct maps *map__kmaps(struct map *map)
}
return kmap->kmaps;
}
-
-u64 map__dso_map_ip(const struct map *map, u64 ip)
-{
- return ip - map__start(map) + map__pgoff(map);
-}
-
-u64 map__dso_unmap_ip(const struct map *map, u64 ip)
-{
- return ip + map__start(map) - map__pgoff(map);
-}
-
-u64 identity__map_ip(const struct map *map __maybe_unused, u64 ip)
-{
- return ip;
-}
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index 1b53d53adc86..3a3b7757da5f 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -16,23 +16,25 @@ struct dso;
struct maps;
struct machine;

+enum mapping_type {
+ /* map__map_ip/map__unmap_ip are given as offsets in the DSO. */
+ MAPPING_TYPE__DSO,
+ /* map__map_ip/map__unmap_ip are just the given ip value. */
+ MAPPING_TYPE__IDENTITY,
+};
+
DECLARE_RC_STRUCT(map) {
u64 start;
u64 end;
- bool erange_warned:1;
- bool priv:1;
- u32 prot;
u64 pgoff;
u64 reloc;
-
- /* ip -> dso rip */
- u64 (*map_ip)(const struct map *, u64);
- /* dso rip -> ip */
- u64 (*unmap_ip)(const struct map *, u64);
-
struct dso *dso;
refcount_t refcnt;
+ u32 prot;
u32 flags;
+ enum mapping_type mapping_type:8;
+ bool erange_warned;
+ bool priv;
};

struct kmap;
@@ -41,38 +43,11 @@ struct kmap *__map__kmap(struct map *map);
struct kmap *map__kmap(struct map *map);
struct maps *map__kmaps(struct map *map);

-/* ip -> dso rip */
-u64 map__dso_map_ip(const struct map *map, u64 ip);
-/* dso rip -> ip */
-u64 map__dso_unmap_ip(const struct map *map, u64 ip);
-/* Returns ip */
-u64 identity__map_ip(const struct map *map __maybe_unused, u64 ip);
-
static inline struct dso *map__dso(const struct map *map)
{
return RC_CHK_ACCESS(map)->dso;
}

-static inline u64 map__map_ip(const struct map *map, u64 ip)
-{
- return RC_CHK_ACCESS(map)->map_ip(map, ip);
-}
-
-static inline u64 map__unmap_ip(const struct map *map, u64 ip)
-{
- return RC_CHK_ACCESS(map)->unmap_ip(map, ip);
-}
-
-static inline void *map__map_ip_ptr(struct map *map)
-{
- return RC_CHK_ACCESS(map)->map_ip;
-}
-
-static inline void* map__unmap_ip_ptr(struct map *map)
-{
- return RC_CHK_ACCESS(map)->unmap_ip;
-}
-
static inline u64 map__start(const struct map *map)
{
return RC_CHK_ACCESS(map)->start;
@@ -123,6 +98,34 @@ static inline size_t map__size(const struct map *map)
return map__end(map) - map__start(map);
}

+/* ip -> dso rip */
+static inline u64 map__dso_map_ip(const struct map *map, u64 ip)
+{
+ return ip - map__start(map) + map__pgoff(map);
+}
+
+/* dso rip -> ip */
+static inline u64 map__dso_unmap_ip(const struct map *map, u64 ip)
+{
+ return ip + map__start(map) - map__pgoff(map);
+}
+
+static inline u64 map__map_ip(const struct map *map, u64 ip)
+{
+ if ((RC_CHK_ACCESS(map)->mapping_type) == MAPPING_TYPE__DSO)
+ return map__dso_map_ip(map, ip);
+ else
+ return ip;
+}
+
+static inline u64 map__unmap_ip(const struct map *map, u64 ip)
+{
+ if ((RC_CHK_ACCESS(map)->mapping_type) == MAPPING_TYPE__DSO)
+ return map__dso_unmap_ip(map, ip);
+ else
+ return ip;
+}
+
/* rip/ip <-> addr suitable for passing to `objdump --start-address=` */
u64 map__rip_2objdump(struct map *map, u64 rip);

@@ -294,13 +297,13 @@ static inline void map__set_dso(struct map *map, struct dso *dso)
RC_CHK_ACCESS(map)->dso = dso;
}

-static inline void map__set_map_ip(struct map *map, u64 (*map_ip)(const struct map *map, u64 ip))
+static inline void map__set_mapping_type(struct map *map, enum mapping_type type)
{
- RC_CHK_ACCESS(map)->map_ip = map_ip;
+ RC_CHK_ACCESS(map)->mapping_type = type;
}

-static inline void map__set_unmap_ip(struct map *map, u64 (*unmap_ip)(const struct map *map, u64 rip))
+static inline enum mapping_type map__mapping_type(struct map *map)
{
- RC_CHK_ACCESS(map)->unmap_ip = unmap_ip;
+ return RC_CHK_ACCESS(map)->mapping_type;
}
#endif /* __PERF_MAP_H */
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 9e7eeaf616b8..4b934ed3bfd1 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1392,8 +1392,7 @@ static int dso__process_kernel_symbol(struct dso *dso, struct map *map,
map__set_start(map, shdr->sh_addr + ref_reloc(kmap));
map__set_end(map, map__start(map) + shdr->sh_size);
map__set_pgoff(map, shdr->sh_offset);
- map__set_map_ip(map, map__dso_map_ip);
- map__set_unmap_ip(map, map__dso_unmap_ip);
+ map__set_mapping_type(map, MAPPING_TYPE__DSO);
/* Ensure maps are correctly ordered */
if (kmaps) {
int err;
@@ -1455,8 +1454,7 @@ static int dso__process_kernel_symbol(struct dso *dso, struct map *map,
map__set_end(curr_map, map__start(curr_map) + shdr->sh_size);
map__set_pgoff(curr_map, shdr->sh_offset);
} else {
- map__set_map_ip(curr_map, identity__map_ip);
- map__set_unmap_ip(curr_map, identity__map_ip);
+ map__set_mapping_type(curr_map, MAPPING_TYPE__IDENTITY);
}
curr_dso->symtab_type = dso->symtab_type;
if (maps__insert(kmaps, curr_map))
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 82cc74b9358e..314c0263bf3c 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -956,8 +956,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
return -1;
}

- map__set_map_ip(curr_map, identity__map_ip);
- map__set_unmap_ip(curr_map, identity__map_ip);
+ map__set_mapping_type(curr_map, MAPPING_TYPE__IDENTITY);
if (maps__insert(kmaps, curr_map)) {
dso__put(ndso);
return -1;
@@ -1475,8 +1474,7 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
map__set_start(map, map__start(new_map));
map__set_end(map, map__end(new_map));
map__set_pgoff(map, map__pgoff(new_map));
- map__set_map_ip(map, map__map_ip_ptr(new_map));
- map__set_unmap_ip(map, map__unmap_ip_ptr(new_map));
+ map__set_mapping_type(map, map__mapping_type(new_map));
/* Ensure maps are correctly ordered */
map_ref = map__get(map);
maps__remove(kmaps, map_ref);
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:26:42

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 16/50] tools api fs: Switch filename__read_str to use io.h

filename__read_str has its own string reading code that allocates
memory before reading into it. The memory allocated is sized at BUFSIZ,
which is 8kb. Most strings are short, so most of this 8kb is
wasted.

Refactor io__getline so that the newline character is configurable,
allowing filename__read_str to ignore newlines and read a whole file.
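
A sketch of the refactored helper: passing -1 as the terminator means
no character ever matches, so the whole file is read as one "line"
(the wrapper function and sysfs path are illustrative):

```
#include <api/io.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void print_whole_file(const char *path)
{
	struct io io;
	char bf[128];
	char *str = NULL;
	size_t len = 0;

	io.fd = open(path, O_RDONLY);
	if (io.fd < 0)
		return;
	io__init(&io, io.fd, bf, sizeof(bf));
	/* -1 never matches a character, so this reads to EOF. */
	if (io__getline_nl(&io, &str, &len, /*nl=*/-1) >= 0)
		printf("%s", str);
	free(str);
	close(io.fd);
}
```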

Code like build_caches_for_cpu in perf's header.c will read many
strings and hold them in a data structure, in this case multiple
strings per cache level per CPU. Using io.h's io__getline avoids the
wasted memory as strings are temporarily read into a buffer on the
stack before being copied to a buffer that grows 128 bytes at a time
and is never sized larger than the string.

For a 16 hyperthread system the memory consumption of "perf record
true" is reduced by 180kb, primarily through saving memory when
reading the cache information.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/lib/api/fs/fs.c | 56 +++++++++++--------------------------------
tools/lib/api/io.h | 9 +++++--
2 files changed, 21 insertions(+), 44 deletions(-)

diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
index 5cb0eeec2c8a..496812b5f1d2 100644
--- a/tools/lib/api/fs/fs.c
+++ b/tools/lib/api/fs/fs.c
@@ -16,6 +16,7 @@
#include <sys/mount.h>

#include "fs.h"
+#include "../io.h"
#include "debug-internal.h"

#define _STR(x) #x
@@ -344,53 +345,24 @@ int filename__read_ull(const char *filename, unsigned long long *value)
return filename__read_ull_base(filename, value, 0);
}

-#define STRERR_BUFSIZE 128 /* For the buffer size of strerror_r */
-
int filename__read_str(const char *filename, char **buf, size_t *sizep)
{
- size_t size = 0, alloc_size = 0;
- void *bf = NULL, *nbf;
- int fd, n, err = 0;
- char sbuf[STRERR_BUFSIZE];
+ struct io io;
+ char bf[128];
+ int err;

- fd = open(filename, O_RDONLY);
- if (fd < 0)
+ io.fd = open(filename, O_RDONLY);
+ if (io.fd < 0)
return -errno;
-
- do {
- if (size == alloc_size) {
- alloc_size += BUFSIZ;
- nbf = realloc(bf, alloc_size);
- if (!nbf) {
- err = -ENOMEM;
- break;
- }
-
- bf = nbf;
- }
-
- n = read(fd, bf + size, alloc_size - size);
- if (n < 0) {
- if (size) {
- pr_warn("read failed %d: %s\n", errno,
- strerror_r(errno, sbuf, sizeof(sbuf)));
- err = 0;
- } else
- err = -errno;
-
- break;
- }
-
- size += n;
- } while (n > 0);
-
- if (!err) {
- *sizep = size;
- *buf = bf;
+ io__init(&io, io.fd, bf, sizeof(bf));
+ *buf = NULL;
+ err = io__getline_nl(&io, buf, sizep, /*nl=*/-1);
+ if (err < 0) {
+ free(*buf);
+ *buf = NULL;
} else
- free(bf);
-
- close(fd);
+ err = 0;
+ close(io.fd);
return err;
}

diff --git a/tools/lib/api/io.h b/tools/lib/api/io.h
index a77b74c5fb65..50d33e14fb56 100644
--- a/tools/lib/api/io.h
+++ b/tools/lib/api/io.h
@@ -141,7 +141,7 @@ static inline int io__get_dec(struct io *io, __u64 *dec)
}

/* Read up to and including the first newline following the pattern of getline. */
-static inline ssize_t io__getline(struct io *io, char **line_out, size_t *line_len_out)
+static inline ssize_t io__getline_nl(struct io *io, char **line_out, size_t *line_len_out, int nl)
{
char buf[128];
int buf_pos = 0;
@@ -151,7 +151,7 @@ static inline ssize_t io__getline(struct io *io, char **line_out, size_t *line_l

/* TODO: reuse previously allocated memory. */
free(*line_out);
- while (ch != '\n') {
+ while (ch != nl) {
ch = io__get_char(io);

if (ch < 0)
@@ -184,4 +184,9 @@ static inline ssize_t io__getline(struct io *io, char **line_out, size_t *line_l
return -ENOMEM;
}

+static inline ssize_t io__getline(struct io *io, char **line_out, size_t *line_len_out)
+{
+ return io__getline_nl(io, line_out, line_len_out, /*nl=*/'\n');
+}
+
#endif /* __API_IO__ */
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:26:44

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 24/50] perf events: Remove scandir in thread synthesis

This avoids scandir reading the directory into heap-allocated
memory; instead the entries are read into a buffer on the stack.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/synthetic-events.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index a0579c7d7b9e..7cc38f2a0e9e 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -38,6 +38,7 @@
#include <uapi/linux/mman.h> /* To get things like MAP_HUGETLB even on older libc headers */
#include <api/fs/fs.h>
#include <api/io.h>
+#include <api/io_dir.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
@@ -751,10 +752,10 @@ static int __event__synthesize_thread(union perf_event *comm_event,
bool needs_mmap, bool mmap_data)
{
char filename[PATH_MAX];
- struct dirent **dirent;
+ struct io_dir iod;
+ struct io_dirent64 *dent;
pid_t tgid, ppid;
int rc = 0;
- int i, n;

/* special case: only send one comm event using passed in pid */
if (!full) {
@@ -786,16 +787,19 @@ static int __event__synthesize_thread(union perf_event *comm_event,
snprintf(filename, sizeof(filename), "%s/proc/%d/task",
machine->root_dir, pid);

- n = scandir(filename, &dirent, filter_task, NULL);
- if (n < 0)
- return n;
+ io_dir__init(&iod, open(filename, O_CLOEXEC | O_DIRECTORY | O_RDONLY));
+ if (iod.dirfd < 0)
+ return -1;

- for (i = 0; i < n; i++) {
+ while ((dent = io_dir__readdir(&iod)) != NULL) {
char *end;
pid_t _pid;
bool kernel_thread = false;

- _pid = strtol(dirent[i]->d_name, &end, 10);
+ if (!isdigit(dent->d_name[0]))
+ continue;
+
+ _pid = strtol(dent->d_name, &end, 10);
if (*end)
continue;

@@ -829,9 +833,7 @@ static int __event__synthesize_thread(union perf_event *comm_event,
}
}

- for (i = 0; i < n; i++)
- zfree(&dirent[i]);
- free(dirent);
+ close(iod.dirfd);

return rc;
}
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:26:51

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 31/50] perf maps: Refactor maps__fixup_overlappings

Rename to maps__fixup_overlap_and_insert as the given mapping is
always inserted. Factor out first_ending_after as a utility
function. Minor variable name changes. Switch to using debug_file()
rather than passing a debug FILE*.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/maps.c | 62 ++++++++++++++++++++++++----------------
tools/perf/util/maps.h | 2 +-
tools/perf/util/thread.c | 3 +-
3 files changed, 39 insertions(+), 28 deletions(-)

diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index f13fd3a9686b..40df08dd9bf3 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -334,20 +334,16 @@ size_t maps__fprintf(struct maps *maps, FILE *fp)
return args.printed;
}

-int maps__fixup_overlappings(struct maps *maps, struct map *map, FILE *fp)
+/*
+ * Find first map where end > new->start.
+ * Same as find_vma() in kernel.
+ */
+static struct rb_node *first_ending_after(struct maps *maps, const struct map *map)
{
struct rb_root *root;
struct rb_node *next, *first;
- int err = 0;
-
- down_write(maps__lock(maps));

root = maps__entries(maps);
-
- /*
- * Find first map where end > map->start.
- * Same as find_vma() in kernel.
- */
next = root->rb_node;
first = NULL;
while (next) {
@@ -361,8 +357,22 @@ int maps__fixup_overlappings(struct maps *maps, struct map *map, FILE *fp)
} else
next = next->rb_right;
}
+ return first;
+}
+
+/*
+ * Adds new to maps, if new overlaps existing entries then the existing maps are
+ * adjusted or removed so that new fits without overlapping any entries.
+ */
+int maps__fixup_overlap_and_insert(struct maps *maps, struct map *new)
+{
+
+ struct rb_node *next;
+ int err = 0;
+
+ down_write(maps__lock(maps));

- next = first;
+ next = first_ending_after(maps, new);
while (next && !err) {
struct map_rb_node *pos = rb_entry(next, struct map_rb_node, rb_node);
next = rb_next(&pos->rb_node);
@@ -371,27 +381,27 @@ int maps__fixup_overlappings(struct maps *maps, struct map *map, FILE *fp)
* Stop if current map starts after map->end.
* Maps are ordered by start: next will not overlap for sure.
*/
- if (map__start(pos->map) >= map__end(map))
+ if (map__start(pos->map) >= map__end(new))
break;

if (verbose >= 2) {

if (use_browser) {
pr_debug("overlapping maps in %s (disable tui for more info)\n",
- map__dso(map)->name);
+ map__dso(new)->name);
} else {
- fputs("overlapping maps:\n", fp);
- map__fprintf(map, fp);
- map__fprintf(pos->map, fp);
+ pr_debug("overlapping maps:\n");
+ map__fprintf(new, debug_file());
+ map__fprintf(pos->map, debug_file());
}
}

- rb_erase_init(&pos->rb_node, root);
+ rb_erase_init(&pos->rb_node, maps__entries(maps));
/*
* Now check if we need to create new maps for areas not
* overlapped by the new map:
*/
- if (map__start(map) > map__start(pos->map)) {
+ if (map__start(new) > map__start(pos->map)) {
struct map *before = map__clone(pos->map);

if (before == NULL) {
@@ -399,7 +409,7 @@ int maps__fixup_overlappings(struct maps *maps, struct map *map, FILE *fp)
goto put_map;
}

- map__set_end(before, map__start(map));
+ map__set_end(before, map__start(new));
err = __maps__insert(maps, before);
if (err) {
map__put(before);
@@ -407,11 +417,11 @@ int maps__fixup_overlappings(struct maps *maps, struct map *map, FILE *fp)
}

if (verbose >= 2 && !use_browser)
- map__fprintf(before, fp);
+ map__fprintf(before, debug_file());
map__put(before);
}

- if (map__end(map) < map__end(pos->map)) {
+ if (map__end(new) < map__end(pos->map)) {
struct map *after = map__clone(pos->map);

if (after == NULL) {
@@ -419,23 +429,25 @@ int maps__fixup_overlappings(struct maps *maps, struct map *map, FILE *fp)
goto put_map;
}

- map__set_start(after, map__end(map));
- map__add_pgoff(after, map__end(map) - map__start(pos->map));
- assert(map__map_ip(pos->map, map__end(map)) ==
- map__map_ip(after, map__end(map)));
+ map__set_start(after, map__end(new));
+ map__add_pgoff(after, map__end(new) - map__start(pos->map));
+ assert(map__map_ip(pos->map, map__end(new)) ==
+ map__map_ip(after, map__end(new)));
err = __maps__insert(maps, after);
if (err) {
map__put(after);
goto put_map;
}
if (verbose >= 2 && !use_browser)
- map__fprintf(after, fp);
+ map__fprintf(after, debug_file());
map__put(after);
}
put_map:
map__put(pos->map);
free(pos);
}
+ /* Add the map. */
+ err = __maps__insert(maps, new);
up_write(maps__lock(maps));
return err;
}
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index b94ad5c8fea7..62e94d443c02 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -133,7 +133,7 @@ struct addr_map_symbol;

int maps__find_ams(struct maps *maps, struct addr_map_symbol *ams);

-int maps__fixup_overlappings(struct maps *maps, struct map *map, FILE *fp);
+int maps__fixup_overlap_and_insert(struct maps *maps, struct map *new);

struct map *maps__find_by_name(struct maps *maps, const char *name);

diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 2b938ccbfe3e..cab818af6787 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -345,8 +345,7 @@ int thread__insert_map(struct thread *thread, struct map *map)
if (ret)
return ret;

- maps__fixup_overlappings(thread__maps(thread), map, stderr);
- return maps__insert(thread__maps(thread), map);
+ return maps__fixup_overlap_and_insert(thread__maps(thread), map);
}

struct thread__prepare_access_maps_cb_args {
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:26:52

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 27/50] perf thread: Add missing RC_CHK_ACCESS

Comparing pointers without RC_CHK_ACCESS means the indirection object
will be compared rather than the underlying maps when REFCNT_CHECKING
is enabled. Fix by adding the missing RC_CHK_ACCESS.
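
A sketch of why the direct comparison misfires, assuming the reference
count checker's wrapper indirection (the thread/parent variables
mirror the code below):

```
/* With REFCNT_CHECKING, each maps__get() returns a distinct wrapper
 * object pointing at the shared maps, so thread and parent hold
 * different wrapper pointers even when the underlying maps are one
 * and the same.
 */
struct maps *a = thread__maps(thread);
struct maps *b = thread__maps(parent);
bool wrappers_equal = a == b;					/* false */
bool maps_equal = RC_CHK_ACCESS(a) == RC_CHK_ACCESS(b);	/* true */
```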

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/thread.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index fe5e6991ae4b..eae3041408a7 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -385,7 +385,7 @@ static int thread__clone_maps(struct thread *thread, struct thread *parent, bool
if (thread__pid(thread) == thread__pid(parent))
return thread__prepare_access(thread);

- if (thread__maps(thread) == thread__maps(parent)) {
+ if (RC_CHK_ACCESS(thread__maps(thread)) == RC_CHK_ACCESS(thread__maps(parent))) {
pr_debug("broken map groups on thread %d/%d parent %d/%d\n",
thread__pid(thread), thread__tid(thread),
thread__pid(parent), thread__tid(parent));
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:27:00

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 36/50] perf maps: Reduce scope of map_rb_node and maps internals

Avoid exposing the implementation of maps so that the internals can be
refactored.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/maps.c | 90 ++++++++++++++++++++++++++----------------
tools/perf/util/maps.h | 23 -----------
2 files changed, 55 insertions(+), 58 deletions(-)

diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 38d56709bd5e..01c15d0b300a 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -10,6 +10,11 @@
#include "ui/ui.h"
#include "unwind.h"

+struct map_rb_node {
+ struct rb_node rb_node;
+ struct map *map;
+};
+
#define maps__for_each_entry(maps, map) \
for (map = maps__first(maps); map; map = map_rb_node__next(map))

@@ -17,6 +22,56 @@
for (map = maps__first(maps), next = map_rb_node__next(map); map; \
map = next, next = map_rb_node__next(map))

+static struct rb_root *maps__entries(struct maps *maps)
+{
+ return &RC_CHK_ACCESS(maps)->entries;
+}
+
+static struct rw_semaphore *maps__lock(struct maps *maps)
+{
+ return &RC_CHK_ACCESS(maps)->lock;
+}
+
+static struct map **maps__maps_by_name(struct maps *maps)
+{
+ return RC_CHK_ACCESS(maps)->maps_by_name;
+}
+
+static struct map_rb_node *maps__first(struct maps *maps)
+{
+ struct rb_node *first = rb_first(maps__entries(maps));
+
+ if (first)
+ return rb_entry(first, struct map_rb_node, rb_node);
+ return NULL;
+}
+
+static struct map_rb_node *map_rb_node__next(struct map_rb_node *node)
+{
+ struct rb_node *next;
+
+ if (!node)
+ return NULL;
+
+ next = rb_next(&node->rb_node);
+
+ if (!next)
+ return NULL;
+
+ return rb_entry(next, struct map_rb_node, rb_node);
+}
+
+static struct map_rb_node *maps__find_node(struct maps *maps, struct map *map)
+{
+ struct map_rb_node *rb_node;
+
+ maps__for_each_entry(maps, rb_node) {
+ if (rb_node->RC_CHK_ACCESS(map) == RC_CHK_ACCESS(map))
+ return rb_node;
+ }
+ return NULL;
+}
+
static void maps__init(struct maps *maps, struct machine *machine)
{
refcount_set(maps__refcnt(maps), 1);
@@ -484,17 +539,6 @@ int maps__copy_from(struct maps *maps, struct maps *parent)
return err;
}

-struct map_rb_node *maps__find_node(struct maps *maps, struct map *map)
-{
- struct map_rb_node *rb_node;
-
- maps__for_each_entry(maps, rb_node) {
- if (rb_node->RC_CHK_ACCESS(map) == RC_CHK_ACCESS(map))
- return rb_node;
- }
- return NULL;
-}
-
struct map *maps__find(struct maps *maps, u64 ip)
{
struct rb_node *p;
@@ -520,30 +564,6 @@ struct map *maps__find(struct maps *maps, u64 ip)
return m ? m->map : NULL;
}

-struct map_rb_node *maps__first(struct maps *maps)
-{
- struct rb_node *first = rb_first(maps__entries(maps));
-
- if (first)
- return rb_entry(first, struct map_rb_node, rb_node);
- return NULL;
-}
-
-struct map_rb_node *map_rb_node__next(struct map_rb_node *node)
-{
- struct rb_node *next;
-
- if (!node)
- return NULL;
-
- next = rb_next(&node->rb_node);
-
- if (!next)
- return NULL;
-
- return rb_entry(next, struct map_rb_node, rb_node);
-}
-
static int map__strcmp(const void *a, const void *b)
{
const struct map *map_a = *(const struct map **)a;
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index 84b42c8456e8..d836d04c9402 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -15,11 +15,6 @@ struct machine;
struct map;
struct maps;

-struct map_rb_node {
- struct rb_node rb_node;
- struct map *map;
-};
-
struct map_list_node {
struct list_head node;
struct map *map;
@@ -30,9 +25,6 @@ static inline struct map_list_node *map_list_node__new(void)
return malloc(sizeof(struct map_list_node));
}

-struct map_rb_node *maps__first(struct maps *maps);
-struct map_rb_node *map_rb_node__next(struct map_rb_node *node);
-struct map_rb_node *maps__find_node(struct maps *maps, struct map *map);
struct map *maps__find(struct maps *maps, u64 addr);

DECLARE_RC_STRUCT(maps) {
@@ -78,26 +70,11 @@ int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data)
/* Iterate over map removing an entry if cb returns true. */
void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data), void *data);

-static inline struct rb_root *maps__entries(struct maps *maps)
-{
- return &RC_CHK_ACCESS(maps)->entries;
-}
-
static inline struct machine *maps__machine(struct maps *maps)
{
return RC_CHK_ACCESS(maps)->machine;
}

-static inline struct rw_semaphore *maps__lock(struct maps *maps)
-{
- return &RC_CHK_ACCESS(maps)->lock;
-}
-
-static inline struct map **maps__maps_by_name(struct maps *maps)
-{
- return RC_CHK_ACCESS(maps)->maps_by_name;
-}
-
static inline unsigned int maps__nr_maps(const struct maps *maps)
{
return RC_CHK_ACCESS(maps)->nr_maps;
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:27:13

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 29/50] perf maps: Add remove maps function to remove a map based on callback

Removing maps wasn't being done under the write lock. Similar to
maps__for_each_map, iterate the entries, but in this case remove an
entry when the callback returns true. If an entry was removed then
maps_by_name also needs updating, so add the missed flush.

In dso__load_kcore, the test for the map to save would always be false
with REFCNT_CHECKING because of a missing RC_CHK_ACCESS.
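
As a usage sketch, a hypothetical callback (remove_unloaded and
drop_unloaded_maps are not from the patch) that removes every map
whose DSO was never loaded:

```
static bool remove_unloaded(struct map *map, void *data __maybe_unused)
{
	struct dso *dso = map__dso(map);

	return dso && !dso__loaded(dso);
}

static void drop_unloaded_maps(struct maps *maps)
{
	maps__remove_maps(maps, remove_unloaded, /*data=*/NULL);
}
```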

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/maps.c | 24 ++++++++++++++++++++++++
tools/perf/util/maps.h | 6 ++----
tools/perf/util/symbol.c | 24 ++++++++++++------------
3 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 00e6589bba10..f13fd3a9686b 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -13,6 +13,10 @@
#define maps__for_each_entry(maps, map) \
for (map = maps__first(maps); map; map = map_rb_node__next(map))

+#define maps__for_each_entry_safe(maps, map, next) \
+ for (map = maps__first(maps), next = map_rb_node__next(map); map; \
+ map = next, next = map_rb_node__next(map))
+
static void maps__init(struct maps *maps, struct machine *machine)
{
refcount_set(maps__refcnt(maps), 1);
@@ -214,6 +218,26 @@ int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data)
return ret;
}

+void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data), void *data)
+{
+ struct map_rb_node *pos, *next;
+ unsigned int start_nr_maps;
+
+ down_write(maps__lock(maps));
+
+ start_nr_maps = maps__nr_maps(maps);
+ maps__for_each_entry_safe(maps, pos, next) {
+ if (cb(pos->map, data)) {
+ __maps__remove(maps, pos);
+ --RC_CHK_ACCESS(maps)->nr_maps;
+ }
+ }
+ if (maps__maps_by_name(maps) && start_nr_maps != maps__nr_maps(maps))
+ __maps__free_maps_by_name(maps);
+
+ up_write(maps__lock(maps));
+}
+
struct symbol *maps__find_symbol(struct maps *maps, u64 addr, struct map **mapp)
{
struct map *map = maps__find(maps, addr);
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index 8ac30cdaf5bd..b94ad5c8fea7 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -36,10 +36,6 @@ struct map_rb_node *map_rb_node__next(struct map_rb_node *node);
struct map_rb_node *maps__find_node(struct maps *maps, struct map *map);
struct map *maps__find(struct maps *maps, u64 addr);

-#define maps__for_each_entry_safe(maps, map, next) \
- for (map = maps__first(maps), next = map_rb_node__next(map); map; \
- map = next, next = map_rb_node__next(map))
-
DECLARE_RC_STRUCT(maps) {
struct rb_root entries;
struct rw_semaphore lock;
@@ -80,6 +76,8 @@ static inline void __maps__zput(struct maps **map)

/* Iterate over map calling cb for each entry. */
int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data), void *data);
+/* Iterate over map removing an entry if cb returns true. */
+void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data), void *data);

static inline struct rb_root *maps__entries(struct maps *maps)
{
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 72f03b875478..30da8a405d11 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1239,13 +1239,23 @@ static int kcore_mapfn(u64 start, u64 len, u64 pgoff, void *data)
return 0;
}

+static bool remove_old_maps(struct map *map, void *data)
+{
+ const struct map *map_to_save = data;
+
+ /*
+ * We need to preserve eBPF maps even if they are covered by kcore,
+ * because we need to access eBPF dso for source data.
+ */
+ return RC_CHK_ACCESS(map) != RC_CHK_ACCESS(map_to_save) && !__map__is_bpf_prog(map);
+}
+
static int dso__load_kcore(struct dso *dso, struct map *map,
const char *kallsyms_filename)
{
struct maps *kmaps = map__kmaps(map);
struct kcore_mapfn_data md;
struct map *replacement_map = NULL;
- struct map_rb_node *old_node, *next;
struct machine *machine;
bool is_64_bit;
int err, fd;
@@ -1292,17 +1302,7 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
}

/* Remove old maps */
- maps__for_each_entry_safe(kmaps, old_node, next) {
- struct map *old_map = old_node->map;
-
- /*
- * We need to preserve eBPF maps even if they are
- * covered by kcore, because we need to access
- * eBPF dso for source data.
- */
- if (old_map != map && !__map__is_bpf_prog(old_map))
- maps__remove(kmaps, old_map);
- }
+ maps__remove_maps(kmaps, remove_old_maps, map);
machine->trampolines_mapped = false;

/* Find the kernel map using the '_stext' symbol */
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:27:39

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 43/50] perf maps: Locking tidy up of nr_maps

After this change maps__nr_maps is only used by tests; existing users
are migrated to maps__empty. Compute maps__empty under the read lock.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/machine.c | 2 +-
tools/perf/util/maps.c | 10 ++++++++--
tools/perf/util/maps.h | 4 ++--
3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 42d73f00f9c1..f9c77119af22 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -441,7 +441,7 @@ static struct thread *findnew_guest_code(struct machine *machine,
return NULL;

/* Assume maps are set up if there are any */
- if (maps__nr_maps(thread__maps(thread)))
+ if (!maps__empty(thread__maps(thread)))
return thread;

host_thread = machine__find_thread(host_machine, -1, pid);
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 41e9e39b1b4c..725f5d73e93a 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -528,7 +528,13 @@ void maps__remove(struct maps *maps, struct map *map)

bool maps__empty(struct maps *maps)
{
- return maps__nr_maps(maps) == 0;
+ bool res;
+
+ down_read(maps__lock(maps));
+ res = maps__nr_maps(maps) == 0;
+ up_read(maps__lock(maps));
+
+ return res;
}

bool maps__equal(struct maps *a, struct maps *b)
@@ -851,7 +857,7 @@ int maps__copy_from(struct maps *dest, struct maps *parent)

parent_maps_by_address = maps__maps_by_address(parent);
n = maps__nr_maps(parent);
- if (maps__empty(dest)) {
+ if (maps__nr_maps(dest) == 0) {
/* No existing mappings so just copy from parent to avoid reallocs in insert. */
unsigned int nr_maps_allocated = RC_CHK_ACCESS(parent)->nr_maps_allocated;
struct map **dest_maps_by_address =
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index 4bcba136ffe5..d9aa62ed968a 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -43,8 +43,8 @@ int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data)
void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data), void *data);

struct machine *maps__machine(const struct maps *maps);
-unsigned int maps__nr_maps(const struct maps *maps);
-refcount_t *maps__refcnt(struct maps *maps);
+unsigned int maps__nr_maps(const struct maps *maps); /* Test only. */
+refcount_t *maps__refcnt(struct maps *maps); /* Test only. */

#ifdef HAVE_LIBUNWIND_SUPPORT
void *maps__addr_space(const struct maps *maps);
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:27:39

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 15/50] perf machine thread: Remove exited threads by default

struct thread values hold onto references to mmaps, dsos, etc. When a
thread exits it is necessary to clean all of this memory up by
removing the thread from the machine's threads. Some tools require
that this doesn't happen: auxtrace events, and perf report when offcpu
events exist or when a task list is being generated. So add a
symbol_conf value to make the behavior optional. When an exited thread
is left in the machine's threads, mark it as exited.

This change relates to commit 40826c45eb0b ("perf thread: Remove
notion of dead threads"). Dead threads were removed as they had a
reference count of 0 and were difficult to reason about with the
reference count checker. Here a thread is removed from the machine's
threads when it exits, unless symbol_conf requests that exited threads
be kept, in which case the thread isn't removed and is marked as
exited. Reference counting behaves as it normally does.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/builtin-report.c | 7 +++++++
tools/perf/util/machine.c | 10 +++++++---
tools/perf/util/session.c | 5 +++++
tools/perf/util/symbol_conf.h | 3 ++-
tools/perf/util/thread.h | 14 ++++++++++++++
5 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index dcedfe00f04d..749246817aed 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1411,6 +1411,13 @@ int cmd_report(int argc, const char **argv)
if (ret < 0)
goto exit;

+ /*
+ * tasks_mode require access to exited threads to list those that are in
+ * the data file. Off-cpu events are synthesized after other events and
+ * reference exited threads.
+ */
+ symbol_conf.keep_exited_threads = true;
+
annotation_options__init(&report.annotation_opts);

ret = perf_config(report__config, &report);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 90c750150b19..a985d004aa8d 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2157,9 +2157,13 @@ int machine__process_exit_event(struct machine *machine, union perf_event *event
if (dump_trace)
perf_event__fprintf_task(event, stdout);

- if (thread != NULL)
- thread__put(thread);
-
+ if (thread != NULL) {
+ if (symbol_conf.keep_exited_threads)
+ thread__set_exited(thread, /*exited=*/true);
+ else
+ machine__remove_thread(machine, thread);
+ }
+ thread__put(thread);
return 0;
}

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 1e9aa8ed15b6..c6afba7ab1a5 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -115,6 +115,11 @@ static int perf_session__open(struct perf_session *session, int repipe_fd)
return -1;
}

+ if (perf_header__has_feat(&session->header, HEADER_AUXTRACE)) {
+ /* Auxiliary events may reference exited threads, hold onto dead ones. */
+ symbol_conf.keep_exited_threads = true;
+ }
+
if (perf_data__is_pipe(data))
return 0;

diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
index 2b2fb9e224b0..6040286e07a6 100644
--- a/tools/perf/util/symbol_conf.h
+++ b/tools/perf/util/symbol_conf.h
@@ -43,7 +43,8 @@ struct symbol_conf {
disable_add2line_warn,
buildid_mmap2,
guest_code,
- lazy_load_kernel_maps;
+ lazy_load_kernel_maps,
+ keep_exited_threads;
const char *vmlinux_name,
*kallsyms_name,
*source_prefix,
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index e79225a0ea46..0df775b5c110 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -36,13 +36,22 @@ struct thread_rb_node {
};

DECLARE_RC_STRUCT(thread) {
+ /** @maps: mmaps associated with this thread. */
struct maps *maps;
pid_t pid_; /* Not all tools update this */
+ /** @tid: thread ID number unique to a machine. */
pid_t tid;
+ /** @ppid: parent process of the process this thread belongs to. */
pid_t ppid;
int cpu;
int guest_cpu; /* For QEMU thread */
refcount_t refcnt;
+ /**
+ * @exited: Has the thread had an exit event. Such threads are usually
+ * removed from the machine's threads but some events/tools require
+ * access to dead threads.
+ */
+ bool exited;
bool comm_set;
int comm_len;
struct list_head namespaces_list;
@@ -189,6 +198,11 @@ static inline refcount_t *thread__refcnt(struct thread *thread)
return &RC_CHK_ACCESS(thread)->refcnt;
}

+static inline void thread__set_exited(struct thread *thread, bool exited)
+{
+ RC_CHK_ACCESS(thread)->exited = exited;
+}
+
static inline bool thread__comm_set(const struct thread *thread)
{
return RC_CHK_ACCESS(thread)->comm_set;
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:27:41

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 32/50] perf maps: Do simple merge if given map doesn't overlap

Simplify maps__merge_in for the simple case of a non-overlapping map.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/maps.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 40df08dd9bf3..14e1a169433d 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -696,9 +696,20 @@ void maps__fixup_end(struct maps *maps)
int maps__merge_in(struct maps *kmaps, struct map *new_map)
{
struct map_rb_node *rb_node;
+ struct rb_node *first;
+ bool overlaps;
LIST_HEAD(merged);
int err = 0;

+ down_read(maps__lock(kmaps));
+ first = first_ending_after(kmaps, new_map);
+ overlaps = first &&
+ map__start(rb_entry(first, struct map_rb_node, rb_node)->map) < map__end(new_map);
+ up_read(maps__lock(kmaps));
+
+ if (!overlaps)
+ return maps__insert(kmaps, new_map);
+
maps__for_each_entry(kmaps, rb_node) {
struct map *old_map = rb_node->map;

--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:27:47

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 39/50] perf maps: Get map before returning in maps__find

Finding a map is done under a lock, so returning the map without a
reference count means it can be removed without notice, causing
use-after-free bugs. Grab a reference count to the map within the lock
region and return this. Fix up the locations that need a map__put
following this.
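
A sketch of the calling convention after this change:

```
struct map *map = maps__find(maps, addr);

if (map != NULL) {
	/* Use the map; the reference keeps it alive even if it is
	 * concurrently removed from maps.
	 */
	map__put(map);	/* drop the reference maps__find now takes */
}
```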

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/arch/x86/tests/dwarf-unwind.c | 1 +
tools/perf/tests/vmlinux-kallsyms.c | 5 ++---
tools/perf/util/bpf-event.c | 1 +
tools/perf/util/event.c | 4 ++--
tools/perf/util/machine.c | 22 ++++++++--------------
tools/perf/util/maps.c | 17 ++++++++++-------
tools/perf/util/symbol.c | 3 ++-
7 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/tools/perf/arch/x86/tests/dwarf-unwind.c b/tools/perf/arch/x86/tests/dwarf-unwind.c
index 5bfec3345d59..c05c0a85dad4 100644
--- a/tools/perf/arch/x86/tests/dwarf-unwind.c
+++ b/tools/perf/arch/x86/tests/dwarf-unwind.c
@@ -34,6 +34,7 @@ static int sample_ustack(struct perf_sample *sample,
}

stack_size = map__end(map) - sp;
+ map__put(map);
stack_size = stack_size > STACK_SIZE ? STACK_SIZE : stack_size;

memcpy(buf, (void *) sp, stack_size);
diff --git a/tools/perf/tests/vmlinux-kallsyms.c b/tools/perf/tests/vmlinux-kallsyms.c
index 822f893e67d5..e808e6fc8f76 100644
--- a/tools/perf/tests/vmlinux-kallsyms.c
+++ b/tools/perf/tests/vmlinux-kallsyms.c
@@ -151,10 +151,8 @@ static int test__vmlinux_matches_kallsyms_cb2(struct map *map, void *data)
u64 mem_end = map__unmap_ip(args->vmlinux_map, map__end(map));

pair = maps__find(args->kallsyms.kmaps, mem_start);
- if (pair == NULL || map__priv(pair))
- return 0;

- if (map__start(pair) == mem_start) {
+ if (pair != NULL && !map__priv(pair) && map__start(pair) == mem_start) {
struct dso *dso = map__dso(map);

if (!args->header_printed) {
@@ -170,6 +168,7 @@ static int test__vmlinux_matches_kallsyms_cb2(struct map *map, void *data)
pr_info(" %s\n", dso->name);
map__set_priv(pair, 1);
}
+ map__put(pair);
return 0;
}

diff --git a/tools/perf/util/bpf-event.c b/tools/perf/util/bpf-event.c
index 830711cae30d..d07fd5ffa823 100644
--- a/tools/perf/util/bpf-event.c
+++ b/tools/perf/util/bpf-event.c
@@ -63,6 +63,7 @@ static int machine__process_bpf_event_load(struct machine *machine,
dso->bpf_prog.id = id;
dso->bpf_prog.sub_id = i;
dso->bpf_prog.env = env;
+ map__put(map);
}
}
return 0;
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 68f45e9e63b6..198903157f9e 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -511,7 +511,7 @@ size_t perf_event__fprintf_text_poke(union perf_event *event, struct machine *ma
struct addr_location al;

addr_location__init(&al);
- al.map = map__get(maps__find(machine__kernel_maps(machine), tp->addr));
+ al.map = maps__find(machine__kernel_maps(machine), tp->addr);
if (al.map && map__load(al.map) >= 0) {
al.addr = map__map_ip(al.map, tp->addr);
al.sym = map__find_symbol(al.map, al.addr);
@@ -641,7 +641,7 @@ struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr,
return NULL;
}
al->maps = maps__get(maps);
- al->map = map__get(maps__find(maps, al->addr));
+ al->map = maps__find(maps, al->addr);
if (al->map != NULL) {
/*
* Kernel maps might be changed when loading symbols so loading
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index ab345604f274..1112a9dbb21a 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -897,7 +897,6 @@ static int machine__process_ksymbol_register(struct machine *machine,
struct symbol *sym;
struct dso *dso;
struct map *map = maps__find(machine__kernel_maps(machine), event->ksymbol.addr);
- bool put_map = false;
int err = 0;

if (!map) {
@@ -914,12 +913,6 @@ static int machine__process_ksymbol_register(struct machine *machine,
err = -ENOMEM;
goto out;
}
- /*
- * The inserted map has a get on it, we need to put to release
- * the reference count here, but do it after all accesses are
- * done.
- */
- put_map = true;
if (event->ksymbol.ksym_type == PERF_RECORD_KSYMBOL_TYPE_OOL) {
dso->binary_type = DSO_BINARY_TYPE__OOL;
dso->data.file_size = event->ksymbol.len;
@@ -953,8 +946,7 @@ static int machine__process_ksymbol_register(struct machine *machine,
}
dso__insert_symbol(dso, sym);
out:
- if (put_map)
- map__put(map);
+ map__put(map);
return err;
}

@@ -978,7 +970,7 @@ static int machine__process_ksymbol_unregister(struct machine *machine,
if (sym)
dso__delete_symbol(dso, sym);
}
-
+ map__put(map);
return 0;
}

@@ -1006,11 +998,11 @@ int machine__process_text_poke(struct machine *machine, union perf_event *event,
perf_event__fprintf_text_poke(event, machine, stdout);

if (!event->text_poke.new_len)
- return 0;
+ goto out;

if (cpumode != PERF_RECORD_MISC_KERNEL) {
pr_debug("%s: unsupported cpumode - ignoring\n", __func__);
- return 0;
+ goto out;
}

if (dso) {
@@ -1033,7 +1025,8 @@ int machine__process_text_poke(struct machine *machine, union perf_event *event,
pr_debug("Failed to find kernel text poke address map for %#" PRI_lx64 "\n",
event->text_poke.addr);
}
-
+out:
+ map__put(map);
return 0;
}

@@ -1301,9 +1294,10 @@ static int machine__map_x86_64_entry_trampolines_cb(struct map *map, void *data)
return 0;

dest_map = maps__find(args->kmaps, map__pgoff(map));
- if (dest_map != map)
+ if (RC_CHK_ACCESS(dest_map) != RC_CHK_ACCESS(map))
map__set_pgoff(map, map__map_ip(dest_map, map__pgoff(map)));

+ map__put(dest_map);
args->found = true;
return 0;
}
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 06fdd8a7c2a2..28facfdac1d7 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -487,15 +487,18 @@ void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data
struct symbol *maps__find_symbol(struct maps *maps, u64 addr, struct map **mapp)
{
struct map *map = maps__find(maps, addr);
+ struct symbol *result = NULL;

/* Ensure map is loaded before using map->map_ip */
if (map != NULL && map__load(map) >= 0) {
- if (mapp != NULL)
- *mapp = map; // TODO: map_put on else path when find returns a get.
- return map__find_symbol(map, map__map_ip(map, addr));
- }
+ if (mapp)
+ *mapp = map;

- return NULL;
+ result = map__find_symbol(map, map__map_ip(map, addr));
+ if (!mapp)
+ map__put(map);
+ }
+ return result;
}

struct maps__find_symbol_by_name_args {
@@ -539,7 +542,7 @@ int maps__find_ams(struct maps *maps, struct addr_map_symbol *ams)
if (ams->addr < map__start(ams->ms.map) || ams->addr >= map__end(ams->ms.map)) {
if (maps == NULL)
return -1;
- ams->ms.map = maps__find(maps, ams->addr); // TODO: map_get
+ ams->ms.map = maps__find(maps, ams->addr);
if (ams->ms.map == NULL)
return -1;
}
@@ -848,7 +851,7 @@ struct map *maps__find(struct maps *maps, u64 ip)
sizeof(*mapp), map__addr_cmp);

if (mapp)
- result = *mapp; // map__get(*mapp);
+ result = map__get(*mapp);
done = true;
}
up_read(maps__lock(maps));
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 30da8a405d11..ad4819a24320 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -757,7 +757,6 @@ static int dso__load_all_kallsyms(struct dso *dso, const char *filename)

static int maps__split_kallsyms_for_kcore(struct maps *kmaps, struct dso *dso)
{
- struct map *curr_map;
struct symbol *pos;
int count = 0;
struct rb_root_cached old_root = dso->symbols;
@@ -770,6 +769,7 @@ static int maps__split_kallsyms_for_kcore(struct maps *kmaps, struct dso *dso)
*root = RB_ROOT_CACHED;

while (next) {
+ struct map *curr_map;
struct dso *curr_map_dso;
char *module;

@@ -796,6 +796,7 @@ static int maps__split_kallsyms_for_kcore(struct maps *kmaps, struct dso *dso)
pos->end -= map__start(curr_map) - map__pgoff(curr_map);
symbols__insert(&curr_map_dso->symbols, pos);
++count;
+ map__put(curr_map);
}

/* Symbols have been adjusted */
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:27:47

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 40/50] perf maps: Get map before returning in maps__find_by_name

Finding a map is done under a lock, so returning the map without a
reference count means it can be removed without notice, causing
use-after-free bugs. Grab a reference count to the map within the lock
region and return this. Fix up the locations that need a map__put
following this.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/tests/vmlinux-kallsyms.c | 5 +++--
tools/perf/util/machine.c | 6 ++++--
tools/perf/util/maps.c | 6 +++---
tools/perf/util/probe-event.c | 1 +
tools/perf/util/symbol-elf.c | 4 +++-
tools/perf/util/symbol.c | 18 +++++++++++-------
6 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/tools/perf/tests/vmlinux-kallsyms.c b/tools/perf/tests/vmlinux-kallsyms.c
index e808e6fc8f76..fecbf851bb2e 100644
--- a/tools/perf/tests/vmlinux-kallsyms.c
+++ b/tools/perf/tests/vmlinux-kallsyms.c
@@ -131,9 +131,10 @@ static int test__vmlinux_matches_kallsyms_cb1(struct map *map, void *data)
struct map *pair = maps__find_by_name(args->kallsyms.kmaps,
(dso->kernel ? dso->short_name : dso->name));

- if (pair)
+ if (pair) {
map__set_priv(pair, 1);
- else {
+ map__put(pair);
+ } else {
if (!args->header_printed) {
pr_info("WARN: Maps only in vmlinux:\n");
args->header_printed = true;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 1112a9dbb21a..d6b3f84cb935 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1538,8 +1538,10 @@ static int maps__set_module_path(struct maps *maps, const char *path, struct kmo
return 0;

long_name = strdup(path);
- if (long_name == NULL)
+ if (long_name == NULL) {
+ map__put(map);
return -ENOMEM;
+ }

dso = map__dso(map);
dso__set_long_name(dso, long_name, true);
@@ -1553,7 +1555,7 @@ static int maps__set_module_path(struct maps *maps, const char *path, struct kmo
dso->symtab_type++;
dso->comp = m->comp;
}
-
+ map__put(map);
return 0;
}

diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 28facfdac1d7..8a8c1f216b86 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -885,7 +885,7 @@ struct map *maps__find_by_name(struct maps *maps, const char *name)
struct dso *dso = map__dso(maps__maps_by_name(maps)[i]);

if (dso && strcmp(dso->short_name, name) == 0) {
- result = maps__maps_by_name(maps)[i]; // TODO: map__get
+ result = map__get(maps__maps_by_name(maps)[i]);
done = true;
}
}
@@ -897,7 +897,7 @@ struct map *maps__find_by_name(struct maps *maps, const char *name)
sizeof(*mapp), map__strcmp_name);

if (mapp) {
- result = *mapp; // TODO: map__get
+ result = map__get(*mapp);
i = mapp - maps__maps_by_name(maps);
RC_CHK_ACCESS(maps)->last_search_by_name_idx = i;
}
@@ -922,7 +922,7 @@ struct map *maps__find_by_name(struct maps *maps, const char *name)
struct dso *dso = map__dso(pos);

if (dso && strcmp(dso->short_name, name) == 0) {
- result = pos; // TODO: map__get
+ result = map__get(pos);
break;
}
}
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index a1a796043691..be71abe8b9b0 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -358,6 +358,7 @@ static int kernel_get_module_dso(const char *module, struct dso **pdso)
map = maps__find_by_name(machine__kernel_maps(host_machine), module_name);
if (map) {
dso = map__dso(map);
+ map__put(map);
goto found;
}
pr_debug("Failed to find module %s.\n", module);
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 4b934ed3bfd1..5990e3fabdb5 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -1470,8 +1470,10 @@ static int dso__process_kernel_symbol(struct dso *dso, struct map *map,
dso__set_loaded(curr_dso);
*curr_mapp = curr_map;
*curr_dsop = curr_dso;
- } else
+ } else {
*curr_dsop = map__dso(curr_map);
+ map__put(curr_map);
+ }

return 0;
}
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index ad4819a24320..3f31e868d883 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -814,7 +814,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
struct map *initial_map)
{
struct machine *machine;
- struct map *curr_map = initial_map;
+ struct map *curr_map = map__get(initial_map);
struct symbol *pos;
int count = 0, moved = 0;
struct rb_root_cached *root = &dso->symbols;
@@ -858,13 +858,14 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
dso__set_loaded(curr_map_dso);
}

+ map__zput(curr_map);
curr_map = maps__find_by_name(kmaps, module);
if (curr_map == NULL) {
pr_debug("%s/proc/{kallsyms,modules} "
"inconsistency while looking "
"for \"%s\" module!\n",
machine->root_dir, module);
- curr_map = initial_map;
+ curr_map = map__get(initial_map);
goto discard_symbol;
}
curr_map_dso = map__dso(curr_map);
@@ -888,7 +889,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
* symbols at this point.
*/
goto discard_symbol;
- } else if (curr_map != initial_map) {
+ } else if (RC_CHK_ACCESS(curr_map) != RC_CHK_ACCESS(initial_map)) {
char dso_name[PATH_MAX];
struct dso *ndso;

@@ -899,7 +900,8 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
}

if (count == 0) {
- curr_map = initial_map;
+ map__zput(curr_map);
+ curr_map = map__get(initial_map);
goto add_symbol;
}

@@ -913,6 +915,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
kernel_range++);

ndso = dso__new(dso_name);
+ map__zput(curr_map);
if (ndso == NULL)
return -1;

@@ -926,6 +929,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,

map__set_mapping_type(curr_map, MAPPING_TYPE__IDENTITY);
if (maps__insert(kmaps, curr_map)) {
+ map__zput(curr_map);
dso__put(ndso);
return -1;
}
@@ -936,7 +940,7 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
pos->end -= delta;
}
add_symbol:
- if (curr_map != initial_map) {
+ if (RC_CHK_ACCESS(curr_map) != RC_CHK_ACCESS(initial_map)) {
struct dso *curr_map_dso = map__dso(curr_map);

rb_erase_cached(&pos->rb_node, root);
@@ -951,12 +955,12 @@ static int maps__split_kallsyms(struct maps *kmaps, struct dso *dso, u64 delta,
symbol__delete(pos);
}

- if (curr_map != initial_map &&
+ if (RC_CHK_ACCESS(curr_map) != RC_CHK_ACCESS(initial_map) &&
dso->kernel == DSO_SPACE__KERNEL_GUEST &&
machine__is_default_guest(maps__machine(kmaps))) {
dso__set_loaded(map__dso(curr_map));
}
-
+ map__put(curr_map);
return count + moved;
}

--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:27:52

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 19/50] perf maps: Switch modules tree walk to io_dir__readdir

Compared to glibc's opendir/readdir, this lowers the max RSS of perf
record by 1.8MB on a Debian machine.
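
As a rough sketch of the conversion pattern (using the io_dir API
added earlier in this series; error handling trimmed and recursion
elided):

```
#include <api/io_dir.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static int visit_dir(const char *dir_name)
{
	struct io_dirent64 *dent;
	struct io_dir iod;

	/* Entries are read into a buffer embedded in io_dir, avoiding
	 * the heap allocations glibc's opendir performs. */
	io_dir__init(&iod, open(dir_name, O_CLOEXEC | O_DIRECTORY | O_RDONLY));
	if (iod.dirfd < 0)
		return -1;

	while ((dent = io_dir__readdir(&iod)) != NULL) {
		if (!strcmp(dent->d_name, ".") || !strcmp(dent->d_name, ".."))
			continue;
		/* Use dent->d_name; io_dir__is_dir(&iod, dent) to recurse. */
	}
	close(iod.dirfd);
	return 0;
}
```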

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/machine.c | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index a985d004aa8d..be3dab9d5253 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -36,6 +36,7 @@
#include <internal/lib.h> // page_size
#include "cgroup.h"
#include "arm64-frame-pointer-unwind-support.h"
+#include <api/io_dir.h>

#include <linux/ctype.h>
#include <symbol/kallsyms.h>
@@ -1552,25 +1553,21 @@ static int maps__set_module_path(struct maps *maps, const char *path, struct kmo

static int maps__set_modules_path_dir(struct maps *maps, const char *dir_name, int depth)
{
- struct dirent *dent;
- DIR *dir = opendir(dir_name);
+ struct io_dirent64 *dent;
+ struct io_dir iod;
int ret = 0;

- if (!dir) {
+ io_dir__init(&iod, open(dir_name, O_CLOEXEC | O_DIRECTORY | O_RDONLY));
+ if (iod.dirfd < 0) {
pr_debug("%s: cannot open %s dir\n", __func__, dir_name);
return -1;
}

- while ((dent = readdir(dir)) != NULL) {
+ while ((dent = io_dir__readdir(&iod)) != NULL) {
char path[PATH_MAX];
- struct stat st;

- /*sshfs might return bad dent->d_type, so we have to stat*/
path__join(path, sizeof(path), dir_name, dent->d_name);
- if (stat(path, &st))
- continue;
-
- if (S_ISDIR(st.st_mode)) {
+ if (io_dir__is_dir(&iod, dent)) {
if (!strcmp(dent->d_name, ".") ||
!strcmp(dent->d_name, ".."))
continue;
@@ -1603,7 +1600,7 @@ static int maps__set_modules_path_dir(struct maps *maps, const char *dir_name, i
}

out:
- closedir(dir);
+ close(iod.dirfd);
return ret;
}

--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:28:06

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 34/50] perf maps: Add maps__load_first

Avoid bpf_lock_contention touching the internal maps data structure by
adding a helper function. As the access is done directly on the first
map in maps, hold the read lock to stop the map being removed.
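
As an illustration of the race being closed (the interleaving is
hypothetical; both calls appear in the hunks below):

```
/* Before: the first map was dereferenced with no lock held, so a
 * concurrent removal could free it mid-use:
 */
map__load(maps__first(machine->kmaps)->map);

/* After: the lookup and the load happen under the maps read lock: */
maps__load_first(machine->kmaps);
```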

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/bpf_lock_contention.c | 2 +-
tools/perf/util/maps.c | 13 +++++++++++++
tools/perf/util/maps.h | 2 ++
3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
index e105245eb905..d9720a910330 100644
--- a/tools/perf/util/bpf_lock_contention.c
+++ b/tools/perf/util/bpf_lock_contention.c
@@ -317,7 +317,7 @@ int lock_contention_read(struct lock_contention *con)
}

/* make sure it loads the kernel map */
- map__load(maps__first(machine->kmaps)->map);
+ maps__load_first(machine->kmaps);

prev_key = NULL;
while (!bpf_map_get_next_key(fd, prev_key, &key)) {
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 85bea2a6dca9..9a84d26328a7 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -792,3 +792,16 @@ int maps__merge_in(struct maps *kmaps, struct map *new_map)
}
return err;
}
+
+void maps__load_first(struct maps *maps)
+{
+ struct map_rb_node *first;
+
+ down_read(maps__lock(maps));
+
+ first = maps__first(maps);
+ if (first)
+ map__load(first->map);
+
+ up_read(maps__lock(maps));
+}
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index e4a49d6ff5cf..b7ab3ec61b7c 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -142,4 +142,6 @@ void __maps__sort_by_name(struct maps *maps);

void maps__fixup_end(struct maps *maps);

+void maps__load_first(struct maps *maps);
+
#endif // __PERF_MAPS_H
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:28:07

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 17/50] tools api fs: Avoid reading whole file for a 1 byte bool

sysfs__read_bool read a whole file into a string and then only looked
at the first byte's value. Avoid doing this and just read the first
byte.
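
A small usage sketch, assuming a bool-like sysfs entry; the path here
is only illustrative and not taken from this patch:

```
#include <api/fs/fs.h>
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	bool value;

	/* '1'/'y'/'Y' map to true, '0'/'n'/'N' to false. */
	if (sysfs__read_bool("devices/cpu/rdpmc", &value) == 0)
		printf("rdpmc: %s\n", value ? "true" : "false");
	return 0;
}
```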

Signed-off-by: Ian Rogers <[email protected]>
---
tools/lib/api/fs/fs.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
index 496812b5f1d2..4c35a689d1fc 100644
--- a/tools/lib/api/fs/fs.c
+++ b/tools/lib/api/fs/fs.c
@@ -447,15 +447,16 @@ int sysfs__read_str(const char *entry, char **buf, size_t *sizep)

int sysfs__read_bool(const char *entry, bool *value)
{
- char *buf;
- size_t size;
- int ret;
+ struct io io;
+ char bf[16];
+ int ret = 0;

- ret = sysfs__read_str(entry, &buf, &size);
- if (ret < 0)
- return ret;
+ io.fd = open(entry, O_RDONLY);
+ if (io.fd < 0)
+ return -errno;

- switch (buf[0]) {
+ io__init(&io, io.fd, bf, sizeof(bf));
+ switch (io__get_char(&io)) {
case '1':
case 'y':
case 'Y':
@@ -469,8 +470,7 @@ int sysfs__read_bool(const char *entry, bool *value)
default:
ret = -1;
}
-
- free(buf);
+ close(io.fd);

return ret;
}
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:28:39

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 41/50] perf maps: Get map before returning in maps__find_next_entry

Finding a map is done under a lock, so returning the map without a
reference count means it can be removed without notice, causing
use-after-free bugs. Grab a reference to the map within the locked
region and return that. Fix up the locations that need a map__put
following this change.
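
The general shape of the fix, condensed from the maps.c hunk below
(the entry lookup itself is elided):

```
struct map *result = NULL;

down_read(maps__lock(maps));
/* ... locate the entry while the lock prevents its removal ... */
result = map__get(entry);	/* take a reference before unlocking */
up_read(maps__lock(maps));

return result;			/* the caller must map__put() it */
```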

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/machine.c | 4 +++-
tools/perf/util/maps.c | 2 +-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index d6b3f84cb935..42d73f00f9c1 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1758,8 +1758,10 @@ int machine__create_kernel_maps(struct machine *machine)
struct map *next = maps__find_next_entry(machine__kernel_maps(machine),
machine__kernel_map(machine));

- if (next)
+ if (next) {
machine__set_kernel_mmap(machine, start, map__start(next));
+ map__put(next);
+ }
}

out_put:
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 8a8c1f216b86..b3937e734cbf 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -942,7 +942,7 @@ struct map *maps__find_next_entry(struct maps *maps, struct map *map)
down_read(maps__lock(maps));
i = maps__by_address_index(maps, map);
if (i < maps__nr_maps(maps))
- result = maps__maps_by_address(maps)[i]; // TODO: map__get
+ result = map__get(maps__maps_by_address(maps)[i]);

up_read(maps__lock(maps));
return result;
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:28:45

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 35/50] perf maps: Add find next entry to give entry after the given map

Use it to remove map_rb_node use from machine.c.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/machine.c | 7 +++----
tools/perf/util/maps.c | 11 +++++++++++
tools/perf/util/maps.h | 2 ++
3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 191e492539e5..ab345604f274 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1759,12 +1759,11 @@ int machine__create_kernel_maps(struct machine *machine)

if (end == ~0ULL) {
/* update end address of the kernel map using adjacent module address */
- struct map_rb_node *rb_node = maps__find_node(machine__kernel_maps(machine),
- machine__kernel_map(machine));
- struct map_rb_node *next = map_rb_node__next(rb_node);
+ struct map *next = maps__find_next_entry(machine__kernel_maps(machine),
+ machine__kernel_map(machine));

if (next)
- machine__set_kernel_mmap(machine, start, map__start(next->map));
+ machine__set_kernel_mmap(machine, start, map__start(next));
}

out_put:
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 9a84d26328a7..38d56709bd5e 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -662,6 +662,17 @@ struct map *maps__find_by_name(struct maps *maps, const char *name)
return map;
}

+struct map *maps__find_next_entry(struct maps *maps, struct map *map)
+{
+ struct map_rb_node *rb_node = maps__find_node(maps, map);
+ struct map_rb_node *next = map_rb_node__next(rb_node);
+
+ if (next)
+ return next->map;
+
+ return NULL;
+}
+
void maps__fixup_end(struct maps *maps)
{
struct map_rb_node *prev = NULL, *curr;
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index b7ab3ec61b7c..84b42c8456e8 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -136,6 +136,8 @@ int maps__fixup_overlap_and_insert(struct maps *maps, struct map *new);

struct map *maps__find_by_name(struct maps *maps, const char *name);

+struct map *maps__find_next_entry(struct maps *maps, struct map *map);
+
int maps__merge_in(struct maps *kmaps, struct map *new_map);

void __maps__sort_by_name(struct maps *maps);
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:28:50

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 47/50] perf machine: Move fprintf to for_each loop and a callback

Avoid exposing the threads data structure by switching to the
machine__for_each_thread callback approach. machine__fprintf is only
used in tests and verbose >3 output, so don't bother turning the
threads into a list and sorting them. Add machine__threads_nr, to be
refactored later.

Note, all existing *_fprintf routines ignore fprintf errors.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/machine.c | 43 ++++++++++++++++++++++++---------------
1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 6d7a505850c8..7e19303d1aa6 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1114,29 +1114,40 @@ size_t machine__fprintf_vmlinux_path(struct machine *machine, FILE *fp)
return printed;
}

-size_t machine__fprintf(struct machine *machine, FILE *fp)
+struct machine_fprintf_cb_args {
+ FILE *fp;
+ size_t printed;
+};
+
+static int machine_fprintf_cb(struct thread *thread, void *data)
{
- struct rb_node *nd;
- size_t ret;
- int i;
+ struct machine_fprintf_cb_args *args = data;

- for (i = 0; i < THREADS__TABLE_SIZE; i++) {
- struct threads *threads = &machine->threads[i];
+ /* TODO: handle fprintf errors. */
+ args->printed += thread__fprintf(thread, args->fp);
+ return 0;
+}

- down_read(&threads->lock);
+static size_t machine__threads_nr(const struct machine *machine)
+{
+ size_t nr = 0;

- ret = fprintf(fp, "Threads: %u\n", threads->nr);
+ for (int i = 0; i < THREADS__TABLE_SIZE; i++)
+ nr += machine->threads[i].nr;

- for (nd = rb_first_cached(&threads->entries); nd;
- nd = rb_next(nd)) {
- struct thread *pos = rb_entry(nd, struct thread_rb_node, rb_node)->thread;
+ return nr;
+}

- ret += thread__fprintf(pos, fp);
- }
+size_t machine__fprintf(struct machine *machine, FILE *fp)
+{
+ struct machine_fprintf_cb_args args = {
+ .fp = fp,
+ .printed = 0,
+ };
+ size_t ret = fprintf(fp, "Threads: %zu\n", machine__threads_nr(machine));

- up_read(&threads->lock);
- }
- return ret;
+ machine__for_each_thread(machine, machine_fprintf_cb, &args);
+ return ret + args.printed;
}

static struct dso *machine__get_kernel(struct machine *machine)
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:28:56

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 33/50] perf maps: Rename clone to copy from

Rename maps__clone to maps__copy_from to better reveal the intention
of its behavior. Pass the underlying maps rather than the thread.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/machine.c | 2 +-
tools/perf/util/maps.c | 6 +-----
tools/perf/util/maps.h | 3 +--
tools/perf/util/thread.c | 2 +-
4 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 3c967295c9a3..191e492539e5 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -454,7 +454,7 @@ static struct thread *findnew_guest_code(struct machine *machine,
* Guest code can be found in hypervisor process at the same address
* so copy host maps.
*/
- err = maps__clone(thread, thread__maps(host_thread));
+ err = maps__copy_from(thread__maps(thread), thread__maps(host_thread));
thread__put(host_thread);
if (err)
goto out_err;
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 14e1a169433d..85bea2a6dca9 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -452,12 +452,8 @@ int maps__fixup_overlap_and_insert(struct maps *maps, struct map *new)
return err;
}

-/*
- * XXX This should not really _copy_ te maps, but refcount them.
- */
-int maps__clone(struct thread *thread, struct maps *parent)
+int maps__copy_from(struct maps *maps, struct maps *parent)
{
- struct maps *maps = thread__maps(thread);
int err;
struct map_rb_node *rb_node;

diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index 62e94d443c02..e4a49d6ff5cf 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -14,7 +14,6 @@ struct ref_reloc_sym;
struct machine;
struct map;
struct maps;
-struct thread;

struct map_rb_node {
struct rb_node rb_node;
@@ -61,7 +60,7 @@ struct kmap {

struct maps *maps__new(struct machine *machine);
bool maps__empty(struct maps *maps);
-int maps__clone(struct thread *thread, struct maps *parent);
+int maps__copy_from(struct maps *maps, struct maps *parent);

struct maps *maps__get(struct maps *maps);
void maps__put(struct maps *maps);
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index cab818af6787..07b158aa3e44 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -390,7 +390,7 @@ static int thread__clone_maps(struct thread *thread, struct thread *parent, bool
return 0;
}
/* But this one is new process, copy maps. */
- return do_maps_clone ? maps__clone(thread, thread__maps(parent)) : 0;
+ return do_maps_clone ? maps__copy_from(thread__maps(thread), thread__maps(parent)) : 0;
}

int thread__fork(struct thread *thread, struct thread *parent, u64 timestamp, bool do_maps_clone)
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:29:03

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 30/50] perf debug: Expose debug file

Some dumping callbacks need to be passed a FILE*. Expose the debug
file via an accessor API for a consistent way to do this. Catch the
unlikely failure of it not being set. Switch two cases where stderr
was being used instead of debug_file.
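
A sketch of the intended use from a dumping callback (the callback and
the message are illustrative):

```
#include <stdio.h>
#include "util/debug.h"

static void dump_cb(void)
{
	/* debug_file() falls back to stderr, with a warning, if unset. */
	fprintf(debug_file(), "dumped %d events\n", 42);
}
```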

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/debug.c | 22 +++++++++++++++-------
tools/perf/util/debug.h | 1 +
2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/debug.c b/tools/perf/util/debug.c
index 88378c4c5dd9..e282b4ceb4d2 100644
--- a/tools/perf/util/debug.c
+++ b/tools/perf/util/debug.c
@@ -38,12 +38,21 @@ bool dump_trace = false, quiet = false;
int debug_ordered_events;
static int redirect_to_stderr;
int debug_data_convert;
-static FILE *debug_file;
+static FILE *_debug_file;
bool debug_display_time;

+FILE *debug_file(void)
+{
+ if (!_debug_file) {
+ pr_warning_once("debug_file not set");
+ debug_set_file(stderr);
+ }
+ return _debug_file;
+}
+
void debug_set_file(FILE *file)
{
- debug_file = file;
+ _debug_file = file;
}

void debug_set_display_time(bool set)
@@ -78,8 +87,8 @@ int veprintf(int level, int var, const char *fmt, va_list args)
if (use_browser >= 1 && !redirect_to_stderr) {
ui_helpline__vshow(fmt, args);
} else {
- ret = fprintf_time(debug_file);
- ret += vfprintf(debug_file, fmt, args);
+ ret = fprintf_time(debug_file());
+ ret += vfprintf(debug_file(), fmt, args);
}
}

@@ -107,9 +116,8 @@ static int veprintf_time(u64 t, const char *fmt, va_list args)
nsecs -= secs * NSEC_PER_SEC;
usecs = nsecs / NSEC_PER_USEC;

- ret = fprintf(stderr, "[%13" PRIu64 ".%06" PRIu64 "] ",
- secs, usecs);
- ret += vfprintf(stderr, fmt, args);
+ ret = fprintf(debug_file(), "[%13" PRIu64 ".%06" PRIu64 "] ", secs, usecs);
+ ret += vfprintf(debug_file(), fmt, args);
return ret;
}

diff --git a/tools/perf/util/debug.h b/tools/perf/util/debug.h
index f99468a7f681..de8870980d44 100644
--- a/tools/perf/util/debug.h
+++ b/tools/perf/util/debug.h
@@ -77,6 +77,7 @@ int eprintf_time(int level, int var, u64 t, const char *fmt, ...) __printf(4, 5)
int veprintf(int level, int var, const char *fmt, va_list args);

int perf_debug_option(const char *str);
+FILE *debug_file(void);
void debug_set_file(FILE *file);
void debug_set_display_time(bool set);
void perf_debug_setup(void);
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:29:06

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 45/50] perf report: Sort child tasks by tid

Commit 91e467bc568f ("perf machine: Use hashtable for machine
threads") made the iteration of thread tids unordered. The perf report
--tasks output now shows child threads in an order determined by the
hashing. For example, in this snippet tid 3 appears after tid 256 even
though they have the same ppid 2:

```
$ perf report --tasks
% pid tid ppid comm
0 0 -1 |swapper
2 2 0 | kthreadd
256 256 2 | kworker/12:1H-k
693761 693761 2 | kworker/10:1-mm
1301762 1301762 2 | kworker/1:1-mm_
1302530 1302530 2 | kworker/u32:0-k
3 3 2 | rcu_gp
...
```

The output is easier to read if threads appear in numerically
increasing order. To allow for this, read all threads into a list,
then sort with a comparator that walks up to the first common parent
and orders the child tasks there by tid (for example, tids 3 and 256
above share parent 2, so 3 sorts before 256). The list creation and
deletion are added as utilities on machine. The indentation is derived
by counting the number of parents a child has.

With this change the output for the same data file is now like:
```
$ perf report --tasks
% pid tid ppid comm
0 0 -1 |swapper
1 1 0 | systemd
823 823 1 | systemd-journal
853 853 1 | systemd-udevd
3230 3230 1 | systemd-timesyn
3236 3236 1 | auditd
3239 3239 3236 | audisp-syslog
3321 3321 1 | accounts-daemon
...
```
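
Condensed from the builtin-report.c changes below, the new tasks_print
flow is roughly:

```
LIST_HEAD(tasks);

if (machine__thread_list(machine, &tasks) == 0) {
	struct thread_list *task;

	/* task_list_cmp walks each pair of threads up to their first
	 * common parent and compares the child tids found there. */
	list_sort(machine, &tasks, task_list_cmp);
	list_for_each_entry(task, &tasks, list)
		task__print_level(machine, task->thread, fp);
}
thread_list__delete(&tasks);
```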

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/builtin-report.c | 203 ++++++++++++++++++++----------------
tools/perf/util/machine.c | 30 ++++++
tools/perf/util/machine.h | 10 ++
3 files changed, 155 insertions(+), 88 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index d7f1f3c8cb7d..3731e9bf7000 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -59,6 +59,7 @@
#include <linux/ctype.h>
#include <signal.h>
#include <linux/bitmap.h>
+#include <linux/list_sort.h>
#include <linux/string.h>
#include <linux/stringify.h>
#include <linux/time64.h>
@@ -815,35 +816,6 @@ static void tasks_setup(struct report *rep)
rep->tool.no_warn = true;
}

-struct task {
- struct thread *thread;
- struct list_head list;
- struct list_head children;
-};
-
-static struct task *tasks_list(struct task *task, struct machine *machine)
-{
- struct thread *parent_thread, *thread = task->thread;
- struct task *parent_task;
-
- /* Already listed. */
- if (!list_empty(&task->list))
- return NULL;
-
- /* Last one in the chain. */
- if (thread__ppid(thread) == -1)
- return task;
-
- parent_thread = machine__find_thread(machine, -1, thread__ppid(thread));
- if (!parent_thread)
- return ERR_PTR(-ENOENT);
-
- parent_task = thread__priv(parent_thread);
- thread__put(parent_thread);
- list_add_tail(&task->list, &parent_task->children);
- return tasks_list(parent_task, machine);
-}
-
struct maps__fprintf_task_args {
int indent;
FILE *fp;
@@ -887,89 +859,144 @@ static size_t maps__fprintf_task(struct maps *maps, int indent, FILE *fp)
return args.printed;
}

-static void task__print_level(struct task *task, FILE *fp, int level)
+static int thread_level(struct machine *machine, const struct thread *thread)
{
- struct thread *thread = task->thread;
- struct task *child;
- int comm_indent = fprintf(fp, " %8d %8d %8d |%*s",
- thread__pid(thread), thread__tid(thread),
- thread__ppid(thread), level, "");
+ struct thread *parent_thread;
+ int res;

- fprintf(fp, "%s\n", thread__comm_str(thread));
+ if (thread__tid(thread) <= 0)
+ return 0;

- maps__fprintf_task(thread__maps(thread), comm_indent, fp);
+ if (thread__ppid(thread) <= 0)
+ return 1;

- if (!list_empty(&task->children)) {
- list_for_each_entry(child, &task->children, list)
- task__print_level(child, fp, level + 1);
+ parent_thread = machine__find_thread(machine, -1, thread__ppid(thread));
+ if (!parent_thread) {
+ pr_err("Missing parent thread of %d\n", thread__tid(thread));
+ return 0;
}
+ res = 1 + thread_level(machine, parent_thread);
+ thread__put(parent_thread);
+ return res;
}

-static int tasks_print(struct report *rep, FILE *fp)
+static void task__print_level(struct machine *machine, struct thread *thread, FILE *fp)
{
- struct perf_session *session = rep->session;
- struct machine *machine = &session->machines.host;
- struct task *tasks, *task;
- unsigned int nr = 0, itask = 0, i;
- struct rb_node *nd;
- LIST_HEAD(list);
+ int level = thread_level(machine, thread);
+ int comm_indent = fprintf(fp, " %8d %8d %8d |%*s",
+ thread__pid(thread), thread__tid(thread),
+ thread__ppid(thread), level, "");

- /*
- * No locking needed while accessing machine->threads,
- * because --tasks is single threaded command.
- */
+ fprintf(fp, "%s\n", thread__comm_str(thread));

- /* Count all the threads. */
- for (i = 0; i < THREADS__TABLE_SIZE; i++)
- nr += machine->threads[i].nr;
+ maps__fprintf_task(thread__maps(thread), comm_indent, fp);
+}

- tasks = malloc(sizeof(*tasks) * nr);
- if (!tasks)
- return -ENOMEM;
+static int task_list_cmp(void *priv, const struct list_head *la, const struct list_head *lb)
+{
+ struct machine *machine = priv;
+ struct thread_list *task_a = list_entry(la, struct thread_list, list);
+ struct thread_list *task_b = list_entry(lb, struct thread_list, list);
+ struct thread *a = task_a->thread;
+ struct thread *b = task_b->thread;
+ int level_a, level_b, res;
+
+ /* Compare a and b to root. */
+ if (thread__tid(a) == thread__tid(b))
+ return 0;

- for (i = 0; i < THREADS__TABLE_SIZE; i++) {
- struct threads *threads = &machine->threads[i];
+ if (thread__tid(a) == 0)
+ return -1;

- for (nd = rb_first_cached(&threads->entries); nd;
- nd = rb_next(nd)) {
- task = tasks + itask++;
+ if (thread__tid(b) == 0)
+ return 1;

- task->thread = rb_entry(nd, struct thread_rb_node, rb_node)->thread;
- INIT_LIST_HEAD(&task->children);
- INIT_LIST_HEAD(&task->list);
- thread__set_priv(task->thread, task);
- }
+ /* If parents match sort by tid. */
+ if (thread__ppid(a) == thread__ppid(b)) {
+ return thread__tid(a) < thread__tid(b)
+ ? -1
+ : (thread__tid(a) > thread__tid(b) ? 1 : 0);
}

/*
- * Iterate every task down to the unprocessed parent
- * and link all in task children list. Task with no
- * parent is added into 'list'.
+ * Find a and b such that if they are a child of each other a and b's
+ * tid's match, otherwise a and b have a common parent and distinct
+ * tid's to sort by. First make the depths of the threads match.
*/
- for (itask = 0; itask < nr; itask++) {
- task = tasks + itask;
-
- if (!list_empty(&task->list))
- continue;
-
- task = tasks_list(task, machine);
- if (IS_ERR(task)) {
- pr_err("Error: failed to process tasks\n");
- free(tasks);
- return PTR_ERR(task);
+ level_a = thread_level(machine, a);
+ level_b = thread_level(machine, b);
+ a = thread__get(a);
+ b = thread__get(b);
+ for (int i = level_a; i > level_b; i--) {
+ struct thread *parent = machine__find_thread(machine, -1, thread__ppid(a));
+
+ thread__put(a);
+ if (!parent) {
+ pr_err("Missing parent thread of %d\n", thread__tid(a));
+ thread__put(b);
+ return -1;
}
+ a = parent;
+ }
+ for (int i = level_b; i > level_a; i--) {
+ struct thread *parent = machine__find_thread(machine, -1, thread__ppid(b));

- if (task)
- list_add_tail(&task->list, &list);
+ thread__put(b);
+ if (!parent) {
+ pr_err("Missing parent thread of %d\n", thread__tid(b));
+ thread__put(a);
+ return 1;
+ }
+ b = parent;
+ }
+ /* Search up to a common parent. */
+ while (thread__ppid(a) != thread__ppid(b)) {
+ struct thread *parent;
+
+ parent = machine__find_thread(machine, -1, thread__ppid(a));
+ thread__put(a);
+ if (!parent)
+ pr_err("Missing parent thread of %d\n", thread__tid(a));
+ a = parent;
+ parent = machine__find_thread(machine, -1, thread__ppid(b));
+ thread__put(b);
+ if (!parent)
+ pr_err("Missing parent thread of %d\n", thread__tid(b));
+ b = parent;
+ if (!a || !b)
+ return !a && !b ? 0 : (!a ? -1 : 1);
+ }
+ if (thread__tid(a) == thread__tid(b)) {
+ /* a is a child of b or vice-versa, deeper levels appear later. */
+ res = level_a < level_b ? -1 : (level_a > level_b ? 1 : 0);
+ } else {
+ /* Sort by tid now the parent is the same. */
+ res = thread__tid(a) < thread__tid(b) ? -1 : 1;
}
+ thread__put(a);
+ thread__put(b);
+ return res;
+}
+
+static int tasks_print(struct report *rep, FILE *fp)
+{
+ struct machine *machine = &rep->session->machines.host;
+ LIST_HEAD(tasks);
+ int ret;

- fprintf(fp, "# %8s %8s %8s %s\n", "pid", "tid", "ppid", "comm");
+ ret = machine__thread_list(machine, &tasks);
+ if (!ret) {
+ struct thread_list *task;

- list_for_each_entry(task, &list, list)
- task__print_level(task, fp, 0);
+ list_sort(machine, &tasks, task_list_cmp);

- free(tasks);
- return 0;
+ fprintf(fp, "# %8s %8s %8s %s\n", "pid", "tid", "ppid", "comm");
+
+ list_for_each_entry(task, &tasks, list)
+ task__print_level(machine, task->thread, fp);
+ }
+ thread_list__delete(&tasks);
+ return ret;
}

static int __cmd_report(struct report *rep)
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index f9c77119af22..6d7a505850c8 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -3258,6 +3258,36 @@ int machines__for_each_thread(struct machines *machines,
return rc;
}

+
+static int thread_list_cb(struct thread *thread, void *data)
+{
+ struct list_head *list = data;
+ struct thread_list *entry = malloc(sizeof(*entry));
+
+ if (!entry)
+ return -ENOMEM;
+
+ entry->thread = thread__get(thread);
+ list_add_tail(&entry->list, list);
+ return 0;
+}
+
+int machine__thread_list(struct machine *machine, struct list_head *list)
+{
+ return machine__for_each_thread(machine, thread_list_cb, list);
+}
+
+void thread_list__delete(struct list_head *list)
+{
+ struct thread_list *pos, *next;
+
+ list_for_each_entry_safe(pos, next, list, list) {
+ thread__zput(pos->thread);
+ list_del(&pos->list);
+ free(pos);
+ }
+}
+
pid_t machine__get_current_tid(struct machine *machine, int cpu)
{
if (cpu < 0 || (size_t)cpu >= machine->current_tid_sz)
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 1279acda6a8a..b738ce84817b 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -280,6 +280,16 @@ int machines__for_each_thread(struct machines *machines,
int (*fn)(struct thread *thread, void *p),
void *priv);

+struct thread_list {
+ struct list_head list;
+ struct thread *thread;
+};
+
+/* Make a list of struct thread_list based on threads in the machine. */
+int machine__thread_list(struct machine *machine, struct list_head *list);
+/* Free up the nodes within the thread_list list. */
+void thread_list__delete(struct list_head *list);
+
pid_t machine__get_current_tid(struct machine *machine, int cpu);
int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid,
pid_t tid);
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:29:18

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 46/50] perf trace: Ignore thread hashing in summary

Commit 91e467bc568f ("perf machine: Use hashtable for machine
threads") made the iteration of thread tids unordered. The perf trace
--summary output sorts and prints each hash bucket, rather than all
threads globally. Change this behavior by turning all threads into a
list, sorting the list by number of trace events and then by tid, and
finally printing the list. This also means the rbtree in threads is no
longer accessed outside of machine.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/builtin-trace.c | 41 +++++++++++++++++++++----------------
tools/perf/util/rb_resort.h | 5 -----
2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index e541d0e2777a..e9ff78b331fe 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -74,6 +74,7 @@
#include <linux/err.h>
#include <linux/filter.h>
#include <linux/kernel.h>
+#include <linux/list_sort.h>
#include <linux/random.h>
#include <linux/stringify.h>
#include <linux/time64.h>
@@ -4314,34 +4315,38 @@ static unsigned long thread__nr_events(struct thread_trace *ttrace)
return ttrace ? ttrace->nr_events : 0;
}

-DEFINE_RESORT_RB(threads,
- (thread__nr_events(thread__priv(a->thread)) <
- thread__nr_events(thread__priv(b->thread))),
- struct thread *thread;
-)
+static int trace_nr_events_cmp(void *priv __maybe_unused,
+ const struct list_head *la,
+ const struct list_head *lb)
{
- entry->thread = rb_entry(nd, struct thread_rb_node, rb_node)->thread;
+ struct thread_list *a = list_entry(la, struct thread_list, list);
+ struct thread_list *b = list_entry(lb, struct thread_list, list);
+ unsigned long a_nr_events = thread__nr_events(thread__priv(a->thread));
+ unsigned long b_nr_events = thread__nr_events(thread__priv(b->thread));
+
+ if (a_nr_events != b_nr_events)
+ return a_nr_events < b_nr_events ? -1 : 1;
+
+ /* Identical number of threads, place smaller tids first. */
+ return thread__tid(a->thread) < thread__tid(b->thread)
+ ? -1
+ : (thread__tid(a->thread) > thread__tid(b->thread) ? 1 : 0);
}

static size_t trace__fprintf_thread_summary(struct trace *trace, FILE *fp)
{
size_t printed = trace__fprintf_threads_header(fp);
- struct rb_node *nd;
- int i;
-
- for (i = 0; i < THREADS__TABLE_SIZE; i++) {
- DECLARE_RESORT_RB_MACHINE_THREADS(threads, trace->host, i);
+ LIST_HEAD(threads);

- if (threads == NULL) {
- fprintf(fp, "%s", "Error sorting output by nr_events!\n");
- return 0;
- }
+ if (machine__thread_list(trace->host, &threads) == 0) {
+ struct thread_list *pos;

- resort_rb__for_each_entry(nd, threads)
- printed += trace__fprintf_thread(fp, threads_entry->thread, trace);
+ list_sort(NULL, &threads, trace_nr_events_cmp);

- resort_rb__delete(threads);
+ list_for_each_entry(pos, &threads, list)
+ printed += trace__fprintf_thread(fp, pos->thread, trace);
}
+ thread_list__delete(&threads);
return printed;
}

diff --git a/tools/perf/util/rb_resort.h b/tools/perf/util/rb_resort.h
index 376e86cb4c3c..d927a0d25052 100644
--- a/tools/perf/util/rb_resort.h
+++ b/tools/perf/util/rb_resort.h
@@ -143,9 +143,4 @@ struct __name##_sorted *__name = __name##_sorted__new
DECLARE_RESORT_RB(__name)(&__ilist->rblist.entries.rb_root, \
__ilist->rblist.nr_entries)

-/* For 'struct machine->threads' */
-#define DECLARE_RESORT_RB_MACHINE_THREADS(__name, __machine, hash_bucket) \
- DECLARE_RESORT_RB(__name)(&__machine->threads[hash_bucket].entries.rb_root, \
- __machine->threads[hash_bucket].nr)
-
#endif /* _PERF_RESORT_RB_H_ */
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:29:20

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 48/50] perf threads: Move threads to its own files

Move threads out of machine and move thread_rb_node into the C
file. This hides the implementation of threads from the rest of the
code, allowing it to be refactored.

Locking discipline is tightened up in this change.
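
For reference, a minimal sketch of the new threads.h API (the pid/tid
values are made up):

```
static void threads_api_demo(void)
{
	struct threads threads;
	struct thread *th;
	bool created;

	threads__init(&threads);
	th = threads__findnew(&threads, /*pid=*/123, /*tid=*/123, &created);
	if (th) {
		/* The table holds its own reference; drop the lookup's. */
		thread__put(th);
	}
	/* Releases all remaining threads and the per-bucket locks. */
	threads__exit(&threads);
}
```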

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/Build | 1 +
tools/perf/util/bpf_lock_contention.c | 8 +-
tools/perf/util/machine.c | 287 ++++----------------------
tools/perf/util/machine.h | 20 +-
tools/perf/util/thread.c | 2 +-
tools/perf/util/thread.h | 6 -
tools/perf/util/threads.c | 244 ++++++++++++++++++++++
tools/perf/util/threads.h | 35 ++++
8 files changed, 325 insertions(+), 278 deletions(-)
create mode 100644 tools/perf/util/threads.c
create mode 100644 tools/perf/util/threads.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 96058f949ec9..2cb39c3cf46d 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -71,6 +71,7 @@ perf-y += ordered-events.o
perf-y += namespaces.o
perf-y += comm.o
perf-y += thread.o
+perf-y += threads.o
perf-y += thread_map.o
perf-y += parse-events-flex.o
perf-y += parse-events-bison.o
diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
index d9720a910330..52bbd6db2831 100644
--- a/tools/perf/util/bpf_lock_contention.c
+++ b/tools/perf/util/bpf_lock_contention.c
@@ -209,7 +209,7 @@ static const char *lock_contention_get_name(struct lock_contention *con,

/* do not update idle comm which contains CPU number */
if (pid) {
- struct thread *t = __machine__findnew_thread(machine, /*pid=*/-1, pid);
+ struct thread *t = machine__findnew_thread(machine, /*pid=*/-1, pid);

if (t == NULL)
return name;
@@ -301,9 +301,9 @@ int lock_contention_read(struct lock_contention *con)
return -1;

if (con->aggr_mode == LOCK_AGGR_TASK) {
- struct thread *idle = __machine__findnew_thread(machine,
- /*pid=*/0,
- /*tid=*/0);
+ struct thread *idle = machine__findnew_thread(machine,
+ /*pid=*/0,
+ /*tid=*/0);
thread__set_comm(idle, "swapper", /*timestamp=*/0);
}

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 7e19303d1aa6..36231b5a86aa 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -44,9 +44,6 @@
#include <linux/string.h>
#include <linux/zalloc.h>

-static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
- struct thread *th, bool lock);
-
static struct dso *machine__kernel_dso(struct machine *machine)
{
return map__dso(machine->vmlinux_map);
@@ -59,35 +56,6 @@ static void dsos__init(struct dsos *dsos)
init_rwsem(&dsos->lock);
}

-static void machine__threads_init(struct machine *machine)
-{
- int i;
-
- for (i = 0; i < THREADS__TABLE_SIZE; i++) {
- struct threads *threads = &machine->threads[i];
- threads->entries = RB_ROOT_CACHED;
- init_rwsem(&threads->lock);
- threads->nr = 0;
- threads->last_match = NULL;
- }
-}
-
-static int thread_rb_node__cmp_tid(const void *key, const struct rb_node *nd)
-{
- int to_find = (int) *((pid_t *)key);
-
- return to_find - (int)thread__tid(rb_entry(nd, struct thread_rb_node, rb_node)->thread);
-}
-
-static struct thread_rb_node *thread_rb_node__find(const struct thread *th,
- struct rb_root *tree)
-{
- pid_t to_find = thread__tid(th);
- struct rb_node *nd = rb_find(&to_find, tree, thread_rb_node__cmp_tid);
-
- return rb_entry(nd, struct thread_rb_node, rb_node);
-}
-
static int machine__set_mmap_name(struct machine *machine)
{
if (machine__is_host(machine))
@@ -121,7 +89,7 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
RB_CLEAR_NODE(&machine->rb_node);
dsos__init(&machine->dsos);

- machine__threads_init(machine);
+ threads__init(&machine->threads);

machine->vdso_info = NULL;
machine->env = NULL;
@@ -222,27 +190,11 @@ static void dsos__exit(struct dsos *dsos)

void machine__delete_threads(struct machine *machine)
{
- struct rb_node *nd;
- int i;
-
- for (i = 0; i < THREADS__TABLE_SIZE; i++) {
- struct threads *threads = &machine->threads[i];
- down_write(&threads->lock);
- nd = rb_first_cached(&threads->entries);
- while (nd) {
- struct thread_rb_node *trb = rb_entry(nd, struct thread_rb_node, rb_node);
-
- nd = rb_next(nd);
- __machine__remove_thread(machine, trb, trb->thread, false);
- }
- up_write(&threads->lock);
- }
+ threads__remove_all_threads(&machine->threads);
}

void machine__exit(struct machine *machine)
{
- int i;
-
if (machine == NULL)
return;

@@ -255,12 +207,7 @@ void machine__exit(struct machine *machine)
zfree(&machine->current_tid);
zfree(&machine->kallsyms_filename);

- machine__delete_threads(machine);
- for (i = 0; i < THREADS__TABLE_SIZE; i++) {
- struct threads *threads = &machine->threads[i];
-
- exit_rwsem(&threads->lock);
- }
+ threads__exit(&machine->threads);
}

void machine__delete(struct machine *machine)
@@ -527,7 +474,7 @@ static void machine__update_thread_pid(struct machine *machine,
if (thread__pid(th) == thread__tid(th))
return;

- leader = __machine__findnew_thread(machine, thread__pid(th), thread__pid(th));
+ leader = machine__findnew_thread(machine, thread__pid(th), thread__pid(th));
if (!leader)
goto out_err;

@@ -561,160 +508,55 @@ static void machine__update_thread_pid(struct machine *machine,
goto out_put;
}

-/*
- * Front-end cache - TID lookups come in blocks,
- * so most of the time we dont have to look up
- * the full rbtree:
- */
-static struct thread*
-__threads__get_last_match(struct threads *threads, struct machine *machine,
- int pid, int tid)
-{
- struct thread *th;
-
- th = threads->last_match;
- if (th != NULL) {
- if (thread__tid(th) == tid) {
- machine__update_thread_pid(machine, th, pid);
- return thread__get(th);
- }
- thread__put(threads->last_match);
- threads->last_match = NULL;
- }
-
- return NULL;
-}
-
-static struct thread*
-threads__get_last_match(struct threads *threads, struct machine *machine,
- int pid, int tid)
-{
- struct thread *th = NULL;
-
- if (perf_singlethreaded)
- th = __threads__get_last_match(threads, machine, pid, tid);
-
- return th;
-}
-
-static void
-__threads__set_last_match(struct threads *threads, struct thread *th)
-{
- thread__put(threads->last_match);
- threads->last_match = thread__get(th);
-}
-
-static void
-threads__set_last_match(struct threads *threads, struct thread *th)
-{
- if (perf_singlethreaded)
- __threads__set_last_match(threads, th);
-}
-
/*
* Caller must eventually drop thread->refcnt returned with a successful
* lookup/new thread inserted.
*/
-static struct thread *____machine__findnew_thread(struct machine *machine,
- struct threads *threads,
- pid_t pid, pid_t tid,
- bool create)
+static struct thread *__machine__findnew_thread(struct machine *machine,
+ pid_t pid,
+ pid_t tid,
+ bool create)
{
- struct rb_node **p = &threads->entries.rb_root.rb_node;
- struct rb_node *parent = NULL;
- struct thread *th;
- struct thread_rb_node *nd;
- bool leftmost = true;
+ struct thread *th = threads__find(&machine->threads, tid);
+ bool created;

- th = threads__get_last_match(threads, machine, pid, tid);
- if (th)
+ if (th) {
+ machine__update_thread_pid(machine, th, pid);
return th;
-
- while (*p != NULL) {
- parent = *p;
- th = rb_entry(parent, struct thread_rb_node, rb_node)->thread;
-
- if (thread__tid(th) == tid) {
- threads__set_last_match(threads, th);
- machine__update_thread_pid(machine, th, pid);
- return thread__get(th);
- }
-
- if (tid < thread__tid(th))
- p = &(*p)->rb_left;
- else {
- p = &(*p)->rb_right;
- leftmost = false;
- }
}
-
if (!create)
return NULL;

- th = thread__new(pid, tid);
- if (th == NULL)
- return NULL;
-
- nd = malloc(sizeof(*nd));
- if (nd == NULL) {
- thread__put(th);
- return NULL;
- }
- nd->thread = th;
-
- rb_link_node(&nd->rb_node, parent, p);
- rb_insert_color_cached(&nd->rb_node, &threads->entries, leftmost);
- /*
- * We have to initialize maps separately after rb tree is updated.
- *
- * The reason is that we call machine__findnew_thread within
- * thread__init_maps to find the thread leader and that would screwed
- * the rb tree.
- */
- if (thread__init_maps(th, machine)) {
- pr_err("Thread init failed thread %d\n", pid);
- rb_erase_cached(&nd->rb_node, &threads->entries);
- RB_CLEAR_NODE(&nd->rb_node);
- free(nd);
- thread__put(th);
- return NULL;
- }
- /*
- * It is now in the rbtree, get a ref
- */
- threads__set_last_match(threads, th);
- ++threads->nr;
-
- return thread__get(th);
-}
+ th = threads__findnew(&machine->threads, pid, tid, &created);
+ if (created) {
+ /*
+ * We have to initialize maps separately after rb tree is
+ * updated.
+ *
+ * The reason is that we call machine__findnew_thread within
+ * thread__init_maps to find the thread leader and that would
+ * screwed the rb tree.
+ */
+ if (thread__init_maps(th, machine)) {
+ pr_err("Thread init failed thread %d\n", pid);
+ threads__remove(&machine->threads, th);
+ thread__put(th);
+ return NULL;
+ }
+ } else if (th != NULL)
+ machine__update_thread_pid(machine, th, pid);

-struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid)
-{
- return ____machine__findnew_thread(machine, machine__threads(machine, tid), pid, tid, true);
+ return th;
}

-struct thread *machine__findnew_thread(struct machine *machine, pid_t pid,
- pid_t tid)
+struct thread *machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid)
{
- struct threads *threads = machine__threads(machine, tid);
- struct thread *th;
-
- down_write(&threads->lock);
- th = __machine__findnew_thread(machine, pid, tid);
- up_write(&threads->lock);
- return th;
+ return __machine__findnew_thread(machine, pid, tid, /*create=*/true);
}

-struct thread *machine__find_thread(struct machine *machine, pid_t pid,
- pid_t tid)
+struct thread *machine__find_thread(struct machine *machine, pid_t pid, pid_t tid)
{
- struct threads *threads = machine__threads(machine, tid);
- struct thread *th;
-
- down_read(&threads->lock);
- th = ____machine__findnew_thread(machine, threads, pid, tid, false);
- up_read(&threads->lock);
- return th;
+ return __machine__findnew_thread(machine, pid, tid, /*create=*/false);
}

/*
@@ -1128,23 +970,13 @@ static int machine_fprintf_cb(struct thread *thread, void *data)
return 0;
}

-static size_t machine__threads_nr(const struct machine *machine)
-{
- size_t nr = 0;
-
- for (int i = 0; i < THREADS__TABLE_SIZE; i++)
- nr += machine->threads[i].nr;
-
- return nr;
-}
-
size_t machine__fprintf(struct machine *machine, FILE *fp)
{
struct machine_fprintf_cb_args args = {
.fp = fp,
.printed = 0,
};
- size_t ret = fprintf(fp, "Threads: %zu\n", machine__threads_nr(machine));
+ size_t ret = fprintf(fp, "Threads: %zu\n", threads__nr(&machine->threads));

machine__for_each_thread(machine, machine_fprintf_cb, &args);
return ret + args.printed;
@@ -2066,36 +1898,9 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
return 0;
}

-static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
- struct thread *th, bool lock)
-{
- struct threads *threads = machine__threads(machine, thread__tid(th));
-
- if (!nd)
- nd = thread_rb_node__find(th, &threads->entries.rb_root);
-
- if (threads->last_match && RC_CHK_EQUAL(threads->last_match, th))
- threads__set_last_match(threads, NULL);
-
- if (lock)
- down_write(&threads->lock);
-
- BUG_ON(refcount_read(thread__refcnt(th)) == 0);
-
- thread__put(nd->thread);
- rb_erase_cached(&nd->rb_node, &threads->entries);
- RB_CLEAR_NODE(&nd->rb_node);
- --threads->nr;
-
- free(nd);
-
- if (lock)
- up_write(&threads->lock);
-}
-
void machine__remove_thread(struct machine *machine, struct thread *th)
{
- return __machine__remove_thread(machine, NULL, th, true);
+ return threads__remove(&machine->threads, th);
}

int machine__process_fork_event(struct machine *machine, union perf_event *event,
@@ -3229,23 +3034,7 @@ int machine__for_each_thread(struct machine *machine,
int (*fn)(struct thread *thread, void *p),
void *priv)
{
- struct threads *threads;
- struct rb_node *nd;
- int rc = 0;
- int i;
-
- for (i = 0; i < THREADS__TABLE_SIZE; i++) {
- threads = &machine->threads[i];
- for (nd = rb_first_cached(&threads->entries); nd;
- nd = rb_next(nd)) {
- struct thread_rb_node *trb = rb_entry(nd, struct thread_rb_node, rb_node);
-
- rc = fn(trb->thread, priv);
- if (rc != 0)
- return rc;
- }
- }
- return rc;
+ return threads__for_each_thread(&machine->threads, fn, priv);
}

int machines__for_each_thread(struct machines *machines,
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index b738ce84817b..e28c787616fe 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -7,6 +7,7 @@
#include "maps.h"
#include "dsos.h"
#include "rwsem.h"
+#include "threads.h"

struct addr_location;
struct branch_stack;
@@ -28,16 +29,6 @@ extern const char *ref_reloc_sym_names[];

struct vdso_info;

-#define THREADS__TABLE_BITS 8
-#define THREADS__TABLE_SIZE (1 << THREADS__TABLE_BITS)
-
-struct threads {
- struct rb_root_cached entries;
- struct rw_semaphore lock;
- unsigned int nr;
- struct thread *last_match;
-};
-
struct machine {
struct rb_node rb_node;
pid_t pid;
@@ -48,7 +39,7 @@ struct machine {
char *root_dir;
char *mmap_name;
char *kallsyms_filename;
- struct threads threads[THREADS__TABLE_SIZE];
+ struct threads threads;
struct vdso_info *vdso_info;
struct perf_env *env;
struct dsos dsos;
@@ -69,12 +60,6 @@ struct machine {
bool trampolines_mapped;
};

-static inline struct threads *machine__threads(struct machine *machine, pid_t tid)
-{
- /* Cast it to handle tid == -1 */
- return &machine->threads[(unsigned int)tid % THREADS__TABLE_SIZE];
-}
-
/*
* The main kernel (vmlinux) map
*/
@@ -220,7 +205,6 @@ bool machine__is(struct machine *machine, const char *arch);
bool machine__normalized_is(struct machine *machine, const char *arch);
int machine__nr_cpus_avail(struct machine *machine);

-struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);
struct thread *machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid);

struct dso *machine__findnew_dso_id(struct machine *machine, const char *filename, struct dso_id *id);
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index c59ab4d79163..1aa8962dcf52 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -26,7 +26,7 @@ int thread__init_maps(struct thread *thread, struct machine *machine)
if (pid == thread__tid(thread) || pid == -1) {
thread__set_maps(thread, maps__new(machine));
} else {
- struct thread *leader = __machine__findnew_thread(machine, pid, pid);
+ struct thread *leader = machine__findnew_thread(machine, pid, pid);

if (leader) {
thread__set_maps(thread, maps__get(thread__maps(leader)));
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 0df775b5c110..4b8f3e9e513b 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -3,7 +3,6 @@
#define __PERF_THREAD_H

#include <linux/refcount.h>
-#include <linux/rbtree.h>
#include <linux/list.h>
#include <stdio.h>
#include <unistd.h>
@@ -30,11 +29,6 @@ struct lbr_stitch {
struct callchain_cursor_node *prev_lbr_cursor;
};

-struct thread_rb_node {
- struct rb_node rb_node;
- struct thread *thread;
-};
-
DECLARE_RC_STRUCT(thread) {
/** @maps: mmaps associated with this thread. */
struct maps *maps;
diff --git a/tools/perf/util/threads.c b/tools/perf/util/threads.c
new file mode 100644
index 000000000000..d984ec939c7b
--- /dev/null
+++ b/tools/perf/util/threads.c
@@ -0,0 +1,244 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "threads.h"
+#include "machine.h"
+#include "thread.h"
+
+struct thread_rb_node {
+ struct rb_node rb_node;
+ struct thread *thread;
+};
+
+static struct threads_table_entry *threads__table(struct threads *threads, pid_t tid)
+{
+ /* Cast it to handle tid == -1 */
+ return &threads->table[(unsigned int)tid % THREADS__TABLE_SIZE];
+}
+
+void threads__init(struct threads *threads)
+{
+ for (int i = 0; i < THREADS__TABLE_SIZE; i++) {
+ struct threads_table_entry *table = &threads->table[i];
+
+ table->entries = RB_ROOT_CACHED;
+ init_rwsem(&table->lock);
+ table->nr = 0;
+ table->last_match = NULL;
+ }
+}
+
+void threads__exit(struct threads *threads)
+{
+ threads__remove_all_threads(threads);
+ for (int i = 0; i < THREADS__TABLE_SIZE; i++) {
+ struct threads_table_entry *table = &threads->table[i];
+
+ exit_rwsem(&table->lock);
+ }
+}
+
+size_t threads__nr(struct threads *threads)
+{
+ size_t nr = 0;
+
+ for (int i = 0; i < THREADS__TABLE_SIZE; i++) {
+ struct threads_table_entry *table = &threads->table[i];
+
+ down_read(&table->lock);
+ nr += table->nr;
+ up_read(&table->lock);
+ }
+ return nr;
+}
+
+/*
+ * Front-end cache - TID lookups come in blocks,
+ * so most of the time we dont have to look up
+ * the full rbtree:
+ */
+static struct thread *__threads_table_entry__get_last_match(struct threads_table_entry *table,
+ pid_t tid)
+{
+ struct thread *th, *res = NULL;
+
+ th = table->last_match;
+ if (th != NULL) {
+ if (thread__tid(th) == tid)
+ res = thread__get(th);
+ }
+ return res;
+}
+
+static void __threads_table_entry__set_last_match(struct threads_table_entry *table,
+ struct thread *th)
+{
+ thread__put(table->last_match);
+ table->last_match = thread__get(th);
+}
+
+static void threads_table_entry__set_last_match(struct threads_table_entry *table,
+ struct thread *th)
+{
+ down_write(&table->lock);
+ __threads_table_entry__set_last_match(table, th);
+ up_write(&table->lock);
+}
+
+struct thread *threads__find(struct threads *threads, pid_t tid)
+{
+ struct threads_table_entry *table = threads__table(threads, tid);
+ struct rb_node **p;
+ struct thread *res = NULL;
+
+ down_read(&table->lock);
+ res = __threads_table_entry__get_last_match(table, tid);
+ if (res) {
+ up_read(&table->lock);
+ return res;
+ }
+
+ p = &table->entries.rb_root.rb_node;
+ while (*p != NULL) {
+ struct rb_node *parent = *p;
+ struct thread *th = rb_entry(parent, struct thread_rb_node, rb_node)->thread;
+
+ if (thread__tid(th) == tid) {
+ res = thread__get(th);
+ break;
+ }
+
+ if (tid < thread__tid(th))
+ p = &(*p)->rb_left;
+ else
+ p = &(*p)->rb_right;
+ }
+ up_read(&table->lock);
+ if (res)
+ threads_table_entry__set_last_match(table, res);
+ return res;
+}
+
+struct thread *threads__findnew(struct threads *threads, pid_t pid, pid_t tid, bool *created)
+{
+ struct threads_table_entry *table = threads__table(threads, tid);
+ struct rb_node **p;
+ struct rb_node *parent = NULL;
+ struct thread *res = NULL;
+ struct thread_rb_node *nd;
+ bool leftmost = true;
+
+ *created = false;
+ down_write(&table->lock);
+ p = &table->entries.rb_root.rb_node;
+ while (*p != NULL) {
+ struct thread *th;
+
+ parent = *p;
+ th = rb_entry(parent, struct thread_rb_node, rb_node)->thread;
+
+ if (thread__tid(th) == tid) {
+ __threads_table_entry__set_last_match(table, th);
+ res = thread__get(th);
+ goto out_unlock;
+ }
+
+ if (tid < thread__tid(th))
+ p = &(*p)->rb_left;
+ else {
+ leftmost = false;
+ p = &(*p)->rb_right;
+ }
+ }
+ nd = malloc(sizeof(*nd));
+ if (nd == NULL)
+ goto out_unlock;
+ res = thread__new(pid, tid);
+ if (!res)
+ free(nd);
+ else {
+ *created = true;
+ nd->thread = thread__get(res);
+ rb_link_node(&nd->rb_node, parent, p);
+ rb_insert_color_cached(&nd->rb_node, &table->entries, leftmost);
+ ++table->nr;
+ __threads_table_entry__set_last_match(table, res);
+ }
+out_unlock:
+ up_write(&table->lock);
+ return res;
+}
+
+void threads__remove_all_threads(struct threads *threads)
+{
+ for (int i = 0; i < THREADS__TABLE_SIZE; i++) {
+ struct threads_table_entry *table = &threads->table[i];
+ struct rb_node *nd;
+
+ down_write(&table->lock);
+ __threads_table_entry__set_last_match(table, NULL);
+ nd = rb_first_cached(&table->entries);
+ while (nd) {
+ struct thread_rb_node *trb = rb_entry(nd, struct thread_rb_node, rb_node);
+
+ nd = rb_next(nd);
+ thread__put(trb->thread);
+ rb_erase_cached(&trb->rb_node, &table->entries);
+ RB_CLEAR_NODE(&trb->rb_node);
+ --table->nr;
+
+ free(trb);
+ }
+ assert(table->nr == 0);
+ up_write(&table->lock);
+ }
+}
+
+void threads__remove(struct threads *threads, struct thread *thread)
+{
+ struct rb_node **p;
+ struct threads_table_entry *table = threads__table(threads, thread__tid(thread));
+ pid_t tid = thread__tid(thread);
+
+ down_write(&table->lock);
+ if (table->last_match && RC_CHK_EQUAL(table->last_match, thread))
+ __threads_table_entry__set_last_match(table, NULL);
+
+ p = &table->entries.rb_root.rb_node;
+ while (*p != NULL) {
+ struct rb_node *parent = *p;
+ struct thread_rb_node *nd = rb_entry(parent, struct thread_rb_node, rb_node);
+ struct thread *th = nd->thread;
+
+ if (RC_CHK_EQUAL(th, thread)) {
+ thread__put(nd->thread);
+ rb_erase_cached(&nd->rb_node, &table->entries);
+ RB_CLEAR_NODE(&nd->rb_node);
+ --table->nr;
+ free(nd);
+ break;
+ }
+
+ if (tid < thread__tid(th))
+ p = &(*p)->rb_left;
+ else
+ p = &(*p)->rb_right;
+ }
+ up_write(&table->lock);
+}
+
+int threads__for_each_thread(struct threads *threads,
+ int (*fn)(struct thread *thread, void *data),
+ void *data)
+{
+ for (int i = 0; i < THREADS__TABLE_SIZE; i++) {
+ struct threads_table_entry *table = &threads->table[i];
+ struct rb_node *nd;
+
+ for (nd = rb_first_cached(&table->entries); nd; nd = rb_next(nd)) {
+ struct thread_rb_node *trb = rb_entry(nd, struct thread_rb_node, rb_node);
+ int rc = fn(trb->thread, data);
+
+ if (rc != 0)
+ return rc;
+ }
+ }
+ return 0;
+
+}
diff --git a/tools/perf/util/threads.h b/tools/perf/util/threads.h
new file mode 100644
index 000000000000..ed67de627578
--- /dev/null
+++ b/tools/perf/util/threads.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PERF_THREADS_H
+#define __PERF_THREADS_H
+
+#include <linux/rbtree.h>
+#include "rwsem.h"
+
+struct thread;
+
+#define THREADS__TABLE_BITS 8
+#define THREADS__TABLE_SIZE (1 << THREADS__TABLE_BITS)
+
+struct threads_table_entry {
+ struct rb_root_cached entries;
+ struct rw_semaphore lock;
+ unsigned int nr;
+ struct thread *last_match;
+};
+
+struct threads {
+ struct threads_table_entry table[THREADS__TABLE_SIZE];
+};
+
+void threads__init(struct threads *threads);
+void threads__exit(struct threads *threads);
+size_t threads__nr(struct threads *threads);
+struct thread *threads__find(struct threads *threads, pid_t tid);
+struct thread *threads__findnew(struct threads *threads, pid_t pid, pid_t tid, bool *created);
+void threads__remove_all_threads(struct threads *threads);
+void threads__remove(struct threads *threads, struct thread *thread);
+int threads__for_each_thread(struct threads *threads,
+ int (*fn)(struct thread *thread, void *data),
+ void *data);
+
+#endif /* __PERF_THREADS_H */
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:30:02

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 22/50] perf bpf: Don't synthesize BPF events when disabled

If BPF sideband events are disabled on the command line, don't
synthesize BPF events either.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/bpf-event.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/bpf-event.c b/tools/perf/util/bpf-event.c
index 38fcf3ba5749..830711cae30d 100644
--- a/tools/perf/util/bpf-event.c
+++ b/tools/perf/util/bpf-event.c
@@ -386,6 +386,9 @@ int perf_event__synthesize_bpf_events(struct perf_session *session,
int err;
int fd;

+ if (opts->no_bpf_event)
+ return 0;
+
event = malloc(sizeof(event->bpf) + KSYM_NAME_LEN + machine->id_hdr_size);
if (!event)
return -1;
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:30:10

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 21/50] perf pmu: Switch to io_dir__readdir

Avoid DIR allocations when scanning sysfs by using io_dir as the
readdir implementation, which allocates about 1kb on the stack.
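
For context, the pattern this patch switches to looks roughly like the
sketch below. The list_dir helper is hypothetical, but the io_dir calls
match the diff that follows:

```
#include <api/io_dir.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: print directory entries without heap allocation. */
static void list_dir(const char *path)
{
	struct io_dirent64 *ent;
	struct io_dir dir;
	int fd = open(path, O_CLOEXEC | O_DIRECTORY | O_RDONLY);

	if (fd < 0)
		return;

	/* The readdir buffer lives inside io_dir, on the stack. */
	io_dir__init(&dir, fd);

	while ((ent = io_dir__readdir(&dir)) != NULL) {
		if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
			continue;
		puts(ent->d_name);
	}
	/* No closedir(); just close the file descriptor. */
	close(dir.dirfd);
}
```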

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/pmu.c | 48 +++++++++++++++++-------------------------
tools/perf/util/pmus.c | 30 ++++++++++----------------
2 files changed, 30 insertions(+), 48 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index a967d25e899b..91ae0ce06ef0 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -12,6 +12,7 @@
#include <stdbool.h>
#include <dirent.h>
#include <api/fs/fs.h>
+#include <api/io_dir.h>
#include <locale.h>
#include <fnmatch.h>
#include <math.h>
@@ -184,19 +185,17 @@ static void perf_pmu_format__load(const struct perf_pmu *pmu, struct perf_pmu_fo
*/
int perf_pmu__format_parse(struct perf_pmu *pmu, int dirfd, bool eager_load)
{
- struct dirent *evt_ent;
- DIR *format_dir;
+ struct io_dirent64 *evt_ent;
+ struct io_dir format_dir;
int ret = 0;

- format_dir = fdopendir(dirfd);
- if (!format_dir)
- return -EINVAL;
+ io_dir__init(&format_dir, dirfd);

- while ((evt_ent = readdir(format_dir)) != NULL) {
+ while ((evt_ent = io_dir__readdir(&format_dir)) != NULL) {
struct perf_pmu_format *format;
char *name = evt_ent->d_name;

- if (!strcmp(name, ".") || !strcmp(name, ".."))
+ if (io_dir__is_dir(&format_dir, evt_ent))
continue;

format = perf_pmu__new_format(&pmu->format, name);
@@ -223,7 +222,7 @@ int perf_pmu__format_parse(struct perf_pmu *pmu, int dirfd, bool eager_load)
}
}

- closedir(format_dir);
+ close(format_dir.dirfd);
return ret;
}

@@ -599,8 +598,8 @@ static inline bool pmu_alias_info_file(const char *name)
static int pmu_aliases_parse(struct perf_pmu *pmu)
{
char path[PATH_MAX];
- struct dirent *evt_ent;
- DIR *event_dir;
+ struct io_dirent64 *evt_ent;
+ struct io_dir event_dir;
size_t len;
int fd, dir_fd;

@@ -615,13 +614,9 @@ static int pmu_aliases_parse(struct perf_pmu *pmu)
return 0;
}

- event_dir = fdopendir(dir_fd);
- if (!event_dir){
- close (dir_fd);
- return -EINVAL;
- }
+ io_dir__init(&event_dir, dir_fd);

- while ((evt_ent = readdir(event_dir))) {
+ while ((evt_ent = io_dir__readdir(&event_dir))) {
char *name = evt_ent->d_name;
FILE *file;

@@ -651,7 +646,6 @@ static int pmu_aliases_parse(struct perf_pmu *pmu)
fclose(file);
}

- closedir(event_dir);
close (dir_fd);
pmu->sysfs_aliases_loaded = true;
return 0;
@@ -1879,10 +1873,9 @@ static void perf_pmu__del_caps(struct perf_pmu *pmu)
*/
int perf_pmu__caps_parse(struct perf_pmu *pmu)
{
- struct stat st;
char caps_path[PATH_MAX];
- DIR *caps_dir;
- struct dirent *evt_ent;
+ struct io_dir caps_dir;
+ struct io_dirent64 *evt_ent;
int caps_fd;

if (pmu->caps_initialized)
@@ -1893,24 +1886,21 @@ int perf_pmu__caps_parse(struct perf_pmu *pmu)
if (!perf_pmu__pathname_scnprintf(caps_path, sizeof(caps_path), pmu->name, "caps"))
return -1;

- if (stat(caps_path, &st) < 0) {
+ caps_fd = open(caps_path, O_CLOEXEC | O_DIRECTORY | O_RDONLY);
+ if (caps_fd == -1) {
pmu->caps_initialized = true;
return 0; /* no error if caps does not exist */
}

- caps_dir = opendir(caps_path);
- if (!caps_dir)
- return -EINVAL;
-
- caps_fd = dirfd(caps_dir);
+ io_dir__init(&caps_dir, caps_fd);

- while ((evt_ent = readdir(caps_dir)) != NULL) {
+ while ((evt_ent = io_dir__readdir(&caps_dir)) != NULL) {
char *name = evt_ent->d_name;
char value[128];
FILE *file;
int fd;

- if (!strcmp(name, ".") || !strcmp(name, ".."))
+ if (io_dir__is_dir(&caps_dir, evt_ent))
continue;

fd = openat(caps_fd, name, O_RDONLY);
@@ -1932,7 +1922,7 @@ int perf_pmu__caps_parse(struct perf_pmu *pmu)
fclose(file);
}

- closedir(caps_dir);
+ close(caps_fd);

pmu->caps_initialized = true;
return pmu->nr_caps;
diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
index ce4931461741..65b23b98666b 100644
--- a/tools/perf/util/pmus.c
+++ b/tools/perf/util/pmus.c
@@ -3,10 +3,10 @@
#include <linux/list_sort.h>
#include <linux/string.h>
#include <linux/zalloc.h>
+#include <api/io_dir.h>
#include <subcmd/pager.h>
#include <sys/types.h>
#include <ctype.h>
-#include <dirent.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>
@@ -184,8 +184,8 @@ static int pmus_cmp(void *priv __maybe_unused,
static void pmu_read_sysfs(bool core_only)
{
int fd;
- DIR *dir;
- struct dirent *dent;
+ struct io_dir dir;
+ struct io_dirent64 *dent;

if (read_sysfs_all_pmus || (core_only && read_sysfs_core_pmus))
return;
@@ -194,13 +194,9 @@ static void pmu_read_sysfs(bool core_only)
if (fd < 0)
return;

- dir = fdopendir(fd);
- if (!dir) {
- close(fd);
- return;
- }
+ io_dir__init(&dir, fd);

- while ((dent = readdir(dir))) {
+ while ((dent = io_dir__readdir(&dir)) != NULL) {
if (!strcmp(dent->d_name, ".") || !strcmp(dent->d_name, ".."))
continue;
if (core_only && !is_pmu_core(dent->d_name))
@@ -209,7 +205,7 @@ static void pmu_read_sysfs(bool core_only)
perf_pmu__find2(fd, dent->d_name);
}

- closedir(dir);
+ close(fd);
if (list_empty(&core_pmus)) {
if (!perf_pmu__create_placeholder_core_pmu(&core_pmus))
pr_err("Failure to set up any core PMUs\n");
@@ -563,8 +559,8 @@ bool perf_pmus__supports_extended_type(void)
char *perf_pmus__default_pmu_name(void)
{
int fd;
- DIR *dir;
- struct dirent *dent;
+ struct io_dir dir;
+ struct io_dirent64 *dent;
char *result = NULL;

if (!list_empty(&core_pmus))
@@ -574,13 +570,9 @@ char *perf_pmus__default_pmu_name(void)
if (fd < 0)
return strdup("cpu");

- dir = fdopendir(fd);
- if (!dir) {
- close(fd);
- return strdup("cpu");
- }
+ io_dir__init(&dir, fd);

- while ((dent = readdir(dir))) {
+ while ((dent = io_dir__readdir(&dir)) != NULL) {
if (!strcmp(dent->d_name, ".") || !strcmp(dent->d_name, ".."))
continue;
if (is_pmu_core(dent->d_name)) {
@@ -589,7 +581,7 @@ char *perf_pmus__default_pmu_name(void)
}
}

- closedir(dir);
+ close(fd);
return result ?: strdup("cpu");
}

--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:32:46

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 50/50] perf threads: Reduce table size from 256 to 8

The threads data structure is an array of hashmaps, previously
rbtrees. The two levels allow for a fixed outer array where access is
guarded by rw_semaphores. Commit 91e467bc568f ("perf machine: Use
hashtable for machine threads") sized the outer table at 256 entries
to avoid future scalability problems; however, this means the threads
struct is sized at 30,720 bytes. As the hashmaps allow O(1) access
for the common find/insert/remove operations, lower the number of
entries to 8, reducing the size overhead to 960 bytes.
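
As a back-of-the-envelope check, a sketch: the ~120 bytes per bucket is
inferred from the quoted totals, and threads__bucket is a hypothetical
helper rather than the patch's code:

```
#include <sys/types.h>

#define THREADS__TABLE_BITS	3
#define THREADS__TABLE_SIZE	(1 << THREADS__TABLE_BITS)

/* 256 buckets * ~120 bytes/bucket = 30,720 bytes before;
 *   8 buckets * ~120 bytes/bucket =    960 bytes after. */

/* Hypothetical: a tid-derived bucket index keeps find/insert/remove O(1). */
static inline unsigned int threads__bucket(pid_t tid)
{
	return (unsigned int)tid % THREADS__TABLE_SIZE;
}
```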

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/threads.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/threads.h b/tools/perf/util/threads.h
index d03bd91a7769..da68d2223f18 100644
--- a/tools/perf/util/threads.h
+++ b/tools/perf/util/threads.h
@@ -7,7 +7,7 @@

struct thread;

-#define THREADS__TABLE_BITS 8
+#define THREADS__TABLE_BITS 3
#define THREADS__TABLE_SIZE (1 << THREADS__TABLE_BITS)

struct threads_table_entry {
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:36:25

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 23/50] perf header: Switch mem topology to io_dir__readdir

Switch memory_node__read and build_mem_topology from opendir/readdir
to io_dir__readdir, which uses a small stack allocation instead of
heap-allocated DIR state. Reduces peak memory consumption of perf
record by 10kb.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/header.c | 31 ++++++++++++++++---------------
1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index e86b9439ffee..55f63d2ee232 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -44,6 +44,7 @@
#include "build-id.h"
#include "data.h"
#include <api/fs/fs.h>
+#include <api/io_dir.h>
#include "asm/bug.h"
#include "tool.h"
#include "time-utils.h"
@@ -1341,11 +1342,11 @@ static int memory_node__read(struct memory_node *n, unsigned long idx)
{
unsigned int phys, size = 0;
char path[PATH_MAX];
- struct dirent *ent;
- DIR *dir;
+ struct io_dirent64 *ent;
+ struct io_dir dir;

#define for_each_memory(mem, dir) \
- while ((ent = readdir(dir))) \
+ while ((ent = io_dir__readdir(&dir)) != NULL) \
if (strcmp(ent->d_name, ".") && \
strcmp(ent->d_name, "..") && \
sscanf(ent->d_name, "memory%u", &mem) == 1)
@@ -1354,9 +1355,9 @@ static int memory_node__read(struct memory_node *n, unsigned long idx)
"%s/devices/system/node/node%lu",
sysfs__mountpoint(), idx);

- dir = opendir(path);
- if (!dir) {
- pr_warning("failed: can't open memory sysfs data\n");
+ io_dir__init(&dir, open(path, O_CLOEXEC | O_DIRECTORY | O_RDONLY));
+ if (dir.dirfd < 0) {
+ pr_warning("failed: can't open memory sysfs data '%s'\n", path);
return -1;
}

@@ -1368,20 +1369,20 @@ static int memory_node__read(struct memory_node *n, unsigned long idx)

n->set = bitmap_zalloc(size);
if (!n->set) {
- closedir(dir);
+ close(dir.dirfd);
return -ENOMEM;
}

n->node = idx;
n->size = size;

- rewinddir(dir);
+ io_dir__rewinddir(&dir);

for_each_memory(phys, dir) {
__set_bit(phys, n->set);
}

- closedir(dir);
+ close(dir.dirfd);
return 0;
}

@@ -1404,8 +1405,8 @@ static int memory_node__sort(const void *a, const void *b)
static int build_mem_topology(struct memory_node **nodesp, u64 *cntp)
{
char path[PATH_MAX];
- struct dirent *ent;
- DIR *dir;
+ struct io_dirent64 *ent;
+ struct io_dir dir;
int ret = 0;
size_t cnt = 0, size = 0;
struct memory_node *nodes = NULL;
@@ -1413,14 +1414,14 @@ static int build_mem_topology(struct memory_node **nodesp, u64 *cntp)
scnprintf(path, PATH_MAX, "%s/devices/system/node/",
sysfs__mountpoint());

- dir = opendir(path);
- if (!dir) {
+ io_dir__init(&dir, open(path, O_CLOEXEC | O_DIRECTORY | O_RDONLY));
+ if (dir.dirfd < 0) {
pr_debug2("%s: couldn't read %s, does this arch have topology information?\n",
__func__, path);
return -1;
}

- while (!ret && (ent = readdir(dir))) {
+ while (!ret && (ent = io_dir__readdir(&dir))) {
unsigned int idx;
int r;

@@ -1447,7 +1448,7 @@ static int build_mem_topology(struct memory_node **nodesp, u64 *cntp)
ret = memory_node__read(&nodes[cnt++], idx);
}
out:
- closedir(dir);
+ close(dir.dirfd);
if (!ret) {
*cntp = cnt;
*nodesp = nodes;
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:36:41

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 42/50] perf maps: Hide maps internals

Move the maps struct definition into the C file. Add maps__equal so
callers no longer need the struct definition for reference count
checked equality tests. Add accessors for the addr_space and
unwind_libunwind_ops fields. Move map_list_node to its only use in
symbol.c.
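
Condensed, the opaque-handle split this patch makes looks like the
following (abbreviated from the diff below):

```
/* maps.h: only a forward declaration and prototypes are public. */
struct maps;
struct machine *maps__machine(const struct maps *maps);
bool maps__equal(struct maps *a, struct maps *b);

/* maps.c: the definition stays private to the translation unit. */
DECLARE_RC_STRUCT(maps) {
	struct rw_semaphore lock;
	/* ... remaining fields ... */
	refcount_t refcnt;
};

bool maps__equal(struct maps *a, struct maps *b)
{
	return RC_CHK_EQUAL(a, b);
}
```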

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/tests/thread-maps-share.c | 8 +-
tools/perf/util/callchain.c | 2 +-
tools/perf/util/maps.c | 96 +++++++++++++++++++++++
tools/perf/util/maps.h | 97 +++---------------------
tools/perf/util/symbol.c | 10 +++
tools/perf/util/thread.c | 2 +-
tools/perf/util/unwind-libunwind-local.c | 2 +-
tools/perf/util/unwind-libunwind.c | 7 +-
8 files changed, 123 insertions(+), 101 deletions(-)

diff --git a/tools/perf/tests/thread-maps-share.c b/tools/perf/tests/thread-maps-share.c
index 7fa6f7c568e2..e9ecd30a5c05 100644
--- a/tools/perf/tests/thread-maps-share.c
+++ b/tools/perf/tests/thread-maps-share.c
@@ -46,9 +46,9 @@ static int test__thread_maps_share(struct test_suite *test __maybe_unused, int s
TEST_ASSERT_EQUAL("wrong refcnt", refcount_read(maps__refcnt(maps)), 4);

/* test the maps pointer is shared */
- TEST_ASSERT_VAL("maps don't match", RC_CHK_EQUAL(maps, thread__maps(t1)));
- TEST_ASSERT_VAL("maps don't match", RC_CHK_EQUAL(maps, thread__maps(t2)));
- TEST_ASSERT_VAL("maps don't match", RC_CHK_EQUAL(maps, thread__maps(t3)));
+ TEST_ASSERT_VAL("maps don't match", maps__equal(maps, thread__maps(t1)));
+ TEST_ASSERT_VAL("maps don't match", maps__equal(maps, thread__maps(t2)));
+ TEST_ASSERT_VAL("maps don't match", maps__equal(maps, thread__maps(t3)));

/*
* Verify the other leader was created by previous call.
@@ -73,7 +73,7 @@ static int test__thread_maps_share(struct test_suite *test __maybe_unused, int s
other_maps = thread__maps(other);
TEST_ASSERT_EQUAL("wrong refcnt", refcount_read(maps__refcnt(other_maps)), 2);

- TEST_ASSERT_VAL("maps don't match", RC_CHK_EQUAL(other_maps, thread__maps(other_leader)));
+ TEST_ASSERT_VAL("maps don't match", maps__equal(other_maps, thread__maps(other_leader)));

/* release thread group */
thread__put(t3);
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 99cce43ba152..4878879f0d5c 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1157,7 +1157,7 @@ int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *
if (al->map == NULL)
goto out;
}
- if (RC_CHK_EQUAL(al->maps, machine__kernel_maps(machine))) {
+ if (maps__equal(al->maps, machine__kernel_maps(machine))) {
if (machine__is_host(machine)) {
al->cpumode = PERF_RECORD_MISC_KERNEL;
al->level = 'k';
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index b3937e734cbf..41e9e39b1b4c 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -6,9 +6,63 @@
#include "dso.h"
#include "map.h"
#include "maps.h"
+#include "rwsem.h"
#include "thread.h"
#include "ui/ui.h"
#include "unwind.h"
+#include <internal/rc_check.h>
+
+/*
+ * Locking/sorting note:
+ *
+ * Sorting is done with the write lock, iteration and binary searching happens
+ * under the read lock requiring being sorted. There is a race between sorting
+ * releasing the write lock and acquiring the read lock for iteration/searching
+ * where another thread could insert and break the sorting of the maps. In
+ * practice inserting maps should be rare meaning that the race shouldn't lead
+ * to live lock. Removal of maps doesn't break being sorted.
+ */
+
+DECLARE_RC_STRUCT(maps) {
+ struct rw_semaphore lock;
+ /**
+ * @maps_by_address: array of maps sorted by their starting address if
+ * maps_by_address_sorted is true.
+ */
+ struct map **maps_by_address;
+ /**
+ * @maps_by_name: optional array of maps sorted by their dso name if
+ * maps_by_name_sorted is true.
+ */
+ struct map **maps_by_name;
+ struct machine *machine;
+#ifdef HAVE_LIBUNWIND_SUPPORT
+ void *addr_space;
+ const struct unwind_libunwind_ops *unwind_libunwind_ops;
+#endif
+ refcount_t refcnt;
+ /**
+ * @nr_maps: number of maps_by_address, and possibly maps_by_name,
+ * entries that contain maps.
+ */
+ unsigned int nr_maps;
+ /**
+ * @nr_maps_allocated: number of entries in maps_by_address and possibly
+ * maps_by_name.
+ */
+ unsigned int nr_maps_allocated;
+ /**
+ * @last_search_by_name_idx: cache of last found by name entry's index
+ * as frequent searches for the same dso name are common.
+ */
+ unsigned int last_search_by_name_idx;
+ /** @maps_by_address_sorted: is maps_by_address sorted. */
+ bool maps_by_address_sorted;
+ /** @maps_by_name_sorted: is maps_by_name sorted. */
+ bool maps_by_name_sorted;
+ /** @ends_broken: does the map contain a map where end values are unset/unsorted? */
+ bool ends_broken;
+};

static void check_invariants(const struct maps *maps __maybe_unused)
{
@@ -103,6 +157,43 @@ static void maps__set_maps_by_name_sorted(struct maps *maps, bool value)
RC_CHK_ACCESS(maps)->maps_by_name_sorted = value;
}

+struct machine *maps__machine(const struct maps *maps)
+{
+ return RC_CHK_ACCESS(maps)->machine;
+}
+
+unsigned int maps__nr_maps(const struct maps *maps)
+{
+ return RC_CHK_ACCESS(maps)->nr_maps;
+}
+
+refcount_t *maps__refcnt(struct maps *maps)
+{
+ return &RC_CHK_ACCESS(maps)->refcnt;
+}
+
+#ifdef HAVE_LIBUNWIND_SUPPORT
+void *maps__addr_space(const struct maps *maps)
+{
+ return RC_CHK_ACCESS(maps)->addr_space;
+}
+
+void maps__set_addr_space(struct maps *maps, void *addr_space)
+{
+ RC_CHK_ACCESS(maps)->addr_space = addr_space;
+}
+
+const struct unwind_libunwind_ops *maps__unwind_libunwind_ops(const struct maps *maps)
+{
+ return RC_CHK_ACCESS(maps)->unwind_libunwind_ops;
+}
+
+void maps__set_unwind_libunwind_ops(struct maps *maps, const struct unwind_libunwind_ops *ops)
+{
+ RC_CHK_ACCESS(maps)->unwind_libunwind_ops = ops;
+}
+#endif
+
static struct rw_semaphore *maps__lock(struct maps *maps)
{
/*
@@ -440,6 +531,11 @@ bool maps__empty(struct maps *maps)
return maps__nr_maps(maps) == 0;
}

+bool maps__equal(struct maps *a, struct maps *b)
+{
+ return RC_CHK_EQUAL(a, b);
+}
+
int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data), void *data)
{
bool done = false;
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index df9dd5a0e3c0..4bcba136ffe5 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -3,80 +3,15 @@
#define __PERF_MAPS_H

#include <linux/refcount.h>
-#include <linux/rbtree.h>
#include <stdio.h>
#include <stdbool.h>
#include <linux/types.h>
-#include "rwsem.h"
-#include <internal/rc_check.h>

struct ref_reloc_sym;
struct machine;
struct map;
struct maps;

-struct map_list_node {
- struct list_head node;
- struct map *map;
-};
-
-static inline struct map_list_node *map_list_node__new(void)
-{
- return malloc(sizeof(struct map_list_node));
-}
-
-/*
- * Locking/sorting note:
- *
- * Sorting is done with the write lock, iteration and binary searching happens
- * under the read lock requiring being sorted. There is a race between sorting
- * releasing the write lock and acquiring the read lock for iteration/searching
- * where another thread could insert and break the sorting of the maps. In
- * practice inserting maps should be rare meaning that the race shouldn't lead
- * to live lock. Removal of maps doesn't break being sorted.
- */
-
-DECLARE_RC_STRUCT(maps) {
- struct rw_semaphore lock;
- /**
- * @maps_by_address: array of maps sorted by their starting address if
- * maps_by_address_sorted is true.
- */
- struct map **maps_by_address;
- /**
- * @maps_by_name: optional array of maps sorted by their dso name if
- * maps_by_name_sorted is true.
- */
- struct map **maps_by_name;
- struct machine *machine;
-#ifdef HAVE_LIBUNWIND_SUPPORT
- void *addr_space;
- const struct unwind_libunwind_ops *unwind_libunwind_ops;
-#endif
- refcount_t refcnt;
- /**
- * @nr_maps: number of maps_by_address, and possibly maps_by_name,
- * entries that contain maps.
- */
- unsigned int nr_maps;
- /**
- * @nr_maps_allocated: number of entries in maps_by_address and possibly
- * maps_by_name.
- */
- unsigned int nr_maps_allocated;
- /**
- * @last_search_by_name_idx: cache of last found by name entry's index
- * as frequent searches for the same dso name are common.
- */
- unsigned int last_search_by_name_idx;
- /** @maps_by_address_sorted: is maps_by_address sorted. */
- bool maps_by_address_sorted;
- /** @maps_by_name_sorted: is maps_by_name sorted. */
- bool maps_by_name_sorted;
- /** @ends_broken: does the map contain a map where end values are unset/unsorted? */
- bool ends_broken;
-};
-
#define KMAP_NAME_LEN 256

struct kmap {
@@ -100,36 +35,22 @@ static inline void __maps__zput(struct maps **map)

#define maps__zput(map) __maps__zput(&map)

+bool maps__equal(struct maps *a, struct maps *b);
+
/* Iterate over map calling cb for each entry. */
int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data), void *data);
/* Iterate over map removing an entry if cb returns true. */
void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data), void *data);

-static inline struct machine *maps__machine(struct maps *maps)
-{
- return RC_CHK_ACCESS(maps)->machine;
-}
-
-static inline unsigned int maps__nr_maps(const struct maps *maps)
-{
- return RC_CHK_ACCESS(maps)->nr_maps;
-}
-
-static inline refcount_t *maps__refcnt(struct maps *maps)
-{
- return &RC_CHK_ACCESS(maps)->refcnt;
-}
+struct machine *maps__machine(const struct maps *maps);
+unsigned int maps__nr_maps(const struct maps *maps);
+refcount_t *maps__refcnt(struct maps *maps);

#ifdef HAVE_LIBUNWIND_SUPPORT
-static inline void *maps__addr_space(struct maps *maps)
-{
- return RC_CHK_ACCESS(maps)->addr_space;
-}
-
-static inline const struct unwind_libunwind_ops *maps__unwind_libunwind_ops(const struct maps *maps)
-{
- return RC_CHK_ACCESS(maps)->unwind_libunwind_ops;
-}
+void *maps__addr_space(const struct maps *maps);
+void maps__set_addr_space(struct maps *maps, void *addr_space);
+const struct unwind_libunwind_ops *maps__unwind_libunwind_ops(const struct maps *maps);
+void maps__set_unwind_libunwind_ops(struct maps *maps, const struct unwind_libunwind_ops *ops);
#endif

size_t maps__fprintf(struct maps *maps, FILE *fp);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 3f31e868d883..a264d152a0ef 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -63,6 +63,16 @@ struct symbol_conf symbol_conf = {
.res_sample = 0,
};

+struct map_list_node {
+ struct list_head node;
+ struct map *map;
+};
+
+static struct map_list_node *map_list_node__new(void)
+{
+ return malloc(sizeof(struct map_list_node));
+}
+
static enum dso_binary_type binary_type_symtab[] = {
DSO_BINARY_TYPE__KALLSYMS,
DSO_BINARY_TYPE__GUEST_KALLSYMS,
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 07b158aa3e44..c59ab4d79163 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -383,7 +383,7 @@ static int thread__clone_maps(struct thread *thread, struct thread *parent, bool
if (thread__pid(thread) == thread__pid(parent))
return thread__prepare_access(thread);

- if (RC_CHK_ACCESS(thread__maps(thread)) == RC_CHK_ACCESS(thread__maps(parent))) {
+ if (maps__equal(thread__maps(thread), thread__maps(parent))) {
pr_debug("broken map groups on thread %d/%d parent %d/%d\n",
thread__pid(thread), thread__tid(thread),
thread__pid(parent), thread__tid(parent));
diff --git a/tools/perf/util/unwind-libunwind-local.c b/tools/perf/util/unwind-libunwind-local.c
index 228f1565bd0b..b69dc3a447db 100644
--- a/tools/perf/util/unwind-libunwind-local.c
+++ b/tools/perf/util/unwind-libunwind-local.c
@@ -706,7 +706,7 @@ static int _unwind__prepare_access(struct maps *maps)
{
void *addr_space = unw_create_addr_space(&accessors, 0);

- RC_CHK_ACCESS(maps)->addr_space = addr_space;
+ maps__set_addr_space(maps, addr_space);
if (!addr_space) {
pr_err("unwind: Can't create unwind address space.\n");
return -ENOMEM;
diff --git a/tools/perf/util/unwind-libunwind.c b/tools/perf/util/unwind-libunwind.c
index 76cd63de80a8..2728eb4f13ea 100644
--- a/tools/perf/util/unwind-libunwind.c
+++ b/tools/perf/util/unwind-libunwind.c
@@ -12,11 +12,6 @@ struct unwind_libunwind_ops __weak *local_unwind_libunwind_ops;
struct unwind_libunwind_ops __weak *x86_32_unwind_libunwind_ops;
struct unwind_libunwind_ops __weak *arm64_unwind_libunwind_ops;

-static void unwind__register_ops(struct maps *maps, struct unwind_libunwind_ops *ops)
-{
- RC_CHK_ACCESS(maps)->unwind_libunwind_ops = ops;
-}
-
int unwind__prepare_access(struct maps *maps, struct map *map, bool *initialized)
{
const char *arch;
@@ -60,7 +55,7 @@ int unwind__prepare_access(struct maps *maps, struct map *map, bool *initialized
return 0;
}
out_register:
- unwind__register_ops(maps, ops);
+ maps__set_unwind_libunwind_ops(maps, ops);

err = maps__unwind_libunwind_ops(maps)->prepare_access(maps);
if (initialized)
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:37:27

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 44/50] perf dso: Reorder variables to save space in struct dso

Save 40 bytes per dso, shrinking the struct from 8 to 7 cache
lines. Make the dwfl variable conditional on being a powerpc
build. Squeeze int/enum types into bitfields where appropriate.
Remove holes/padding by reordering variables.
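
The before/after layouts below are pahole output; assuming a perf
binary built with debug info, they can be regenerated with something
like:

```
pahole -C dso perf
```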

Before:
```
struct dso {
struct mutex lock; /* 0 40 */
struct list_head node; /* 40 16 */
struct rb_node rb_node __attribute__((__aligned__(8))); /* 56 24 */
/* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
struct rb_root * root; /* 80 8 */
struct rb_root_cached symbols; /* 88 16 */
struct symbol * * symbol_names; /* 104 8 */
size_t symbol_names_len; /* 112 8 */
struct rb_root_cached inlined_nodes; /* 120 16 */
/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
struct rb_root_cached srclines; /* 136 16 */
struct {
u64 addr; /* 152 8 */
struct symbol * symbol; /* 160 8 */
} last_find_result; /* 152 16 */
void * a2l; /* 168 8 */
char * symsrc_filename; /* 176 8 */
unsigned int a2l_fails; /* 184 4 */
enum dso_space_type kernel; /* 188 4 */
/* --- cacheline 3 boundary (192 bytes) --- */
_Bool is_kmod; /* 192 1 */

/* XXX 3 bytes hole, try to pack */

enum dso_swap_type needs_swap; /* 196 4 */
enum dso_binary_type symtab_type; /* 200 4 */
enum dso_binary_type binary_type; /* 204 4 */
enum dso_load_errno load_errno; /* 208 4 */
u8 adjust_symbols:1; /* 212: 0 1 */
u8 has_build_id:1; /* 212: 1 1 */
u8 header_build_id:1; /* 212: 2 1 */
u8 has_srcline:1; /* 212: 3 1 */
u8 hit:1; /* 212: 4 1 */
u8 annotate_warned:1; /* 212: 5 1 */
u8 auxtrace_warned:1; /* 212: 6 1 */
u8 short_name_allocated:1; /* 212: 7 1 */
u8 long_name_allocated:1; /* 213: 0 1 */
u8 is_64_bit:1; /* 213: 1 1 */

/* XXX 6 bits hole, try to pack */

_Bool sorted_by_name; /* 214 1 */
_Bool loaded; /* 215 1 */
u8 rel; /* 216 1 */

/* XXX 7 bytes hole, try to pack */

struct build_id bid; /* 224 32 */
/* --- cacheline 4 boundary (256 bytes) --- */
u64 text_offset; /* 256 8 */
u64 text_end; /* 264 8 */
const char * short_name; /* 272 8 */
const char * long_name; /* 280 8 */
u16 long_name_len; /* 288 2 */
u16 short_name_len; /* 290 2 */

/* XXX 4 bytes hole, try to pack */

void * dwfl; /* 296 8 */
struct auxtrace_cache * auxtrace_cache; /* 304 8 */
int comp; /* 312 4 */

/* XXX 4 bytes hole, try to pack */

/* --- cacheline 5 boundary (320 bytes) --- */
struct {
struct rb_root cache; /* 320 8 */
int fd; /* 328 4 */
int status; /* 332 4 */
u32 status_seen; /* 336 4 */

/* XXX 4 bytes hole, try to pack */

u64 file_size; /* 344 8 */
struct list_head open_entry; /* 352 16 */
u64 elf_base_addr; /* 368 8 */
u64 debug_frame_offset; /* 376 8 */
/* --- cacheline 6 boundary (384 bytes) --- */
u64 eh_frame_hdr_addr; /* 384 8 */
u64 eh_frame_hdr_offset; /* 392 8 */
} data; /* 320 80 */
struct {
u32 id; /* 400 4 */
u32 sub_id; /* 404 4 */
struct perf_env * env; /* 408 8 */
} bpf_prog; /* 400 16 */
union {
void * priv; /* 416 8 */
u64 db_id; /* 416 8 */
}; /* 416 8 */
struct nsinfo * nsinfo; /* 424 8 */
struct dso_id id; /* 432 24 */
/* --- cacheline 7 boundary (448 bytes) was 8 bytes ago --- */
refcount_t refcnt; /* 456 4 */
char name[]; /* 460 0 */

/* size: 464, cachelines: 8, members: 49 */
/* sum members: 440, holes: 4, sum holes: 18 */
/* sum bitfield members: 10 bits, bit holes: 1, sum bit holes: 6 bits */
/* padding: 4 */
/* forced alignments: 1 */
/* last cacheline: 16 bytes */
} __attribute__((__aligned__(8)));
```

After:
```
struct dso {
struct mutex lock; /* 0 40 */
struct list_head node; /* 40 16 */
struct rb_node rb_node __attribute__((__aligned__(8))); /* 56 24 */
/* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
struct rb_root * root; /* 80 8 */
struct rb_root_cached symbols; /* 88 16 */
struct symbol * * symbol_names; /* 104 8 */
size_t symbol_names_len; /* 112 8 */
struct rb_root_cached inlined_nodes; /* 120 16 */
/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
struct rb_root_cached srclines; /* 136 16 */
struct {
u64 addr; /* 152 8 */
struct symbol * symbol; /* 160 8 */
} last_find_result; /* 152 16 */
struct build_id bid; /* 168 32 */
/* --- cacheline 3 boundary (192 bytes) was 8 bytes ago --- */
u64 text_offset; /* 200 8 */
u64 text_end; /* 208 8 */
const char * short_name; /* 216 8 */
const char * long_name; /* 224 8 */
void * a2l; /* 232 8 */
char * symsrc_filename; /* 240 8 */
struct nsinfo * nsinfo; /* 248 8 */
/* --- cacheline 4 boundary (256 bytes) --- */
struct auxtrace_cache * auxtrace_cache; /* 256 8 */
union {
void * priv; /* 264 8 */
u64 db_id; /* 264 8 */
}; /* 264 8 */
struct {
struct perf_env * env; /* 272 8 */
u32 id; /* 280 4 */
u32 sub_id; /* 284 4 */
} bpf_prog; /* 272 16 */
struct {
struct rb_root cache; /* 288 8 */
struct list_head open_entry; /* 296 16 */
u64 file_size; /* 312 8 */
/* --- cacheline 5 boundary (320 bytes) --- */
u64 elf_base_addr; /* 320 8 */
u64 debug_frame_offset; /* 328 8 */
u64 eh_frame_hdr_addr; /* 336 8 */
u64 eh_frame_hdr_offset; /* 344 8 */
int fd; /* 352 4 */
int status; /* 356 4 */
u32 status_seen; /* 360 4 */
} data; /* 288 80 */

/* XXX last struct has 4 bytes of padding */

struct dso_id id; /* 368 24 */
/* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */
unsigned int a2l_fails; /* 392 4 */
int comp; /* 396 4 */
refcount_t refcnt; /* 400 4 */
enum dso_load_errno load_errno; /* 404 4 */
u16 long_name_len; /* 408 2 */
u16 short_name_len; /* 410 2 */
enum dso_binary_type symtab_type:8; /* 412: 0 4 */
enum dso_binary_type binary_type:8; /* 412: 8 4 */
enum dso_space_type kernel:2; /* 412:16 4 */
enum dso_swap_type needs_swap:2; /* 412:18 4 */

/* Bitfield combined with next fields */

_Bool is_kmod:1; /* 414: 4 1 */
u8 adjust_symbols:1; /* 414: 5 1 */
u8 has_build_id:1; /* 414: 6 1 */
u8 header_build_id:1; /* 414: 7 1 */
u8 has_srcline:1; /* 415: 0 1 */
u8 hit:1; /* 415: 1 1 */
u8 annotate_warned:1; /* 415: 2 1 */
u8 auxtrace_warned:1; /* 415: 3 1 */
u8 short_name_allocated:1; /* 415: 4 1 */
u8 long_name_allocated:1; /* 415: 5 1 */
u8 is_64_bit:1; /* 415: 6 1 */

/* XXX 1 bit hole, try to pack */

_Bool sorted_by_name; /* 416 1 */
_Bool loaded; /* 417 1 */
u8 rel; /* 418 1 */
char name[]; /* 419 0 */

/* size: 424, cachelines: 7, members: 48 */
/* sum members: 415 */
/* sum bitfield members: 31 bits, bit holes: 1, sum bit holes: 1 bits */
/* padding: 5 */
/* paddings: 1, sum paddings: 4 */
/* forced alignments: 1 */
/* last cacheline: 40 bytes */
} __attribute__((__aligned__(8)));
```

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/dso.h | 84 +++++++++++++++++++++----------------------
1 file changed, 42 insertions(+), 42 deletions(-)

diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 3759de8c2267..8bdc17d78b02 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -158,66 +158,66 @@ struct dso {
u64 addr;
struct symbol *symbol;
} last_find_result;
- void *a2l;
- char *symsrc_filename;
- unsigned int a2l_fails;
- enum dso_space_type kernel;
- bool is_kmod;
- enum dso_swap_type needs_swap;
- enum dso_binary_type symtab_type;
- enum dso_binary_type binary_type;
- enum dso_load_errno load_errno;
- u8 adjust_symbols:1;
- u8 has_build_id:1;
- u8 header_build_id:1;
- u8 has_srcline:1;
- u8 hit:1;
- u8 annotate_warned:1;
- u8 auxtrace_warned:1;
- u8 short_name_allocated:1;
- u8 long_name_allocated:1;
- u8 is_64_bit:1;
- bool sorted_by_name;
- bool loaded;
- u8 rel;
struct build_id bid;
u64 text_offset;
u64 text_end;
const char *short_name;
const char *long_name;
- u16 long_name_len;
- u16 short_name_len;
+ void *a2l;
+ char *symsrc_filename;
+#if defined(__powerpc__)
void *dwfl; /* DWARF debug info */
+#endif
+ struct nsinfo *nsinfo;
struct auxtrace_cache *auxtrace_cache;
- int comp;
-
+ union { /* Tool specific area */
+ void *priv;
+ u64 db_id;
+ };
+ /* bpf prog information */
+ struct {
+ struct perf_env *env;
+ u32 id;
+ u32 sub_id;
+ } bpf_prog;
/* dso data file */
struct {
struct rb_root cache;
- int fd;
- int status;
- u32 status_seen;
- u64 file_size;
struct list_head open_entry;
+ u64 file_size;
u64 elf_base_addr;
u64 debug_frame_offset;
u64 eh_frame_hdr_addr;
u64 eh_frame_hdr_offset;
+ int fd;
+ int status;
+ u32 status_seen;
} data;
- /* bpf prog information */
- struct {
- u32 id;
- u32 sub_id;
- struct perf_env *env;
- } bpf_prog;
-
- union { /* Tool specific area */
- void *priv;
- u64 db_id;
- };
- struct nsinfo *nsinfo;
struct dso_id id;
+ unsigned int a2l_fails;
+ int comp;
refcount_t refcnt;
+ enum dso_load_errno load_errno;
+ u16 long_name_len;
+ u16 short_name_len;
+ enum dso_binary_type symtab_type:8;
+ enum dso_binary_type binary_type:8;
+ enum dso_space_type kernel:2;
+ enum dso_swap_type needs_swap:2;
+ bool is_kmod:1;
+ u8 adjust_symbols:1;
+ u8 has_build_id:1;
+ u8 header_build_id:1;
+ u8 has_srcline:1;
+ u8 hit:1;
+ u8 annotate_warned:1;
+ u8 auxtrace_warned:1;
+ u8 short_name_allocated:1;
+ u8 long_name_allocated:1;
+ u8 is_64_bit:1;
+ bool sorted_by_name;
+ bool loaded;
+ u8 rel;
char name[];
};

--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:37:33

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 37/50] perf maps: Fix up overlaps during fixup_end

Maps are sometimes made to overlap, in particular kernel maps. If the
end of a map overlaps the start of the next, shorten the earlier map:
for example, if one map covers [0x1000, 0x3000) and the next starts at
0x2000, clip the first map's end to 0x2000. This should remove
potential non-determinism in maps__find, i.e. when finding maps by
address.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/maps.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 01c15d0b300a..fba95a00ecdf 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -700,7 +700,7 @@ void maps__fixup_end(struct maps *maps)
down_write(maps__lock(maps));

maps__for_each_entry(maps, curr) {
- if (prev != NULL && !map__end(prev->map))
+ if (prev && (!map__end(prev->map) || map__end(prev->map) > map__start(curr->map)))
map__set_end(prev->map, map__start(curr->map));

prev = curr;
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:37:34

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 38/50] perf maps: Switch from rbtree to lazily sorted array for addresses

Maps is a collection of maps primarily sorted by the starting address
of the map. Prior to this change the maps were held in an rbtree
requiring 4 pointers per node. Prior to reference count checking, the
rbnode was embedded in the map so 3 pointers per node were
necessary. This change switches the rbtree to an array lazily sorted
by address, much like the existing array of nodes sorted by name. 1
pointer is needed per node, but to avoid excessive resizing the
backing array may hold twice the number of used elements, meaning the
memory overhead is roughly half that of the rbtree. For a perf record
with "--no-bpf-event -g -a" of true, the memory overhead of perf
inject is reduced from 3.3MB to 3MB, so 10% or 300KB is saved.

Map inserts always happen at the end of the array. The code tracks
whether the insertion violates the sorting property; the rbtree's
O(log n) insert becomes O(1).

Removal slides the later array entries down over the removed one, so
the rbtree's O(log n) removal degrades to O(n).

A find may need to sort the array using qsort, which is O(n log n),
but in general the maps should already be sorted and so average
performance should be O(log n), as with the rbtree.
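
In outline the container behaves like the sketch below. It is
illustrative only: struct item stands in for struct map, and the
fixed-capacity entries array elides the realloc-based growth in the
real code:

```
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative stand-in for struct map: only the sort key matters. */
struct item { unsigned long long start; };

struct lazy_array {
	struct item **entries;	/* assumed pre-allocated with enough capacity */
	unsigned int nr;
	bool sorted;
};

static int item_cmp(const void *a, const void *b)
{
	const struct item *ia = *(const struct item * const *)a;
	const struct item *ib = *(const struct item * const *)b;

	return ia->start < ib->start ? -1 : ia->start > ib->start ? 1 : 0;
}

/* O(1) insert: append at the end, noting whether sortedness survived. */
static void lazy_array__insert(struct lazy_array *la, struct item *e)
{
	la->sorted = la->nr == 0 ||
		     (la->sorted && la->entries[la->nr - 1]->start <= e->start);
	la->entries[la->nr++] = e;
}

/* Find sorts on demand (O(n log n)), then binary searches (O(log n)). */
static struct item **lazy_array__find(struct lazy_array *la, struct item *key)
{
	if (!la->sorted) {
		qsort(la->entries, la->nr, sizeof(*la->entries), item_cmp);
		la->sorted = true;
	}
	return bsearch(&key, la->entries, la->nr, sizeof(*la->entries), item_cmp);
}
```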

An rbtree node consumes a cache line, but with the array 4 nodes fit
on a cache line. Iteration is simplified to scanning an array rather
than pointer chasing.

Overall the performance after the change is expected to be comparable
to before, but with half of the memory consumed.

To avoid a list and repeated logic around splitting maps,
maps__merge_in is rewritten in terms of
maps__fixup_overlap_and_insert. maps__merge_in splits the given
mapping, inserting the remaining gaps. maps__fixup_overlap_and_insert
splits the existing mappings, then adds the incoming mapping. By
adding the new mapping first, then re-inserting the existing mappings,
the splitting behavior matches. A schematic of the splitting follows
below.
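
Schematically, for a new mapping landing in the middle of an existing
one:

```
existing:  |---------------- pos ----------------|
new:                 |------- new -------|
result:    |-before-||------- new -------||-after-|
```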

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/tests/maps.c | 3 +
tools/perf/util/map.c | 1 +
tools/perf/util/maps.c | 1183 +++++++++++++++++++++++----------------
tools/perf/util/maps.h | 54 +-
4 files changed, 757 insertions(+), 484 deletions(-)

diff --git a/tools/perf/tests/maps.c b/tools/perf/tests/maps.c
index bb3fbfe5a73e..b15417a0d617 100644
--- a/tools/perf/tests/maps.c
+++ b/tools/perf/tests/maps.c
@@ -156,6 +156,9 @@ static int test__maps__merge_in(struct test_suite *t __maybe_unused, int subtest
TEST_ASSERT_VAL("merge check failed", !ret);

maps__zput(maps);
+ map__zput(map_kcore1);
+ map__zput(map_kcore2);
+ map__zput(map_kcore3);
return TEST_OK;
}

diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 54c67cb7ecef..cf5a15db3a1f 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -168,6 +168,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
if (dso == NULL)
goto out_delete;

+ assert(!dso->kernel);
map__init(result, start, start + len, pgoff, dso);

if (anon || no_dso) {
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index fba95a00ecdf..06fdd8a7c2a2 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -10,286 +10,477 @@
#include "ui/ui.h"
#include "unwind.h"

-struct map_rb_node {
- struct rb_node rb_node;
- struct map *map;
-};
-
-#define maps__for_each_entry(maps, map) \
- for (map = maps__first(maps); map; map = map_rb_node__next(map))
+static void check_invariants(const struct maps *maps __maybe_unused)
+{
+#ifndef NDEBUG
+ assert(RC_CHK_ACCESS(maps)->nr_maps <= RC_CHK_ACCESS(maps)->nr_maps_allocated);
+ for (unsigned int i = 0; i < RC_CHK_ACCESS(maps)->nr_maps; i++) {
+ struct map *map = RC_CHK_ACCESS(maps)->maps_by_address[i];
+
+ /* Check map is well-formed. */
+ assert(map__end(map) == 0 || map__start(map) <= map__end(map));
+ /* Expect at least 1 reference count. */
+ assert(refcount_read(map__refcnt(map)) > 0);
+
+ if (map__dso(map) && map__dso(map)->kernel)
+ assert(RC_CHK_EQUAL(map__kmap(map)->kmaps, maps));
+
+ if (i > 0) {
+ struct map *prev = RC_CHK_ACCESS(maps)->maps_by_address[i - 1];
+
+ /* If addresses are sorted... */
+ if (RC_CHK_ACCESS(maps)->maps_by_address_sorted) {
+ /* Maps should be in start address order. */
+ assert(map__start(prev) <= map__start(map));
+ /*
+ * If the ends of maps aren't broken (during
+ * construction) then they should be ordered
+ * too.
+ */
+ if (!RC_CHK_ACCESS(maps)->ends_broken) {
+ assert(map__end(prev) <= map__end(map));
+ assert(map__end(prev) <= map__start(map) ||
+ map__start(prev) == map__start(map));
+ }
+ }
+ }
+ }
+ if (RC_CHK_ACCESS(maps)->maps_by_name) {
+ for (unsigned int i = 0; i < RC_CHK_ACCESS(maps)->nr_maps; i++) {
+ struct map *map = RC_CHK_ACCESS(maps)->maps_by_name[i];

-#define maps__for_each_entry_safe(maps, map, next) \
- for (map = maps__first(maps), next = map_rb_node__next(map); map; \
- map = next, next = map_rb_node__next(map))
+ /*
+ * Maps by name maps should be in maps_by_address, so
+ * the reference count should be higher.
+ */
+ assert(refcount_read(map__refcnt(map)) > 1);
+ }
+ }
+#endif
+}

-static struct rb_root *maps__entries(struct maps *maps)
+static struct map **maps__maps_by_address(const struct maps *maps)
{
- return &RC_CHK_ACCESS(maps)->entries;
+ return RC_CHK_ACCESS(maps)->maps_by_address;
}

-static struct rw_semaphore *maps__lock(struct maps *maps)
+static void maps__set_maps_by_address(struct maps *maps, struct map **new)
{
- return &RC_CHK_ACCESS(maps)->lock;
+ RC_CHK_ACCESS(maps)->maps_by_address = new;
+
}

-static struct map **maps__maps_by_name(struct maps *maps)
+/* Not in the header, to aid reference counting. */
+static struct map **maps__maps_by_name(const struct maps *maps)
{
return RC_CHK_ACCESS(maps)->maps_by_name;
+
}

-static struct map_rb_node *maps__first(struct maps *maps)
+static void maps__set_maps_by_name(struct maps *maps, struct map **new)
{
- struct rb_node *first = rb_first(maps__entries(maps));
+ RC_CHK_ACCESS(maps)->maps_by_name = new;

- if (first)
- return rb_entry(first, struct map_rb_node, rb_node);
- return NULL;
}

-static struct map_rb_node *map_rb_node__next(struct map_rb_node *node)
+static bool maps__maps_by_address_sorted(const struct maps *maps)
{
- struct rb_node *next;
-
- if (!node)
- return NULL;
-
- next = rb_next(&node->rb_node);
+ return RC_CHK_ACCESS(maps)->maps_by_address_sorted;
+}

- if (!next)
- return NULL;
+static void maps__set_maps_by_address_sorted(struct maps *maps, bool value)
+{
+ RC_CHK_ACCESS(maps)->maps_by_address_sorted = value;
+}

- return rb_entry(next, struct map_rb_node, rb_node);
+static bool maps__maps_by_name_sorted(const struct maps *maps)
+{
+ return RC_CHK_ACCESS(maps)->maps_by_name_sorted;
}

-static struct map_rb_node *maps__find_node(struct maps *maps, struct map *map)
+static void maps__set_maps_by_name_sorted(struct maps *maps, bool value)
{
- struct map_rb_node *rb_node;
+ RC_CHK_ACCESS(maps)->maps_by_name_sorted = value;
+}

- maps__for_each_entry(maps, rb_node) {
- if (rb_node->RC_CHK_ACCESS(map) == RC_CHK_ACCESS(map))
- return rb_node;
- }
- return NULL;
+static struct rw_semaphore *maps__lock(struct maps *maps)
+{
+ /*
+ * When the lock is acquired or released the maps invariants should
+ * hold.
+ */
+ check_invariants(maps);
+ return &RC_CHK_ACCESS(maps)->lock;
}

static void maps__init(struct maps *maps, struct machine *machine)
{
- refcount_set(maps__refcnt(maps), 1);
init_rwsem(maps__lock(maps));
- RC_CHK_ACCESS(maps)->entries = RB_ROOT;
+ RC_CHK_ACCESS(maps)->maps_by_address = NULL;
+ RC_CHK_ACCESS(maps)->maps_by_name = NULL;
RC_CHK_ACCESS(maps)->machine = machine;
- RC_CHK_ACCESS(maps)->last_search_by_name = NULL;
+#ifdef HAVE_LIBUNWIND_SUPPORT
+ RC_CHK_ACCESS(maps)->addr_space = NULL;
+ RC_CHK_ACCESS(maps)->unwind_libunwind_ops = NULL;
+#endif
+ refcount_set(maps__refcnt(maps), 1);
RC_CHK_ACCESS(maps)->nr_maps = 0;
- RC_CHK_ACCESS(maps)->maps_by_name = NULL;
+ RC_CHK_ACCESS(maps)->nr_maps_allocated = 0;
+ RC_CHK_ACCESS(maps)->last_search_by_name_idx = 0;
+ RC_CHK_ACCESS(maps)->maps_by_address_sorted = true;
+ RC_CHK_ACCESS(maps)->maps_by_name_sorted = false;
}

-static void __maps__free_maps_by_name(struct maps *maps)
+static void maps__exit(struct maps *maps)
{
- /*
- * Free everything to try to do it from the rbtree in the next search
- */
- for (unsigned int i = 0; i < maps__nr_maps(maps); i++)
- map__put(maps__maps_by_name(maps)[i]);
+ struct map **maps_by_address = maps__maps_by_address(maps);
+ struct map **maps_by_name = maps__maps_by_name(maps);

- zfree(&RC_CHK_ACCESS(maps)->maps_by_name);
- RC_CHK_ACCESS(maps)->nr_maps_allocated = 0;
+ for (unsigned int i = 0; i < maps__nr_maps(maps); i++) {
+ map__zput(maps_by_address[i]);
+ if (maps_by_name)
+ map__zput(maps_by_name[i]);
+ }
+ zfree(&maps_by_address);
+ zfree(&maps_by_name);
+ unwind__finish_access(maps);
}

-static int __maps__insert(struct maps *maps, struct map *map)
+struct maps *maps__new(struct machine *machine)
{
- struct rb_node **p = &maps__entries(maps)->rb_node;
- struct rb_node *parent = NULL;
- const u64 ip = map__start(map);
- struct map_rb_node *m, *new_rb_node;
-
- new_rb_node = malloc(sizeof(*new_rb_node));
- if (!new_rb_node)
- return -ENOMEM;
-
- RB_CLEAR_NODE(&new_rb_node->rb_node);
- new_rb_node->map = map__get(map);
+ struct maps *result;
+ RC_STRUCT(maps) *maps = zalloc(sizeof(*maps));

- while (*p != NULL) {
- parent = *p;
- m = rb_entry(parent, struct map_rb_node, rb_node);
- if (ip < map__start(m->map))
- p = &(*p)->rb_left;
- else
- p = &(*p)->rb_right;
- }
+ if (ADD_RC_CHK(result, maps))
+ maps__init(result, machine);

- rb_link_node(&new_rb_node->rb_node, parent, p);
- rb_insert_color(&new_rb_node->rb_node, maps__entries(maps));
- return 0;
+ return result;
}

-int maps__insert(struct maps *maps, struct map *map)
+static void maps__delete(struct maps *maps)
{
- int err;
- const struct dso *dso = map__dso(map);
-
- down_write(maps__lock(maps));
- err = __maps__insert(maps, map);
- if (err)
- goto out;
+ maps__exit(maps);
+ RC_CHK_FREE(maps);
+}

- ++RC_CHK_ACCESS(maps)->nr_maps;
+struct maps *maps__get(struct maps *maps)
+{
+ struct maps *result;

- if (dso && dso->kernel) {
- struct kmap *kmap = map__kmap(map);
+ if (RC_CHK_GET(result, maps))
+ refcount_inc(maps__refcnt(maps));

- if (kmap)
- kmap->kmaps = maps;
- else
- pr_err("Internal error: kernel dso with non kernel map\n");
- }
+ return result;
+}

+void maps__put(struct maps *maps)
+{
+ if (maps && refcount_dec_and_test(maps__refcnt(maps)))
+ maps__delete(maps);
+ else
+ RC_CHK_PUT(maps);
+}

+static void __maps__free_maps_by_name(struct maps *maps)
+{
/*
- * If we already performed some search by name, then we need to add the just
- * inserted map and resort.
+	 * Free everything so maps_by_name is recomputed on the next search
*/
- if (maps__maps_by_name(maps)) {
- if (maps__nr_maps(maps) > RC_CHK_ACCESS(maps)->nr_maps_allocated) {
- int nr_allocate = maps__nr_maps(maps) * 2;
- struct map **maps_by_name = realloc(maps__maps_by_name(maps),
- nr_allocate * sizeof(map));
+ for (unsigned int i = 0; i < maps__nr_maps(maps); i++)
+ map__put(maps__maps_by_name(maps)[i]);

- if (maps_by_name == NULL) {
- __maps__free_maps_by_name(maps);
- err = -ENOMEM;
- goto out;
- }
+ zfree(&RC_CHK_ACCESS(maps)->maps_by_name);
+}

- RC_CHK_ACCESS(maps)->maps_by_name = maps_by_name;
- RC_CHK_ACCESS(maps)->nr_maps_allocated = nr_allocate;
+static int map__start_cmp(const void *a, const void *b)
+{
+ const struct map *map_a = *(const struct map * const *)a;
+ const struct map *map_b = *(const struct map * const *)b;
+ u64 map_a_start = map__start(map_a);
+ u64 map_b_start = map__start(map_b);
+
+ if (map_a_start == map_b_start) {
+ u64 map_a_end = map__end(map_a);
+ u64 map_b_end = map__end(map_b);
+
+ if (map_a_end == map_b_end) {
+ /* Ensure maps with the same addresses have a fixed order. */
+ if (RC_CHK_ACCESS(map_a) == RC_CHK_ACCESS(map_b))
+ return 0;
+ return (intptr_t)RC_CHK_ACCESS(map_a) > (intptr_t)RC_CHK_ACCESS(map_b)
+ ? 1 : -1;
}
- maps__maps_by_name(maps)[maps__nr_maps(maps) - 1] = map__get(map);
- __maps__sort_by_name(maps);
+ return map_a_end > map_b_end ? 1 : -1;
}
- out:
- up_write(maps__lock(maps));
- return err;
+ return map_a_start > map_b_start ? 1 : -1;
}

-static void __maps__remove(struct maps *maps, struct map_rb_node *rb_node)
+static void __maps__sort_by_address(struct maps *maps)
{
- rb_erase_init(&rb_node->rb_node, maps__entries(maps));
- map__put(rb_node->map);
- free(rb_node);
+ if (maps__maps_by_address_sorted(maps))
+ return;
+
+ qsort(maps__maps_by_address(maps),
+ maps__nr_maps(maps),
+ sizeof(struct map *),
+ map__start_cmp);
+ maps__set_maps_by_address_sorted(maps, true);
}

-void maps__remove(struct maps *maps, struct map *map)
+static void maps__sort_by_address(struct maps *maps)
{
- struct map_rb_node *rb_node;
-
down_write(maps__lock(maps));
- if (RC_CHK_ACCESS(maps)->last_search_by_name == map)
- RC_CHK_ACCESS(maps)->last_search_by_name = NULL;
-
- rb_node = maps__find_node(maps, map);
- assert(rb_node->RC_CHK_ACCESS(map) == RC_CHK_ACCESS(map));
- __maps__remove(maps, rb_node);
- if (maps__maps_by_name(maps))
- __maps__free_maps_by_name(maps);
- --RC_CHK_ACCESS(maps)->nr_maps;
+ __maps__sort_by_address(maps);
up_write(maps__lock(maps));
}

-static void __maps__purge(struct maps *maps)
+static int map__strcmp(const void *a, const void *b)
{
- struct map_rb_node *pos, *next;
-
- if (maps__maps_by_name(maps))
- __maps__free_maps_by_name(maps);
+ const struct map *map_a = *(const struct map * const *)a;
+ const struct map *map_b = *(const struct map * const *)b;
+ const struct dso *dso_a = map__dso(map_a);
+ const struct dso *dso_b = map__dso(map_b);
+ int ret = strcmp(dso_a->short_name, dso_b->short_name);

- maps__for_each_entry_safe(maps, pos, next) {
- rb_erase_init(&pos->rb_node, maps__entries(maps));
- map__put(pos->map);
- free(pos);
+ if (ret == 0 && RC_CHK_ACCESS(map_a) != RC_CHK_ACCESS(map_b)) {
+ /* Ensure distinct but name equal maps have an order. */
+ return map__start_cmp(a, b);
}
+ return ret;
}

-static void maps__exit(struct maps *maps)
+static int maps__sort_by_name(struct maps *maps)
{
+ int err = 0;
down_write(maps__lock(maps));
- __maps__purge(maps);
+ if (!maps__maps_by_name_sorted(maps)) {
+ struct map **maps_by_name = maps__maps_by_name(maps);
+
+ if (!maps_by_name) {
+ maps_by_name = malloc(RC_CHK_ACCESS(maps)->nr_maps_allocated *
+ sizeof(*maps_by_name));
+ if (!maps_by_name)
+ err = -ENOMEM;
+ else {
+ struct map **maps_by_address = maps__maps_by_address(maps);
+ unsigned int n = maps__nr_maps(maps);
+
+ maps__set_maps_by_name(maps, maps_by_name);
+ for (unsigned int i = 0; i < n; i++)
+ maps_by_name[i] = map__get(maps_by_address[i]);
+ }
+ }
+ if (!err) {
+ qsort(maps_by_name,
+ maps__nr_maps(maps),
+ sizeof(struct map *),
+ map__strcmp);
+ maps__set_maps_by_name_sorted(maps, true);
+ }
+ }
up_write(maps__lock(maps));
+ return err;
}

-bool maps__empty(struct maps *maps)
+static unsigned int maps__by_address_index(const struct maps *maps, const struct map *map)
{
- return !maps__first(maps);
+ struct map **maps_by_address = maps__maps_by_address(maps);
+
+ if (maps__maps_by_address_sorted(maps)) {
+ struct map **mapp =
+ bsearch(&map, maps__maps_by_address(maps), maps__nr_maps(maps),
+ sizeof(*mapp), map__start_cmp);
+
+ if (mapp)
+ return mapp - maps_by_address;
+ } else {
+ for (unsigned int i = 0; i < maps__nr_maps(maps); i++) {
+ if (RC_CHK_ACCESS(maps_by_address[i]) == RC_CHK_ACCESS(map))
+ return i;
+ }
+ }
+ pr_err("Map missing from maps");
+ return -1;
}

-struct maps *maps__new(struct machine *machine)
+static unsigned int maps__by_name_index(const struct maps *maps, const struct map *map)
{
- struct maps *result;
- RC_STRUCT(maps) *maps = zalloc(sizeof(*maps));
+ struct map **maps_by_name = maps__maps_by_name(maps);
+
+ if (maps__maps_by_name_sorted(maps)) {
+ struct map **mapp =
+ bsearch(&map, maps_by_name, maps__nr_maps(maps),
+ sizeof(*mapp), map__strcmp);
+
+ if (mapp)
+ return mapp - maps_by_name;
+ } else {
+ for (unsigned int i = 0; i < maps__nr_maps(maps); i++) {
+ if (RC_CHK_ACCESS(maps_by_name[i]) == RC_CHK_ACCESS(map))
+ return i;
+ }
+ }
+ pr_err("Map missing from maps");
+ return -1;
+}

- if (ADD_RC_CHK(result, maps))
- maps__init(result, machine);
+static int __maps__insert(struct maps *maps, struct map *new)
+{
+ struct map **maps_by_address = maps__maps_by_address(maps);
+ struct map **maps_by_name = maps__maps_by_name(maps);
+ const struct dso *dso = map__dso(new);
+ unsigned int nr_maps = maps__nr_maps(maps);
+ unsigned int nr_allocate = RC_CHK_ACCESS(maps)->nr_maps_allocated;
+
+ if (nr_maps + 1 > nr_allocate) {
+ nr_allocate = !nr_allocate ? 32 : nr_allocate * 2;
+
+ maps_by_address = realloc(maps_by_address, nr_allocate * sizeof(new));
+ if (!maps_by_address)
+ return -ENOMEM;
+
+ maps__set_maps_by_address(maps, maps_by_address);
+ if (maps_by_name) {
+ maps_by_name = realloc(maps_by_name, nr_allocate * sizeof(new));
+ if (!maps_by_name) {
+ /*
+ * If by name fails, just disable by name and it will
+ * recompute next time it is required.
+ */
+ __maps__free_maps_by_name(maps);
+ }
+ maps__set_maps_by_name(maps, maps_by_name);
+ }
+ RC_CHK_ACCESS(maps)->nr_maps_allocated = nr_allocate;
+ }
+ /* Insert the value at the end. */
+ maps_by_address[nr_maps] = map__get(new);
+ if (maps_by_name)
+ maps_by_name[nr_maps] = map__get(new);

- return result;
+ nr_maps++;
+ RC_CHK_ACCESS(maps)->nr_maps = nr_maps;
+
+ /*
+ * Recompute if things are sorted. If things are inserted in a sorted
+ * manner, for example by processing /proc/pid/maps, then no
+ * sorting/resorting will be necessary.
+ */
+ if (nr_maps == 1) {
+ /* If there's just 1 entry then maps are sorted. */
+ maps__set_maps_by_address_sorted(maps, true);
+ maps__set_maps_by_name_sorted(maps, maps_by_name != NULL);
+ } else {
+ /* Sorted if maps were already sorted and this map starts after the last one. */
+ maps__set_maps_by_address_sorted(maps,
+ maps__maps_by_address_sorted(maps) &&
+ map__end(maps_by_address[nr_maps - 2]) <= map__start(new));
+ maps__set_maps_by_name_sorted(maps, false);
+ }
+ if (map__end(new) < map__start(new))
+ RC_CHK_ACCESS(maps)->ends_broken = true;
+ if (dso && dso->kernel) {
+ struct kmap *kmap = map__kmap(new);
+
+ if (kmap)
+ kmap->kmaps = maps;
+ else
+ pr_err("Internal error: kernel dso with non kernel map\n");
+ }
+ check_invariants(maps);
+ return 0;
}

-static void maps__delete(struct maps *maps)
+int maps__insert(struct maps *maps, struct map *map)
{
- maps__exit(maps);
- unwind__finish_access(maps);
- RC_CHK_FREE(maps);
+ int ret;
+
+ down_write(maps__lock(maps));
+ ret = __maps__insert(maps, map);
+ up_write(maps__lock(maps));
+ return ret;
}

-struct maps *maps__get(struct maps *maps)
+static void __maps__remove(struct maps *maps, struct map *map)
{
- struct maps *result;
+ struct map **maps_by_address = maps__maps_by_address(maps);
+ struct map **maps_by_name = maps__maps_by_name(maps);
+ unsigned int nr_maps = maps__nr_maps(maps);
+ unsigned int address_idx;
+
+ /* Slide later mappings over the one to remove */
+ address_idx = maps__by_address_index(maps, map);
+ map__put(maps_by_address[address_idx]);
+ memmove(&maps_by_address[address_idx],
+ &maps_by_address[address_idx + 1],
+ (nr_maps - address_idx - 1) * sizeof(*maps_by_address));
+
+ if (maps_by_name) {
+ unsigned int name_idx = maps__by_name_index(maps, map);
+
+ map__put(maps_by_name[name_idx]);
+ memmove(&maps_by_name[name_idx],
+ &maps_by_name[name_idx + 1],
+ (nr_maps - name_idx - 1) * sizeof(*maps_by_name));
+ }

- if (RC_CHK_GET(result, maps))
- refcount_inc(maps__refcnt(maps));
+ --RC_CHK_ACCESS(maps)->nr_maps;
+ check_invariants(maps);
+}

- return result;
+void maps__remove(struct maps *maps, struct map *map)
+{
+ down_write(maps__lock(maps));
+ __maps__remove(maps, map);
+ up_write(maps__lock(maps));
}

-void maps__put(struct maps *maps)
+bool maps__empty(struct maps *maps)
{
- if (maps && refcount_dec_and_test(maps__refcnt(maps)))
- maps__delete(maps);
- else
- RC_CHK_PUT(maps);
+ return maps__nr_maps(maps) == 0;
}

int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data), void *data)
{
- struct map_rb_node *pos;
+ bool done = false;
int ret = 0;

- down_read(maps__lock(maps));
- maps__for_each_entry(maps, pos) {
- ret = cb(pos->map, data);
- if (ret)
- break;
+ /* See locking/sorting note. */
+ while (!done) {
+ down_read(maps__lock(maps));
+ if (maps__maps_by_address_sorted(maps)) {
+ struct map **maps_by_address = maps__maps_by_address(maps);
+ unsigned int n = maps__nr_maps(maps);
+
+ for (unsigned int i = 0; i < n; i++) {
+ struct map *map = maps_by_address[i];
+
+ ret = cb(map, data);
+ if (ret)
+ break;
+ }
+ done = true;
+ }
+ up_read(maps__lock(maps));
+ if (!done)
+ maps__sort_by_address(maps);
}
- up_read(maps__lock(maps));
return ret;
}

void maps__remove_maps(struct maps *maps, bool (*cb)(struct map *map, void *data), void *data)
{
- struct map_rb_node *pos, *next;
- unsigned int start_nr_maps;
+ struct map **maps_by_address;

down_write(maps__lock(maps));

- start_nr_maps = maps__nr_maps(maps);
- maps__for_each_entry_safe(maps, pos, next) {
- if (cb(pos->map, data)) {
- __maps__remove(maps, pos);
- --RC_CHK_ACCESS(maps)->nr_maps;
- }
+ maps_by_address = maps__maps_by_address(maps);
+ for (unsigned int i = 0; i < maps__nr_maps(maps);) {
+ if (cb(maps_by_address[i], data))
+ __maps__remove(maps, maps_by_address[i]);
+ else
+ i++;
}
- if (maps__maps_by_name(maps) && start_nr_maps != maps__nr_maps(maps))
- __maps__free_maps_by_name(maps);
-
up_write(maps__lock(maps));
}

@@ -300,7 +491,7 @@ struct symbol *maps__find_symbol(struct maps *maps, u64 addr, struct map **mapp)
/* Ensure map is loaded before using map->map_ip */
if (map != NULL && map__load(map) >= 0) {
if (mapp != NULL)
- *mapp = map;
+ *mapp = map; // TODO: map_put on else path when find returns a get.
return map__find_symbol(map, map__map_ip(map, addr));
}

@@ -348,7 +539,7 @@ int maps__find_ams(struct maps *maps, struct addr_map_symbol *ams)
if (ams->addr < map__start(ams->ms.map) || ams->addr >= map__end(ams->ms.map)) {
if (maps == NULL)
return -1;
- ams->ms.map = maps__find(maps, ams->addr);
+ ams->ms.map = maps__find(maps, ams->addr); // TODO: map_get
if (ams->ms.map == NULL)
return -1;
}
@@ -393,24 +584,28 @@ size_t maps__fprintf(struct maps *maps, FILE *fp)
* Find first map where end > new->start.
* Same as find_vma() in kernel.
*/
-static struct rb_node *first_ending_after(struct maps *maps, const struct map *map)
+static unsigned int first_ending_after(struct maps *maps, const struct map *map)
{
- struct rb_root *root;
- struct rb_node *next, *first;
+ struct map **maps_by_address = maps__maps_by_address(maps);
+ int low = 0, high = (int)maps__nr_maps(maps) - 1, first = high + 1;
+
+ assert(maps__maps_by_address_sorted(maps));
+ if (low <= high && map__end(maps_by_address[0]) > map__start(map))
+ return 0;

- root = maps__entries(maps);
- next = root->rb_node;
- first = NULL;
- while (next) {
- struct map_rb_node *pos = rb_entry(next, struct map_rb_node, rb_node);
+ while (low <= high) {
+ int mid = (low + high) / 2;
+ struct map *pos = maps_by_address[mid];

- if (map__end(pos->map) > map__start(map)) {
- first = next;
- if (map__start(pos->map) <= map__start(map))
+ if (map__end(pos) > map__start(map)) {
+ first = mid;
+ if (map__start(pos) <= map__start(map)) {
+ /* Entry overlaps map. */
break;
- next = next->rb_left;
+ }
+ high = mid - 1;
} else
- next = next->rb_right;
+ low = mid + 1;
}
return first;
}
@@ -419,170 +614,248 @@ static struct rb_node *first_ending_after(struct maps *maps, const struct map *m
* Adds new to maps, if new overlaps existing entries then the existing maps are
* adjusted or removed so that new fits without overlapping any entries.
*/
-int maps__fixup_overlap_and_insert(struct maps *maps, struct map *new)
+static int __maps__fixup_overlap_and_insert(struct maps *maps, struct map *new)
{
-
- struct rb_node *next;
+ struct map **maps_by_address;
int err = 0;

- down_write(maps__lock(maps));
+sort_again:
+ if (!maps__maps_by_address_sorted(maps))
+ __maps__sort_by_address(maps);

- next = first_ending_after(maps, new);
- while (next && !err) {
- struct map_rb_node *pos = rb_entry(next, struct map_rb_node, rb_node);
- next = rb_next(&pos->rb_node);
+ maps_by_address = maps__maps_by_address(maps);
+ /*
+ * Iterate through entries where the end of the existing entry is
+ * greater-than the new map's start.
+ */
+ for (unsigned int i = first_ending_after(maps, new); i < maps__nr_maps(maps); ) {
+ struct map *pos = maps_by_address[i];
+ struct map *before = NULL, *after = NULL;

/*
* Stop if current map starts after map->end.
* Maps are ordered by start: next will not overlap for sure.
*/
- if (map__start(pos->map) >= map__end(new))
+ if (map__start(pos) >= map__end(new))
break;

- if (verbose >= 2) {
-
- if (use_browser) {
- pr_debug("overlapping maps in %s (disable tui for more info)\n",
- map__dso(new)->name);
- } else {
- pr_debug("overlapping maps:\n");
- map__fprintf(new, debug_file());
- map__fprintf(pos->map, debug_file());
- }
+ if (use_browser) {
+ pr_debug("overlapping maps in %s (disable tui for more info)\n",
+ map__dso(new)->name);
+ } else if (verbose >= 2) {
+ pr_debug("overlapping maps:\n");
+ map__fprintf(new, debug_file());
+ map__fprintf(pos, debug_file());
}

- rb_erase_init(&pos->rb_node, maps__entries(maps));
/*
* Now check if we need to create new maps for areas not
* overlapped by the new map:
*/
- if (map__start(new) > map__start(pos->map)) {
- struct map *before = map__clone(pos->map);
+ if (map__start(new) > map__start(pos)) {
+ /* Map starts within existing map. Need to shorten the existing map. */
+ before = map__clone(pos);

if (before == NULL) {
err = -ENOMEM;
- goto put_map;
+ goto out_err;
}
-
map__set_end(before, map__start(new));
- err = __maps__insert(maps, before);
- if (err) {
- map__put(before);
- goto put_map;
- }

if (verbose >= 2 && !use_browser)
map__fprintf(before, debug_file());
- map__put(before);
}
-
- if (map__end(new) < map__end(pos->map)) {
- struct map *after = map__clone(pos->map);
+ if (map__end(new) < map__end(pos)) {
+ /* The new map isn't as long as the existing map. */
+ after = map__clone(pos);

if (after == NULL) {
+ map__zput(before);
err = -ENOMEM;
- goto put_map;
+ goto out_err;
}

map__set_start(after, map__end(new));
- map__add_pgoff(after, map__end(new) - map__start(pos->map));
- assert(map__map_ip(pos->map, map__end(new)) ==
- map__map_ip(after, map__end(new)));
- err = __maps__insert(maps, after);
- if (err) {
- map__put(after);
- goto put_map;
- }
+ map__add_pgoff(after, map__end(new) - map__start(pos));
+ assert(map__map_ip(pos, map__end(new)) ==
+ map__map_ip(after, map__end(new)));
+
if (verbose >= 2 && !use_browser)
map__fprintf(after, debug_file());
- map__put(after);
}
-put_map:
- map__put(pos->map);
- free(pos);
+ /*
+ * If adding only one entry, `before` or `after`, it can replace
+ * the existing entry. If both `before` and `after` are
+ * necessary then an insert is needed. If the new entry
+ * entirely overlaps the existing entry it can just be removed.
+ */
+ if (before) {
+ map__put(maps_by_address[i]);
+ maps_by_address[i] = before;
+ /* Maps are still ordered, go to next one. */
+ i++;
+ if (after) {
+ __maps__insert(maps, after);
+ map__put(after);
+ if (!maps__maps_by_address_sorted(maps)) {
+ /*
+ * Sorting broken so invariants don't
+ * hold, sort and go again.
+ */
+ goto sort_again;
+ }
+ /*
+ * Maps are still ordered, skip after and go to
+ * next one (terminate loop).
+ */
+ i++;
+ }
+ } else if (after) {
+ map__put(maps_by_address[i]);
+ maps_by_address[i] = after;
+ /* Maps are ordered, go to next one. */
+ i++;
+ } else {
+ __maps__remove(maps, pos);
+ /*
+ * Maps are ordered but no need to increase `i` as the
+ * later maps were moved down.
+ */
+ }
+ check_invariants(maps);
}
/* Add the map. */
- err = __maps__insert(maps, new);
- up_write(maps__lock(maps));
+ __maps__insert(maps, new);
+out_err:
return err;
}

-int maps__copy_from(struct maps *maps, struct maps *parent)
+int maps__fixup_overlap_and_insert(struct maps *maps, struct map *new)
{
int err;
- struct map_rb_node *rb_node;

+ down_write(maps__lock(maps));
+ err = __maps__fixup_overlap_and_insert(maps, new);
+ up_write(maps__lock(maps));
+ return err;
+}
+
+int maps__copy_from(struct maps *dest, struct maps *parent)
+{
+ /* Note, if struct map were immutable then cloning could use ref counts. */
+ struct map **parent_maps_by_address;
+ int err = 0;
+ unsigned int n;
+
+ down_write(maps__lock(dest));
down_read(maps__lock(parent));

- maps__for_each_entry(parent, rb_node) {
- struct map *new = map__clone(rb_node->map);
+ parent_maps_by_address = maps__maps_by_address(parent);
+ n = maps__nr_maps(parent);
+ if (maps__empty(dest)) {
+ /* No existing mappings so just copy from parent to avoid reallocs in insert. */
+ unsigned int nr_maps_allocated = RC_CHK_ACCESS(parent)->nr_maps_allocated;
+ struct map **dest_maps_by_address =
+ malloc(nr_maps_allocated * sizeof(struct map *));
+ struct map **dest_maps_by_name = NULL;

- if (new == NULL) {
+ if (!dest_maps_by_address)
err = -ENOMEM;
- goto out_unlock;
+ else {
+ if (maps__maps_by_name(parent)) {
+ dest_maps_by_name =
+ malloc(nr_maps_allocated * sizeof(struct map *));
+ }
+
+ RC_CHK_ACCESS(dest)->maps_by_address = dest_maps_by_address;
+ RC_CHK_ACCESS(dest)->maps_by_name = dest_maps_by_name;
+ RC_CHK_ACCESS(dest)->nr_maps_allocated = nr_maps_allocated;
}

- err = unwind__prepare_access(maps, new, NULL);
- if (err)
- goto out_unlock;
+ for (unsigned int i = 0; !err && i < n; i++) {
+ struct map *pos = parent_maps_by_address[i];
+ struct map *new = map__clone(pos);

- err = maps__insert(maps, new);
- if (err)
- goto out_unlock;
+ if (!new)
+ err = -ENOMEM;
+ else {
+ err = unwind__prepare_access(dest, new, NULL);
+ if (!err) {
+ dest_maps_by_address[i] = new;
+ if (dest_maps_by_name)
+ dest_maps_by_name[i] = map__get(new);
+ RC_CHK_ACCESS(dest)->nr_maps = i + 1;
+ }
+ }
+ if (err)
+ map__put(new);
+ }
+ maps__set_maps_by_address_sorted(dest, maps__maps_by_address_sorted(parent));
+ if (!err) {
+ RC_CHK_ACCESS(dest)->last_search_by_name_idx =
+ RC_CHK_ACCESS(parent)->last_search_by_name_idx;
+ maps__set_maps_by_name_sorted(dest,
+ dest_maps_by_name &&
+ maps__maps_by_name_sorted(parent));
+ } else {
+ RC_CHK_ACCESS(dest)->last_search_by_name_idx = 0;
+ maps__set_maps_by_name_sorted(dest, false);
+ }
+ } else {
+ /* Unexpected: copying into a maps that already contains entries. */
+ for (unsigned int i = 0; !err && i < n; i++) {
+ struct map *pos = parent_maps_by_address[i];
+ struct map *new = map__clone(pos);

- map__put(new);
+ if (!new)
+ err = -ENOMEM;
+ else {
+ err = unwind__prepare_access(dest, new, NULL);
+ if (!err)
+ err = maps__insert(dest, new);
+ }
+ map__put(new);
+ }
}
-
- err = 0;
-out_unlock:
up_read(maps__lock(parent));
+ up_write(maps__lock(dest));
return err;
}

-struct map *maps__find(struct maps *maps, u64 ip)
+static int map__addr_cmp(const void *key, const void *entry)
{
- struct rb_node *p;
- struct map_rb_node *m;
-
+ const u64 ip = *(const u64 *)key;
+ const struct map *map = *(const struct map * const *)entry;

- down_read(maps__lock(maps));
-
- p = maps__entries(maps)->rb_node;
- while (p != NULL) {
- m = rb_entry(p, struct map_rb_node, rb_node);
- if (ip < map__start(m->map))
- p = p->rb_left;
- else if (ip >= map__end(m->map))
- p = p->rb_right;
- else
- goto out;
- }
-
- m = NULL;
-out:
- up_read(maps__lock(maps));
- return m ? m->map : NULL;
+ if (ip < map__start(map))
+ return -1;
+ if (ip >= map__end(map))
+ return 1;
+ return 0;
}

-static int map__strcmp(const void *a, const void *b)
+struct map *maps__find(struct maps *maps, u64 ip)
{
- const struct map *map_a = *(const struct map **)a;
- const struct map *map_b = *(const struct map **)b;
- const struct dso *dso_a = map__dso(map_a);
- const struct dso *dso_b = map__dso(map_b);
- int ret = strcmp(dso_a->short_name, dso_b->short_name);
-
- if (ret == 0 && map_a != map_b) {
- /*
- * Ensure distinct but name equal maps have an order in part to
- * aid reference counting.
- */
- ret = (int)map__start(map_a) - (int)map__start(map_b);
- if (ret == 0)
- ret = (int)((intptr_t)map_a - (intptr_t)map_b);
+ struct map *result = NULL;
+ bool done = false;
+
+ /* See locking/sorting note. */
+ while (!done) {
+ down_read(maps__lock(maps));
+ if (maps__maps_by_address_sorted(maps)) {
+ struct map **mapp =
+ bsearch(&ip, maps__maps_by_address(maps), maps__nr_maps(maps),
+ sizeof(*mapp), map__addr_cmp);
+
+ if (mapp)
+ result = *mapp; // map__get(*mapp);
+ done = true;
+ }
+ up_read(maps__lock(maps));
+ if (!done)
+ maps__sort_by_address(maps);
}
-
- return ret;
+ return result;
}

static int map__strcmp_name(const void *name, const void *b)
@@ -592,126 +865,113 @@ static int map__strcmp_name(const void *name, const void *b)
return strcmp(name, dso->short_name);
}

-void __maps__sort_by_name(struct maps *maps)
-{
- qsort(maps__maps_by_name(maps), maps__nr_maps(maps), sizeof(struct map *), map__strcmp);
-}
-
-static int map__groups__sort_by_name_from_rbtree(struct maps *maps)
-{
- struct map_rb_node *rb_node;
- struct map **maps_by_name = realloc(maps__maps_by_name(maps),
- maps__nr_maps(maps) * sizeof(struct map *));
- int i = 0;
-
- if (maps_by_name == NULL)
- return -1;
-
- up_read(maps__lock(maps));
- down_write(maps__lock(maps));
-
- RC_CHK_ACCESS(maps)->maps_by_name = maps_by_name;
- RC_CHK_ACCESS(maps)->nr_maps_allocated = maps__nr_maps(maps);
-
- maps__for_each_entry(maps, rb_node)
- maps_by_name[i++] = map__get(rb_node->map);
-
- __maps__sort_by_name(maps);
-
- up_write(maps__lock(maps));
- down_read(maps__lock(maps));
-
- return 0;
-}
-
-static struct map *__maps__find_by_name(struct maps *maps, const char *name)
+struct map *maps__find_by_name(struct maps *maps, const char *name)
{
- struct map **mapp;
+ struct map *result = NULL;
+ bool done = false;

- if (maps__maps_by_name(maps) == NULL &&
- map__groups__sort_by_name_from_rbtree(maps))
- return NULL;
+ /* See locking/sorting note. */
+ while (!done) {
+ unsigned int i;

- mapp = bsearch(name, maps__maps_by_name(maps), maps__nr_maps(maps),
- sizeof(*mapp), map__strcmp_name);
- if (mapp)
- return *mapp;
- return NULL;
-}
+ down_read(maps__lock(maps));

-struct map *maps__find_by_name(struct maps *maps, const char *name)
-{
- struct map_rb_node *rb_node;
- struct map *map;
-
- down_read(maps__lock(maps));
+ /* First check last found entry. */
+ i = RC_CHK_ACCESS(maps)->last_search_by_name_idx;
+ if (i < maps__nr_maps(maps) && maps__maps_by_name(maps)) {
+ struct dso *dso = map__dso(maps__maps_by_name(maps)[i]);

+ if (dso && strcmp(dso->short_name, name) == 0) {
+ result = maps__maps_by_name(maps)[i]; // TODO: map__get
+ done = true;
+ }
+ }

- if (RC_CHK_ACCESS(maps)->last_search_by_name) {
- const struct dso *dso = map__dso(RC_CHK_ACCESS(maps)->last_search_by_name);
+ /* Second search sorted array. */
+ if (!done && maps__maps_by_name_sorted(maps)) {
+ struct map **mapp =
+ bsearch(name, maps__maps_by_name(maps), maps__nr_maps(maps),
+ sizeof(*mapp), map__strcmp_name);

- if (strcmp(dso->short_name, name) == 0) {
- map = RC_CHK_ACCESS(maps)->last_search_by_name;
- goto out_unlock;
+ if (mapp) {
+ result = *mapp; // TODO: map__get
+ i = mapp - maps__maps_by_name(maps);
+ RC_CHK_ACCESS(maps)->last_search_by_name_idx = i;
+ }
+ done = true;
}
- }
- /*
- * If we have maps->maps_by_name, then the name isn't in the rbtree,
- * as maps->maps_by_name mirrors the rbtree when lookups by name are
- * made.
- */
- map = __maps__find_by_name(maps, name);
- if (map || maps__maps_by_name(maps) != NULL)
- goto out_unlock;
-
- /* Fallback to traversing the rbtree... */
- maps__for_each_entry(maps, rb_node) {
- struct dso *dso;
-
- map = rb_node->map;
- dso = map__dso(map);
- if (strcmp(dso->short_name, name) == 0) {
- RC_CHK_ACCESS(maps)->last_search_by_name = map;
- goto out_unlock;
+ up_read(maps__lock(maps));
+ if (!done) {
+ /* Sort and retry binary search. */
+ if (maps__sort_by_name(maps)) {
+ /*
+ * Memory allocation failed, do a linear search
+ * through the address-sorted maps.
+ */
+ struct map **maps_by_address;
+ unsigned int n;
+
+ down_read(maps__lock(maps));
+ maps_by_address = maps__maps_by_address(maps);
+ n = maps__nr_maps(maps);
+ for (i = 0; i < n; i++) {
+ struct map *pos = maps_by_address[i];
+ struct dso *dso = map__dso(pos);
+
+ if (dso && strcmp(dso->short_name, name) == 0) {
+ result = pos; // TODO: map__get
+ break;
+ }
+ }
+ up_read(maps__lock(maps));
+ done = true;
+ }
}
}
- map = NULL;
-
-out_unlock:
- up_read(maps__lock(maps));
- return map;
+ return result;
}

struct map *maps__find_next_entry(struct maps *maps, struct map *map)
{
- struct map_rb_node *rb_node = maps__find_node(maps, map);
- struct map_rb_node *next = map_rb_node__next(rb_node);
+ unsigned int i;
+ struct map *result = NULL;

- if (next)
- return next->map;
+ down_read(maps__lock(maps));
+ i = maps__by_address_index(maps, map);
+ if (i < maps__nr_maps(maps))
+ result = maps__maps_by_address(maps)[i]; // TODO: map__get

- return NULL;
+ up_read(maps__lock(maps));
+ return result;
}

void maps__fixup_end(struct maps *maps)
{
- struct map_rb_node *prev = NULL, *curr;
+ struct map **maps_by_address;
+ unsigned int n;

down_write(maps__lock(maps));
+ if (!maps__maps_by_address_sorted(maps))
+ __maps__sort_by_address(maps);

- maps__for_each_entry(maps, curr) {
- if (prev && (!map__end(prev->map) || map__end(prev->map) > map__start(curr->map)))
- map__set_end(prev->map, map__start(curr->map));
+ maps_by_address = maps__maps_by_address(maps);
+ n = maps__nr_maps(maps);
+ for (unsigned int i = 1; i < n; i++) {
+ struct map *prev = maps_by_address[i - 1];
+ struct map *curr = maps_by_address[i];

- prev = curr;
+ if (!map__end(prev) || map__end(prev) > map__start(curr))
+ map__set_end(prev, map__start(curr));
}

/*
* We still don't have the actual symbols, so guess the
* last map's final address.
*/
- if (curr && !map__end(curr->map))
- map__set_end(curr->map, ~0ULL);
+ if (n > 0 && !map__end(maps_by_address[n - 1]))
+ map__set_end(maps_by_address[n - 1], ~0ULL);
+
+ RC_CHK_ACCESS(maps)->ends_broken = false;

up_write(maps__lock(maps));
}
@@ -722,117 +982,92 @@ void maps__fixup_end(struct maps *maps)
*/
int maps__merge_in(struct maps *kmaps, struct map *new_map)
{
- struct map_rb_node *rb_node;
- struct rb_node *first;
- bool overlaps;
- LIST_HEAD(merged);
- int err = 0;
+ unsigned int first_after_, kmaps__nr_maps;
+ struct map **kmaps_maps_by_address;
+ struct map **merged_maps_by_address;
+ unsigned int merged_nr_maps_allocated;
+
+ /* First try under a read lock. */
+ while (true) {
+ down_read(maps__lock(kmaps));
+ if (maps__maps_by_address_sorted(kmaps))
+ break;

- down_read(maps__lock(kmaps));
- first = first_ending_after(kmaps, new_map);
- overlaps = first &&
- map__start(rb_entry(first, struct map_rb_node, rb_node)->map) < map__end(new_map);
- up_read(maps__lock(kmaps));
+ up_read(maps__lock(kmaps));
+
+ /* The first_ending_after() binary search requires sorted maps. Sort and try again. */
+ maps__sort_by_address(kmaps);
+ }
+ first_after_ = first_ending_after(kmaps, new_map);
+ kmaps_maps_by_address = maps__maps_by_address(kmaps);

- if (!overlaps)
+ if (first_after_ >= maps__nr_maps(kmaps) ||
+ map__start(kmaps_maps_by_address[first_after_]) >= map__end(new_map)) {
+ /* No overlap so regular insert suffices. */
+ up_read(maps__lock(kmaps));
return maps__insert(kmaps, new_map);
+ }
+ up_read(maps__lock(kmaps));

- maps__for_each_entry(kmaps, rb_node) {
- struct map *old_map = rb_node->map;
+ /* Plain insert with a read-lock failed, try again now with the write lock. */
+ down_write(maps__lock(kmaps));
+ if (!maps__maps_by_address_sorted(kmaps))
+ __maps__sort_by_address(kmaps);

- /* no overload with this one */
- if (map__end(new_map) < map__start(old_map) ||
- map__start(new_map) >= map__end(old_map))
- continue;
+ first_after_ = first_ending_after(kmaps, new_map);
+ kmaps_maps_by_address = maps__maps_by_address(kmaps);
+ kmaps__nr_maps = maps__nr_maps(kmaps);

- if (map__start(new_map) < map__start(old_map)) {
- /*
- * |new......
- * |old....
- */
- if (map__end(new_map) < map__end(old_map)) {
- /*
- * |new......| -> |new..|
- * |old....| -> |old....|
- */
- map__set_end(new_map, map__start(old_map));
- } else {
- /*
- * |new.............| -> |new..| |new..|
- * |old....| -> |old....|
- */
- struct map_list_node *m = map_list_node__new();
+ if (first_after_ >= kmaps__nr_maps ||
+ map__start(kmaps_maps_by_address[first_after_]) >= map__end(new_map)) {
+ /* No overlap so regular insert suffices. */
+ up_write(maps__lock(kmaps));
+ return maps__insert(kmaps, new_map);
+ }
+ /* Array to merge into, possibly 1 more for the sake of new_map. */
+ merged_nr_maps_allocated = RC_CHK_ACCESS(kmaps)->nr_maps_allocated;
+ if (kmaps__nr_maps + 1 == merged_nr_maps_allocated)
+ merged_nr_maps_allocated++;
+
+ merged_maps_by_address = malloc(merged_nr_maps_allocated * sizeof(*merged_maps_by_address));
+ if (!merged_maps_by_address) {
+ up_write(maps__lock(kmaps));
+ return -ENOMEM;
+ }
+ RC_CHK_ACCESS(kmaps)->maps_by_address = merged_maps_by_address;
+ RC_CHK_ACCESS(kmaps)->maps_by_address_sorted = true;
+ zfree(&RC_CHK_ACCESS(kmaps)->maps_by_name);
+ RC_CHK_ACCESS(kmaps)->maps_by_name_sorted = false;
+ RC_CHK_ACCESS(kmaps)->nr_maps_allocated = merged_nr_maps_allocated;

- if (!m) {
- err = -ENOMEM;
- goto out;
- }
+ /* Copy entries before the new_map that can't overlap. */
+ for (unsigned int i = 0; i < first_after_; i++)
+ merged_maps_by_address[i] = map__get(kmaps_maps_by_address[i]);

- m->map = map__clone(new_map);
- if (!m->map) {
- free(m);
- err = -ENOMEM;
- goto out;
- }
+ RC_CHK_ACCESS(kmaps)->nr_maps = first_after_;

- map__set_end(m->map, map__start(old_map));
- list_add_tail(&m->node, &merged);
- map__add_pgoff(new_map, map__end(old_map) - map__start(new_map));
- map__set_start(new_map, map__end(old_map));
- }
- } else {
- /*
- * |new......
- * |old....
- */
- if (map__end(new_map) < map__end(old_map)) {
- /*
- * |new..| -> x
- * |old.........| -> |old.........|
- */
- map__put(new_map);
- new_map = NULL;
- break;
- } else {
- /*
- * |new......| -> |new...|
- * |old....| -> |old....|
- */
- map__add_pgoff(new_map, map__end(old_map) - map__start(new_map));
- map__set_start(new_map, map__end(old_map));
- }
- }
- }
+ /* Add the new map, it will be split when the later overlapping mappings are added. */
+ __maps__insert(kmaps, new_map);

-out:
- while (!list_empty(&merged)) {
- struct map_list_node *old_node;
+ /* Insert mappings after new_map, splitting new_map in the process. */
+ for (unsigned int i = first_after_; i < kmaps__nr_maps; i++)
+ __maps__fixup_overlap_and_insert(kmaps, kmaps_maps_by_address[i]);

- old_node = list_entry(merged.next, struct map_list_node, node);
- list_del_init(&old_node->node);
- if (!err)
- err = maps__insert(kmaps, old_node->map);
- map__put(old_node->map);
- free(old_node);
- }
+ /* Release the old maps, their entries have been copied or merged into kmaps. */
+ for (unsigned int i = 0; i < kmaps__nr_maps; i++)
+ map__zput(kmaps_maps_by_address[i]);

- if (new_map) {
- if (!err)
- err = maps__insert(kmaps, new_map);
- map__put(new_map);
- }
- return err;
+ free(kmaps_maps_by_address);
+ up_write(maps__lock(kmaps));
+ return 0;
}

void maps__load_first(struct maps *maps)
{
- struct map_rb_node *first;
-
down_read(maps__lock(maps));

- first = maps__first(maps);
- if (first)
- map__load(first->map);
+ if (maps__nr_maps(maps) > 0)
+ map__load(maps__maps_by_address(maps)[0]);

up_read(maps__lock(maps));
}
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index d836d04c9402..df9dd5a0e3c0 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -25,21 +25,56 @@ static inline struct map_list_node *map_list_node__new(void)
return malloc(sizeof(struct map_list_node));
}

-struct map *maps__find(struct maps *maps, u64 addr);
+/*
+ * Locking/sorting note:
+ *
+ * Sorting is done under the write lock; iteration and binary searching
+ * happen under the read lock and require the maps to be sorted. There is a
+ * race between sorting releasing the write lock and acquiring the read lock
+ * for iteration/searching, where another thread could insert and break the
+ * sorting of the maps. In practice inserting maps should be rare, meaning
+ * the race shouldn't lead to livelock. Removal of maps doesn't break the
+ * sort order.
+ */

DECLARE_RC_STRUCT(maps) {
- struct rb_root entries;
struct rw_semaphore lock;
- struct machine *machine;
- struct map *last_search_by_name;
+ /**
+ * @maps_by_address: array of maps sorted by their starting address if
+ * maps_by_address_sorted is true.
+ */
+ struct map **maps_by_address;
+ /**
+ * @maps_by_name: optional array of maps sorted by their dso name if
+ * maps_by_name_sorted is true.
+ */
struct map **maps_by_name;
- refcount_t refcnt;
- unsigned int nr_maps;
- unsigned int nr_maps_allocated;
+ struct machine *machine;
#ifdef HAVE_LIBUNWIND_SUPPORT
- void *addr_space;
+ void *addr_space;
const struct unwind_libunwind_ops *unwind_libunwind_ops;
#endif
+ refcount_t refcnt;
+ /**
+ * @nr_maps: number of maps_by_address, and possibly maps_by_name,
+ * entries that contain maps.
+ */
+ unsigned int nr_maps;
+ /**
+ * @nr_maps_allocated: number of entries in maps_by_address and possibly
+ * maps_by_name.
+ */
+ unsigned int nr_maps_allocated;
+ /**
+ * @last_search_by_name_idx: cache of last found by name entry's index
+ * as frequent searches for the same dso name are common.
+ */
+ unsigned int last_search_by_name_idx;
+ /** @maps_by_address_sorted: is maps_by_address sorted. */
+ bool maps_by_address_sorted;
+ /** @maps_by_name_sorted: is maps_by_name sorted. */
+ bool maps_by_name_sorted;
+ /** @ends_broken: is there a map whose end value is unset or out of order? */
+ bool ends_broken;
};

#define KMAP_NAME_LEN 256
@@ -102,6 +137,7 @@ size_t maps__fprintf(struct maps *maps, FILE *fp);
int maps__insert(struct maps *maps, struct map *map);
void maps__remove(struct maps *maps, struct map *map);

+struct map *maps__find(struct maps *maps, u64 addr);
struct symbol *maps__find_symbol(struct maps *maps, u64 addr, struct map **mapp);
struct symbol *maps__find_symbol_by_name(struct maps *maps, const char *name, struct map **mapp);

@@ -117,8 +153,6 @@ struct map *maps__find_next_entry(struct maps *maps, struct map *map);

int maps__merge_in(struct maps *kmaps, struct map *new_map);

-void __maps__sort_by_name(struct maps *maps);
-
void maps__fixup_end(struct maps *maps);

void maps__load_first(struct maps *maps);
--
2.42.0.758.gaed0368e0e-goog
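
The locking/sorting note added to maps.h above is the crux of the new
design: iteration and binary search require maps_by_address to be
sorted, sorting requires the write lock, so readers retry until they
observe a sorted array under the read lock. Below is a minimal
standalone sketch of that retry protocol, using pthread rwlocks and
plain integers in place of struct map; the toy_maps names are
illustrative, not perf's API, and error handling is elided.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for struct maps: a lazily sorted array of keys. */
struct toy_maps {
	pthread_rwlock_t lock;
	unsigned long *entries;
	unsigned int nr;
	bool sorted;
};

static int cmp_key(const void *a, const void *b)
{
	unsigned long x = *(const unsigned long *)a;
	unsigned long y = *(const unsigned long *)b;

	return x < y ? -1 : x > y ? 1 : 0;
}

/* Insert under the write lock; appending may break the sort order. */
static void toy_maps__insert(struct toy_maps *maps, unsigned long key)
{
	pthread_rwlock_wrlock(&maps->lock);
	maps->entries = realloc(maps->entries,
				(maps->nr + 1) * sizeof(*maps->entries));
	maps->entries[maps->nr++] = key;
	maps->sorted = false;	/* Lazily re-sorted by the next lookup. */
	pthread_rwlock_unlock(&maps->lock);
}

static void toy_maps__sort(struct toy_maps *maps)
{
	pthread_rwlock_wrlock(&maps->lock);
	if (!maps->sorted) {
		qsort(maps->entries, maps->nr, sizeof(*maps->entries),
		      cmp_key);
		maps->sorted = true;
	}
	pthread_rwlock_unlock(&maps->lock);
}

/*
 * Mirrors maps__find(): binary search needs a sorted array, sorting
 * needs the write lock, so loop until "sorted" is seen while holding
 * the read lock. An insert racing between the sort and the re-taken
 * read lock just sends the reader around the loop again.
 */
static bool toy_maps__find(struct toy_maps *maps, unsigned long key)
{
	bool found = false, done = false;

	while (!done) {
		pthread_rwlock_rdlock(&maps->lock);
		if (maps->sorted) {
			found = bsearch(&key, maps->entries, maps->nr,
					sizeof(*maps->entries),
					cmp_key) != NULL;
			done = true;
		}
		pthread_rwlock_unlock(&maps->lock);
		if (!done)
			toy_maps__sort(maps);
	}
	return found;
}

int main(void)
{
	struct toy_maps maps = { .lock = PTHREAD_RWLOCK_INITIALIZER };

	toy_maps__insert(&maps, 42);
	toy_maps__insert(&maps, 7);
	printf("7 found: %d, 9 found: %d\n",
	       toy_maps__find(&maps, 7), toy_maps__find(&maps, 9));
	free(maps.entries);
	return 0;
}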

2023-10-24 22:38:07

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 28/50] perf maps: Add maps__for_each_map to call a function on each entry

Most current uses of maps don't take the rwsem, introducing a risk that
the maps will change during iteration. Introduce maps__for_each_map,
which iterates over the entries while holding the rwsem's read lock.
This replaces the maps__for_each_entry macro, which is moved into
maps.c. maps__for_each_entry_safe will be replaced in a later change.
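
To make the new API concrete, here is a minimal sketch of the callback
pattern this patch introduces, written against the maps__for_each_map()
declaration added to maps.h below; the count_args struct and count_cb
helper are illustrative and not part of the patch:

/* Count the entries in a maps; iteration runs under the read lock. */
struct count_args {
	unsigned int nr;
};

static int count_cb(struct map *map __maybe_unused, void *data)
{
	struct count_args *args = data;

	args->nr++;
	return 0;	/* A non-zero return stops iteration and is returned. */
}

static unsigned int count_maps(struct maps *maps)
{
	struct count_args args = { .nr = 0 };

	maps__for_each_map(maps, count_cb, &args);
	return args.nr;
}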

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/arch/x86/util/event.c | 103 +++++++------
tools/perf/builtin-report.c | 54 ++++---
tools/perf/tests/maps.c | 61 +++++---
tools/perf/tests/vmlinux-kallsyms.c | 181 ++++++++++++-----------
tools/perf/util/machine.c | 53 ++++---
tools/perf/util/maps.c | 102 ++++++++-----
tools/perf/util/maps.h | 6 +-
tools/perf/util/probe-event.c | 40 +++--
tools/perf/util/symbol.c | 36 ++---
tools/perf/util/synthetic-events.c | 118 ++++++++-------
tools/perf/util/thread.c | 35 +++--
tools/perf/util/unwind-libunwind-local.c | 34 +++--
tools/perf/util/vdso.c | 35 +++--
13 files changed, 508 insertions(+), 350 deletions(-)

diff --git a/tools/perf/arch/x86/util/event.c b/tools/perf/arch/x86/util/event.c
index 5741ffe47312..e65b7dbe27fb 100644
--- a/tools/perf/arch/x86/util/event.c
+++ b/tools/perf/arch/x86/util/event.c
@@ -14,66 +14,79 @@

#if defined(__x86_64__)

-int perf_event__synthesize_extra_kmaps(struct perf_tool *tool,
- perf_event__handler_t process,
- struct machine *machine)
+struct perf_event__synthesize_extra_kmaps_cb_args {
+ struct perf_tool *tool;
+ perf_event__handler_t process;
+ struct machine *machine;
+ union perf_event *event;
+};
+
+static int perf_event__synthesize_extra_kmaps_cb(struct map *map, void *data)
{
- int rc = 0;
- struct map_rb_node *pos;
- struct maps *kmaps = machine__kernel_maps(machine);
- union perf_event *event = zalloc(sizeof(event->mmap) +
- machine->id_hdr_size);
+ struct perf_event__synthesize_extra_kmaps_cb_args *args = data;
+ union perf_event *event = args->event;
+ struct kmap *kmap;
+ size_t size;

- if (!event) {
- pr_debug("Not enough memory synthesizing mmap event "
- "for extra kernel maps\n");
- return -1;
- }
+ if (!__map__is_extra_kernel_map(map))
+ return 0;

- maps__for_each_entry(kmaps, pos) {
- struct kmap *kmap;
- size_t size;
- struct map *map = pos->map;
+ kmap = map__kmap(map);

- if (!__map__is_extra_kernel_map(map))
- continue;
+ size = sizeof(event->mmap) - sizeof(event->mmap.filename) +
+ PERF_ALIGN(strlen(kmap->name) + 1, sizeof(u64)) +
+ args->machine->id_hdr_size;

- kmap = map__kmap(map);
+ memset(event, 0, size);

- size = sizeof(event->mmap) - sizeof(event->mmap.filename) +
- PERF_ALIGN(strlen(kmap->name) + 1, sizeof(u64)) +
- machine->id_hdr_size;
+ event->mmap.header.type = PERF_RECORD_MMAP;

- memset(event, 0, size);
+ /*
+ * kernel uses 0 for user space maps, see kernel/perf_event.c
+ * __perf_event_mmap
+ */
+ if (machine__is_host(args->machine))
+ event->header.misc = PERF_RECORD_MISC_KERNEL;
+ else
+ event->header.misc = PERF_RECORD_MISC_GUEST_KERNEL;

- event->mmap.header.type = PERF_RECORD_MMAP;
+ event->mmap.header.size = size;

- /*
- * kernel uses 0 for user space maps, see kernel/perf_event.c
- * __perf_event_mmap
- */
- if (machine__is_host(machine))
- event->header.misc = PERF_RECORD_MISC_KERNEL;
- else
- event->header.misc = PERF_RECORD_MISC_GUEST_KERNEL;
+ event->mmap.start = map__start(map);
+ event->mmap.len = map__size(map);
+ event->mmap.pgoff = map__pgoff(map);
+ event->mmap.pid = args->machine->pid;

- event->mmap.header.size = size;
+ strlcpy(event->mmap.filename, kmap->name, PATH_MAX);

- event->mmap.start = map__start(map);
- event->mmap.len = map__size(map);
- event->mmap.pgoff = map__pgoff(map);
- event->mmap.pid = machine->pid;
+ if (perf_tool__process_synth_event(args->tool, event, args->machine, args->process) != 0)
+ return -1;

- strlcpy(event->mmap.filename, kmap->name, PATH_MAX);
+ return 0;
+}

- if (perf_tool__process_synth_event(tool, event, machine,
- process) != 0) {
- rc = -1;
- break;
- }
+int perf_event__synthesize_extra_kmaps(struct perf_tool *tool,
+ perf_event__handler_t process,
+ struct machine *machine)
+{
+ int rc;
+ struct maps *kmaps = machine__kernel_maps(machine);
+ struct perf_event__synthesize_extra_kmaps_cb_args args = {
+ .tool = tool,
+ .process = process,
+ .machine = machine,
+ .event = zalloc(sizeof(args.event->mmap) + machine->id_hdr_size),
+ };
+
+ if (!args.event) {
+ pr_debug("Not enough memory synthesizing mmap event "
+ "for extra kernel maps\n");
+ return -1;
}

- free(event);
+ rc = maps__for_each_map(kmaps, perf_event__synthesize_extra_kmaps_cb, &args);
+
+ free(args.event);
return rc;
}

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 749246817aed..d7f1f3c8cb7d 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -844,27 +844,47 @@ static struct task *tasks_list(struct task *task, struct machine *machine)
return tasks_list(parent_task, machine);
}

-static size_t maps__fprintf_task(struct maps *maps, int indent, FILE *fp)
+struct maps__fprintf_task_args {
+ int indent;
+ FILE *fp;
+ size_t printed;
+};
+
+static int maps__fprintf_task_cb(struct map *map, void *data)
{
- size_t printed = 0;
- struct map_rb_node *rb_node;
+ struct maps__fprintf_task_args *args = data;
+ const struct dso *dso = map__dso(map);
+ u32 prot = map__prot(map);
+ int ret;

- maps__for_each_entry(maps, rb_node) {
- struct map *map = rb_node->map;
- const struct dso *dso = map__dso(map);
- u32 prot = map__prot(map);
+ ret = fprintf(args->fp,
+ "%*s %" PRIx64 "-%" PRIx64 " %c%c%c%c %08" PRIx64 " %" PRIu64 " %s\n",
+ args->indent, "", map__start(map), map__end(map),
+ prot & PROT_READ ? 'r' : '-',
+ prot & PROT_WRITE ? 'w' : '-',
+ prot & PROT_EXEC ? 'x' : '-',
+ map__flags(map) ? 's' : 'p',
+ map__pgoff(map),
+ dso->id.ino, dso->name);

- printed += fprintf(fp, "%*s %" PRIx64 "-%" PRIx64 " %c%c%c%c %08" PRIx64 " %" PRIu64 " %s\n",
- indent, "", map__start(map), map__end(map),
- prot & PROT_READ ? 'r' : '-',
- prot & PROT_WRITE ? 'w' : '-',
- prot & PROT_EXEC ? 'x' : '-',
- map__flags(map) ? 's' : 'p',
- map__pgoff(map),
- dso->id.ino, dso->name);
- }
+ if (ret < 0)
+ return ret;
+
+ args->printed += ret;
+ return 0;
+}
+
+static size_t maps__fprintf_task(struct maps *maps, int indent, FILE *fp)
+{
+ struct maps__fprintf_task_args args = {
+ .indent = indent,
+ .fp = fp,
+ .printed = 0,
+ };
+
+ maps__for_each_map(maps, maps__fprintf_task_cb, &args);

- return printed;
+ return args.printed;
}

static void task__print_level(struct task *task, FILE *fp, int level)
diff --git a/tools/perf/tests/maps.c b/tools/perf/tests/maps.c
index 5bb1123a91a7..bb3fbfe5a73e 100644
--- a/tools/perf/tests/maps.c
+++ b/tools/perf/tests/maps.c
@@ -14,44 +14,59 @@ struct map_def {
u64 end;
};

+struct check_maps_cb_args {
+ struct map_def *merged;
+ unsigned int i;
+};
+
+static int check_maps_cb(struct map *map, void *data)
+{
+ struct check_maps_cb_args *args = data;
+ struct map_def *merged = &args->merged[args->i];
+
+ if (map__start(map) != merged->start ||
+ map__end(map) != merged->end ||
+ strcmp(map__dso(map)->name, merged->name) ||
+ refcount_read(map__refcnt(map)) != 1) {
+ return 1;
+ }
+ args->i++;
+ return 0;
+}
+
+static int failed_cb(struct map *map, void *data __maybe_unused)
+{
+ pr_debug("\tstart: %" PRIu64 " end: %" PRIu64 " name: '%s' refcnt: %d\n",
+ map__start(map),
+ map__end(map),
+ map__dso(map)->name,
+ refcount_read(map__refcnt(map)));
+
+ return 0;
+}
+
static int check_maps(struct map_def *merged, unsigned int size, struct maps *maps)
{
- struct map_rb_node *rb_node;
- unsigned int i = 0;
bool failed = false;

if (maps__nr_maps(maps) != size) {
pr_debug("Expected %d maps, got %d", size, maps__nr_maps(maps));
failed = true;
} else {
- maps__for_each_entry(maps, rb_node) {
- struct map *map = rb_node->map;
-
- if (map__start(map) != merged[i].start ||
- map__end(map) != merged[i].end ||
- strcmp(map__dso(map)->name, merged[i].name) ||
- refcount_read(map__refcnt(map)) != 1) {
- failed = true;
- }
- i++;
- }
+ struct check_maps_cb_args args = {
+ .merged = merged,
+ .i = 0,
+ };
+ failed = maps__for_each_map(maps, check_maps_cb, &args);
}
if (failed) {
pr_debug("Expected:\n");
- for (i = 0; i < size; i++) {
+ for (unsigned int i = 0; i < size; i++) {
pr_debug("\tstart: %" PRIu64 " end: %" PRIu64 " name: '%s' refcnt: 1\n",
merged[i].start, merged[i].end, merged[i].name);
}
pr_debug("Got:\n");
- maps__for_each_entry(maps, rb_node) {
- struct map *map = rb_node->map;
-
- pr_debug("\tstart: %" PRIu64 " end: %" PRIu64 " name: '%s' refcnt: %d\n",
- map__start(map),
- map__end(map),
- map__dso(map)->name,
- refcount_read(map__refcnt(map)));
- }
+ maps__for_each_map(maps, failed_cb, NULL);
}
return failed ? TEST_FAIL : TEST_OK;
}
diff --git a/tools/perf/tests/vmlinux-kallsyms.c b/tools/perf/tests/vmlinux-kallsyms.c
index 1078a93b01aa..822f893e67d5 100644
--- a/tools/perf/tests/vmlinux-kallsyms.c
+++ b/tools/perf/tests/vmlinux-kallsyms.c
@@ -112,18 +112,92 @@ static bool is_ignored_symbol(const char *name, char type)
return false;
}

+struct test__vmlinux_matches_kallsyms_cb_args {
+ struct machine kallsyms;
+ struct map *vmlinux_map;
+ bool header_printed;
+};
+
+static int test__vmlinux_matches_kallsyms_cb1(struct map *map, void *data)
+{
+ struct test__vmlinux_matches_kallsyms_cb_args *args = data;
+ struct dso *dso = map__dso(map);
+ /*
+ * If it is the kernel, kallsyms is always "[kernel.kallsyms]", while
+ * the kernel will have the path for the vmlinux file being used, so use
+ * the short name, less descriptive but the same ("[kernel]" in both
+ * cases).
+ */
+ struct map *pair = maps__find_by_name(args->kallsyms.kmaps,
+ (dso->kernel ? dso->short_name : dso->name));
+
+ if (pair)
+ map__set_priv(pair, 1);
+ else {
+ if (!args->header_printed) {
+ pr_info("WARN: Maps only in vmlinux:\n");
+ args->header_printed = true;
+ }
+ map__fprintf(map, stderr);
+ }
+ return 0;
+}
+
+static int test__vmlinux_matches_kallsyms_cb2(struct map *map, void *data)
+{
+ struct test__vmlinux_matches_kallsyms_cb_args *args = data;
+ struct map *pair;
+ u64 mem_start = map__unmap_ip(args->vmlinux_map, map__start(map));
+ u64 mem_end = map__unmap_ip(args->vmlinux_map, map__end(map));
+
+ pair = maps__find(args->kallsyms.kmaps, mem_start);
+ if (pair == NULL || map__priv(pair))
+ return 0;
+
+ if (map__start(pair) == mem_start) {
+ struct dso *dso = map__dso(map);
+
+ if (!args->header_printed) {
+ pr_info("WARN: Maps in vmlinux with a different name in kallsyms:\n");
+ args->header_printed = true;
+ }
+
+ pr_info("WARN: %" PRIx64 "-%" PRIx64 " %" PRIx64 " %s in kallsyms as",
+ map__start(map), map__end(map), map__pgoff(map), dso->name);
+ if (mem_end != map__end(pair))
+ pr_info(":\nWARN: *%" PRIx64 "-%" PRIx64 " %" PRIx64,
+ map__start(pair), map__end(pair), map__pgoff(pair));
+ pr_info(" %s\n", dso->name);
+ map__set_priv(pair, 1);
+ }
+ return 0;
+}
+
+static int test__vmlinux_matches_kallsyms_cb3(struct map *map, void *data)
+{
+ struct test__vmlinux_matches_kallsyms_cb_args *args = data;
+
+ if (!map__priv(map)) {
+ if (!args->header_printed) {
+ pr_info("WARN: Maps only in kallsyms:\n");
+ args->header_printed = true;
+ }
+ map__fprintf(map, stderr);
+ }
+ return 0;
+}
+
static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused,
int subtest __maybe_unused)
{
int err = TEST_FAIL;
struct rb_node *nd;
struct symbol *sym;
- struct map *kallsyms_map, *vmlinux_map;
- struct map_rb_node *rb_node;
- struct machine kallsyms, vmlinux;
+ struct map *kallsyms_map;
+ struct machine vmlinux;
struct maps *maps;
u64 mem_start, mem_end;
- bool header_printed;
+ struct test__vmlinux_matches_kallsyms_cb_args args;

/*
* Step 1:
@@ -131,7 +205,7 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
* Init the machines that will hold kernel, modules obtained from
* both vmlinux + .ko files and from /proc/kallsyms split by modules.
*/
- machine__init(&kallsyms, "", HOST_KERNEL_ID);
+ machine__init(&args.kallsyms, "", HOST_KERNEL_ID);
machine__init(&vmlinux, "", HOST_KERNEL_ID);

maps = machine__kernel_maps(&vmlinux);
@@ -143,7 +217,7 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
* load /proc/kallsyms. Also create the modules maps from /proc/modules
* and find the .ko files that match them in /lib/modules/`uname -r`/.
*/
- if (machine__create_kernel_maps(&kallsyms) < 0) {
+ if (machine__create_kernel_maps(&args.kallsyms) < 0) {
pr_debug("machine__create_kernel_maps failed");
err = TEST_SKIP;
goto out;
@@ -160,7 +234,7 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
* be compacted against the list of modules found in the "vmlinux"
* code and with the one got from /proc/modules from the "kallsyms" code.
*/
- if (machine__load_kallsyms(&kallsyms, "/proc/kallsyms") <= 0) {
+ if (machine__load_kallsyms(&args.kallsyms, "/proc/kallsyms") <= 0) {
pr_debug("machine__load_kallsyms failed");
err = TEST_SKIP;
goto out;
@@ -174,7 +248,7 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
* to see if the running kernel was relocated by checking if it has the
* same value in the vmlinux file we load.
*/
- kallsyms_map = machine__kernel_map(&kallsyms);
+ kallsyms_map = machine__kernel_map(&args.kallsyms);

/*
* Step 5:
@@ -186,7 +260,7 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
goto out;
}

- vmlinux_map = machine__kernel_map(&vmlinux);
+ args.vmlinux_map = machine__kernel_map(&vmlinux);

/*
* Step 6:
@@ -213,7 +287,7 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
* in the kallsyms dso. For the ones that are in both, check its names and
* end addresses too.
*/
- map__for_each_symbol(vmlinux_map, sym, nd) {
+ map__for_each_symbol(args.vmlinux_map, sym, nd) {
struct symbol *pair, *first_pair;

sym = rb_entry(nd, struct symbol, rb_node);
@@ -221,10 +295,10 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
if (sym->start == sym->end)
continue;

- mem_start = map__unmap_ip(vmlinux_map, sym->start);
- mem_end = map__unmap_ip(vmlinux_map, sym->end);
+ mem_start = map__unmap_ip(args.vmlinux_map, sym->start);
+ mem_end = map__unmap_ip(args.vmlinux_map, sym->end);

- first_pair = machine__find_kernel_symbol(&kallsyms, mem_start, NULL);
+ first_pair = machine__find_kernel_symbol(&args.kallsyms, mem_start, NULL);
pair = first_pair;

if (pair && UM(pair->start) == mem_start) {
@@ -253,7 +327,8 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
*/
continue;
} else {
- pair = machine__find_kernel_symbol_by_name(&kallsyms, sym->name, NULL);
+ pair = machine__find_kernel_symbol_by_name(&args.kallsyms,
+ sym->name, NULL);
if (pair) {
if (UM(pair->start) == mem_start)
goto next_pair;
@@ -267,7 +342,7 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused

continue;
}
- } else if (mem_start == map__end(kallsyms.vmlinux_map)) {
+ } else if (mem_start == map__end(args.kallsyms.vmlinux_map)) {
/*
* Ignore aliases to _etext, i.e. to the end of the kernel text area,
* such as __indirect_thunk_end.
@@ -289,78 +364,18 @@ static int test__vmlinux_matches_kallsyms(struct test_suite *test __maybe_unused
if (verbose <= 0)
goto out;

- header_printed = false;
-
- maps__for_each_entry(maps, rb_node) {
- struct map *map = rb_node->map;
- struct dso *dso = map__dso(map);
- /*
- * If it is the kernel, kallsyms is always "[kernel.kallsyms]", while
- * the kernel will have the path for the vmlinux file being used,
- * so use the short name, less descriptive but the same ("[kernel]" in
- * both cases.
- */
- struct map *pair = maps__find_by_name(kallsyms.kmaps, (dso->kernel ?
- dso->short_name :
- dso->name));
- if (pair) {
- map__set_priv(pair, 1);
- } else {
- if (!header_printed) {
- pr_info("WARN: Maps only in vmlinux:\n");
- header_printed = true;
- }
- map__fprintf(map, stderr);
- }
- }
-
- header_printed = false;
-
- maps__for_each_entry(maps, rb_node) {
- struct map *pair, *map = rb_node->map;
-
- mem_start = map__unmap_ip(vmlinux_map, map__start(map));
- mem_end = map__unmap_ip(vmlinux_map, map__end(map));
+ args.header_printed = false;
+ maps__for_each_map(maps, test__vmlinux_matches_kallsyms_cb1, &args);

- pair = maps__find(kallsyms.kmaps, mem_start);
- if (pair == NULL || map__priv(pair))
- continue;
-
- if (map__start(pair) == mem_start) {
- struct dso *dso = map__dso(map);
-
- if (!header_printed) {
- pr_info("WARN: Maps in vmlinux with a different name in kallsyms:\n");
- header_printed = true;
- }
-
- pr_info("WARN: %" PRIx64 "-%" PRIx64 " %" PRIx64 " %s in kallsyms as",
- map__start(map), map__end(map), map__pgoff(map), dso->name);
- if (mem_end != map__end(pair))
- pr_info(":\nWARN: *%" PRIx64 "-%" PRIx64 " %" PRIx64,
- map__start(pair), map__end(pair), map__pgoff(pair));
- pr_info(" %s\n", dso->name);
- map__set_priv(pair, 1);
- }
- }
-
- header_printed = false;
-
- maps = machine__kernel_maps(&kallsyms);
+ args.header_printed = false;
+ maps__for_each_map(maps, test__vmlinux_matches_kallsyms_cb2, &args);

- maps__for_each_entry(maps, rb_node) {
- struct map *map = rb_node->map;
+ args.header_printed = false;
+ maps = machine__kernel_maps(&args.kallsyms);
+ maps__for_each_map(maps, test__vmlinux_matches_kallsyms_cb3, &args);

- if (!map__priv(map)) {
- if (!header_printed) {
- pr_info("WARN: Maps only in kallsyms:\n");
- header_printed = true;
- }
- map__fprintf(map, stderr);
- }
- }
out:
- machine__exit(&kallsyms);
+ machine__exit(&args.kallsyms);
machine__exit(&vmlinux);
return err;
}
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index b6831a1f909d..3c967295c9a3 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1286,33 +1286,46 @@ static u64 find_entry_trampoline(struct dso *dso)
#define X86_64_CPU_ENTRY_AREA_SIZE 0x2c000
#define X86_64_ENTRY_TRAMPOLINE 0x6000

+struct machine__map_x86_64_entry_trampolines_args {
+ struct maps *kmaps;
+ bool found;
+};
+
+static int machine__map_x86_64_entry_trampolines_cb(struct map *map, void *data)
+{
+ struct machine__map_x86_64_entry_trampolines_args *args = data;
+ struct map *dest_map;
+ struct kmap *kmap = __map__kmap(map);
+
+ if (!kmap || !is_entry_trampoline(kmap->name))
+ return 0;
+
+ dest_map = maps__find(args->kmaps, map__pgoff(map));
+ if (dest_map != map)
+ map__set_pgoff(map, map__map_ip(dest_map, map__pgoff(map)));
+
+ args->found = true;
+ return 0;
+}
+
/* Map x86_64 PTI entry trampolines */
int machine__map_x86_64_entry_trampolines(struct machine *machine,
struct dso *kernel)
{
- struct maps *kmaps = machine__kernel_maps(machine);
+ struct machine__map_x86_64_entry_trampolines_args args = {
+ .kmaps = machine__kernel_maps(machine),
+ .found = false,
+ };
int nr_cpus_avail, cpu;
- bool found = false;
- struct map_rb_node *rb_node;
u64 pgoff;

/*
* In the vmlinux case, pgoff is a virtual address which must now be
* mapped to a vmlinux offset.
*/
- maps__for_each_entry(kmaps, rb_node) {
- struct map *dest_map, *map = rb_node->map;
- struct kmap *kmap = __map__kmap(map);
-
- if (!kmap || !is_entry_trampoline(kmap->name))
- continue;
+ maps__for_each_map(args.kmaps, machine__map_x86_64_entry_trampolines_cb, &args);

- dest_map = maps__find(kmaps, map__pgoff(map));
- if (dest_map != map)
- map__set_pgoff(map, map__map_ip(dest_map, map__pgoff(map)));
- found = true;
- }
- if (found || machine->trampolines_mapped)
+ if (args.found || machine->trampolines_mapped)
return 0;

pgoff = find_entry_trampoline(kernel);
@@ -3395,16 +3408,8 @@ int machine__for_each_dso(struct machine *machine, machine__dso_t fn, void *priv
int machine__for_each_kernel_map(struct machine *machine, machine__map_t fn, void *priv)
{
struct maps *maps = machine__kernel_maps(machine);
- struct map_rb_node *pos;
- int err = 0;

- maps__for_each_entry(maps, pos) {
- err = fn(pos->map, priv);
- if (err != 0) {
- break;
- }
- }
- return err;
+ return maps__for_each_map(maps, fn, priv);
}

bool machine__is_lock_function(struct machine *machine, u64 addr)
diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 9a011aed4b75..00e6589bba10 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -10,6 +10,9 @@
#include "ui/ui.h"
#include "unwind.h"

+#define maps__for_each_entry(maps, map) \
+ for (map = maps__first(maps); map; map = map_rb_node__next(map))
+
static void maps__init(struct maps *maps, struct machine *machine)
{
refcount_set(maps__refcnt(maps), 1);
@@ -196,6 +199,21 @@ void maps__put(struct maps *maps)
RC_CHK_PUT(maps);
}

+int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data), void *data)
+{
+ struct map_rb_node *pos;
+ int ret = 0;
+
+ down_read(maps__lock(maps));
+ maps__for_each_entry(maps, pos) {
+ ret = cb(pos->map, data);
+ if (ret)
+ break;
+ }
+ up_read(maps__lock(maps));
+ return ret;
+}
+
struct symbol *maps__find_symbol(struct maps *maps, u64 addr, struct map **mapp)
{
struct map *map = maps__find(maps, addr);
@@ -210,31 +228,40 @@ struct symbol *maps__find_symbol(struct maps *maps, u64 addr, struct map **mapp)
return NULL;
}

-struct symbol *maps__find_symbol_by_name(struct maps *maps, const char *name, struct map **mapp)
-{
+struct maps__find_symbol_by_name_args {
+ struct map **mapp;
+ const char *name;
struct symbol *sym;
- struct map_rb_node *pos;
+};

- down_read(maps__lock(maps));
+static int maps__find_symbol_by_name_cb(struct map *map, void *data)
+{
+ struct maps__find_symbol_by_name_args *args = data;

- maps__for_each_entry(maps, pos) {
- sym = map__find_symbol_by_name(pos->map, name);
+ args->sym = map__find_symbol_by_name(map, args->name);
+ if (!args->sym)
+ return 0;

- if (sym == NULL)
- continue;
- if (!map__contains_symbol(pos->map, sym)) {
- sym = NULL;
- continue;
- }
- if (mapp != NULL)
- *mapp = pos->map;
- goto out;
+ if (!map__contains_symbol(map, args->sym)) {
+ args->sym = NULL;
+ return 0;
}

- sym = NULL;
-out:
- up_read(maps__lock(maps));
- return sym;
+ if (args->mapp != NULL)
+ *args->mapp = map__get(map);
+ return 1;
+}
+
+struct symbol *maps__find_symbol_by_name(struct maps *maps, const char *name, struct map **mapp)
+{
+ struct maps__find_symbol_by_name_args args = {
+ .mapp = mapp,
+ .name = name,
+ .sym = NULL,
+ };
+
+ maps__for_each_map(maps, maps__find_symbol_by_name_cb, &args);
+ return args.sym;
}

int maps__find_ams(struct maps *maps, struct addr_map_symbol *ams)
@@ -253,25 +280,34 @@ int maps__find_ams(struct maps *maps, struct addr_map_symbol *ams)
return ams->ms.sym ? 0 : -1;
}

-size_t maps__fprintf(struct maps *maps, FILE *fp)
-{
- size_t printed = 0;
- struct map_rb_node *pos;
+struct maps__fprintf_args {
+ FILE *fp;
+ size_t printed;
+};

- down_read(maps__lock(maps));
+static int maps__fprintf_cb(struct map *map, void *data)
+{
+ struct maps__fprintf_args *args = data;

- maps__for_each_entry(maps, pos) {
- printed += fprintf(fp, "Map:");
- printed += map__fprintf(pos->map, fp);
- if (verbose > 2) {
- printed += dso__fprintf(map__dso(pos->map), fp);
- printed += fprintf(fp, "--\n");
- }
+ args->printed += fprintf(args->fp, "Map:");
+ args->printed += map__fprintf(map, args->fp);
+ if (verbose > 2) {
+ args->printed += dso__fprintf(map__dso(map), args->fp);
+ args->printed += fprintf(args->fp, "--\n");
}
+ return 0;
+}

- up_read(maps__lock(maps));
+size_t maps__fprintf(struct maps *maps, FILE *fp)
+{
+ struct maps__fprintf_args args = {
+ .fp = fp,
+ .printed = 0,
+ };
+
+ maps__for_each_map(maps, maps__fprintf_cb, &args);

- return printed;
+ return args.printed;
}

int maps__fixup_overlappings(struct maps *maps, struct map *map, FILE *fp)
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index a689149be8c4..8ac30cdaf5bd 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -36,9 +36,6 @@ struct map_rb_node *map_rb_node__next(struct map_rb_node *node);
struct map_rb_node *maps__find_node(struct maps *maps, struct map *map);
struct map *maps__find(struct maps *maps, u64 addr);

-#define maps__for_each_entry(maps, map) \
- for (map = maps__first(maps); map; map = map_rb_node__next(map))
-
#define maps__for_each_entry_safe(maps, map, next) \
for (map = maps__first(maps), next = map_rb_node__next(map); map; \
map = next, next = map_rb_node__next(map))
@@ -81,6 +78,9 @@ static inline void __maps__zput(struct maps **map)

#define maps__zput(map) __maps__zput(&map)

+/* Iterate over map calling cb for each entry. */
+int maps__for_each_map(struct maps *maps, int (*cb)(struct map *map, void *data), void *data);
+
static inline struct rb_root *maps__entries(struct maps *maps)
{
return &RC_CHK_ACCESS(maps)->entries;
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 1a5b7fa459b2..a1a796043691 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -149,10 +149,32 @@ static int kernel_get_symbol_address_by_name(const char *name, u64 *addr,
return 0;
}

+struct kernel_get_module_map_cb_args {
+ const char *module;
+ struct map *result;
+};
+
+static int kernel_get_module_map_cb(struct map *map, void *data)
+{
+ struct kernel_get_module_map_cb_args *args = data;
+ struct dso *dso = map__dso(map);
+ const char *short_name = dso->short_name; /* short_name is "[module]" */
+ u16 short_name_len = dso->short_name_len;
+
+ if (strncmp(short_name + 1, args->module, short_name_len - 2) == 0 &&
+ args->module[short_name_len - 2] == '\0') {
+ args->result = map__get(map);
+ return 1;
+ }
+ return 0;
+}
+
static struct map *kernel_get_module_map(const char *module)
{
- struct maps *maps = machine__kernel_maps(host_machine);
- struct map_rb_node *pos;
+ struct kernel_get_module_map_cb_args args = {
+ .module = module,
+ .result = NULL,
+ };

/* A file path -- this is an offline module */
if (module && strchr(module, '/'))
@@ -164,19 +186,9 @@ static struct map *kernel_get_module_map(const char *module)
return map__get(map);
}

- maps__for_each_entry(maps, pos) {
- /* short_name is "[module]" */
- struct dso *dso = map__dso(pos->map);
- const char *short_name = dso->short_name;
- u16 short_name_len = dso->short_name_len;
+ maps__for_each_map(machine__kernel_maps(host_machine), kernel_get_module_map_cb, &args);

- if (strncmp(short_name + 1, module,
- short_name_len - 2) == 0 &&
- module[short_name_len - 2] == '\0') {
- return map__get(pos->map);
- }
- }
- return NULL;
+ return args.result;
}

struct map *get_target_map(const char *target, struct nsinfo *nsi, bool user)
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 1cc42b8d8afb..72f03b875478 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1114,33 +1114,35 @@ int compare_proc_modules(const char *from, const char *to)
return ret;
}

+static int do_validate_kcore_modules_cb(struct map *old_map, void *data)
+{
+ struct rb_root *modules = data;
+ struct module_info *mi;
+ struct dso *dso;
+
+ if (!__map__is_kmodule(old_map))
+ return 0;
+
+ dso = map__dso(old_map);
+ /* Module must be in memory at the same address */
+ mi = find_module(dso->short_name, modules);
+ if (!mi || mi->start != map__start(old_map))
+ return -EINVAL;
+
+ return 0;
+}
+
static int do_validate_kcore_modules(const char *filename, struct maps *kmaps)
{
struct rb_root modules = RB_ROOT;
- struct map_rb_node *old_node;
int err;

err = read_proc_modules(filename, &modules);
if (err)
return err;

- maps__for_each_entry(kmaps, old_node) {
- struct map *old_map = old_node->map;
- struct module_info *mi;
- struct dso *dso;
+ err = maps__for_each_map(kmaps, do_validate_kcore_modules_cb, &modules);

- if (!__map__is_kmodule(old_map)) {
- continue;
- }
- dso = map__dso(old_map);
- /* Module must be in memory at the same address */
- mi = find_module(dso->short_name, &modules);
- if (!mi || mi->start != map__start(old_map)) {
- err = -EINVAL;
- goto out;
- }
- }
-out:
delete_modules(&modules);
return err;
}
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 7cc38f2a0e9e..cdab6aa04917 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -666,18 +666,74 @@ int perf_event__synthesize_cgroups(struct perf_tool *tool __maybe_unused,
}
#endif

+struct perf_event__synthesize_modules_maps_cb_args {
+ struct perf_tool *tool;
+ perf_event__handler_t process;
+ struct machine *machine;
+ union perf_event *event;
+};
+
+static int perf_event__synthesize_modules_maps_cb(struct map *map, void *data)
+{
+ struct perf_event__synthesize_modules_maps_cb_args *args = data;
+ union perf_event *event = args->event;
+ struct dso *dso;
+ size_t size;
+
+ if (!__map__is_kmodule(map))
+ return 0;
+
+ dso = map__dso(map);
+ if (symbol_conf.buildid_mmap2) {
+ size = PERF_ALIGN(dso->long_name_len + 1, sizeof(u64));
+ event->mmap2.header.type = PERF_RECORD_MMAP2;
+ event->mmap2.header.size = (sizeof(event->mmap2) -
+ (sizeof(event->mmap2.filename) - size));
+ memset(event->mmap2.filename + size, 0, args->machine->id_hdr_size);
+ event->mmap2.header.size += args->machine->id_hdr_size;
+ event->mmap2.start = map__start(map);
+ event->mmap2.len = map__size(map);
+ event->mmap2.pid = args->machine->pid;
+
+ memcpy(event->mmap2.filename, dso->long_name, dso->long_name_len + 1);
+
+ perf_record_mmap2__read_build_id(&event->mmap2, args->machine, false);
+ } else {
+ size = PERF_ALIGN(dso->long_name_len + 1, sizeof(u64));
+ event->mmap.header.type = PERF_RECORD_MMAP;
+ event->mmap.header.size = (sizeof(event->mmap) -
+ (sizeof(event->mmap.filename) - size));
+ memset(event->mmap.filename + size, 0, args->machine->id_hdr_size);
+ event->mmap.header.size += args->machine->id_hdr_size;
+ event->mmap.start = map__start(map);
+ event->mmap.len = map__size(map);
+ event->mmap.pid = args->machine->pid;
+
+ memcpy(event->mmap.filename, dso->long_name, dso->long_name_len + 1);
+ }
+
+ if (perf_tool__process_synth_event(args->tool, event, args->machine, args->process) != 0)
+ return -1;
+
+ return 0;
+}
+
int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t process,
struct machine *machine)
{
- int rc = 0;
- struct map_rb_node *pos;
+ int rc;
struct maps *maps = machine__kernel_maps(machine);
- union perf_event *event;
- size_t size = symbol_conf.buildid_mmap2 ?
- sizeof(event->mmap2) : sizeof(event->mmap);
+ struct perf_event__synthesize_modules_maps_cb_args args = {
+ .tool = tool,
+ .process = process,
+ .machine = machine,
+ };
+ size_t size = symbol_conf.buildid_mmap2
+ ? sizeof(args.event->mmap2)
+ : sizeof(args.event->mmap);

- event = zalloc(size + machine->id_hdr_size);
- if (event == NULL) {
+ args.event = zalloc(size + machine->id_hdr_size);
+ if (args.event == NULL) {
pr_debug("Not enough memory synthesizing mmap event "
"for kernel modules\n");
return -1;
@@ -688,53 +744,13 @@ int perf_event__synthesize_modules(struct perf_tool *tool, perf_event__handler_t
* __perf_event_mmap
*/
if (machine__is_host(machine))
- event->header.misc = PERF_RECORD_MISC_KERNEL;
+ args.event->header.misc = PERF_RECORD_MISC_KERNEL;
else
- event->header.misc = PERF_RECORD_MISC_GUEST_KERNEL;
-
- maps__for_each_entry(maps, pos) {
- struct map *map = pos->map;
- struct dso *dso;
+ args.event->header.misc = PERF_RECORD_MISC_GUEST_KERNEL;

- if (!__map__is_kmodule(map))
- continue;
+ rc = maps__for_each_map(maps, perf_event__synthesize_modules_maps_cb, &args);

- dso = map__dso(map);
- if (symbol_conf.buildid_mmap2) {
- size = PERF_ALIGN(dso->long_name_len + 1, sizeof(u64));
- event->mmap2.header.type = PERF_RECORD_MMAP2;
- event->mmap2.header.size = (sizeof(event->mmap2) -
- (sizeof(event->mmap2.filename) - size));
- memset(event->mmap2.filename + size, 0, machine->id_hdr_size);
- event->mmap2.header.size += machine->id_hdr_size;
- event->mmap2.start = map__start(map);
- event->mmap2.len = map__size(map);
- event->mmap2.pid = machine->pid;
-
- memcpy(event->mmap2.filename, dso->long_name, dso->long_name_len + 1);
-
- perf_record_mmap2__read_build_id(&event->mmap2, machine, false);
- } else {
- size = PERF_ALIGN(dso->long_name_len + 1, sizeof(u64));
- event->mmap.header.type = PERF_RECORD_MMAP;
- event->mmap.header.size = (sizeof(event->mmap) -
- (sizeof(event->mmap.filename) - size));
- memset(event->mmap.filename + size, 0, machine->id_hdr_size);
- event->mmap.header.size += machine->id_hdr_size;
- event->mmap.start = map__start(map);
- event->mmap.len = map__size(map);
- event->mmap.pid = machine->pid;
-
- memcpy(event->mmap.filename, dso->long_name, dso->long_name_len + 1);
- }
-
- if (perf_tool__process_synth_event(tool, event, machine, process) != 0) {
- rc = -1;
- break;
- }
- }
-
- free(event);
+ free(args.event);
return rc;
}

diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index eae3041408a7..2b938ccbfe3e 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -349,34 +349,33 @@ int thread__insert_map(struct thread *thread, struct map *map)
return maps__insert(thread__maps(thread), map);
}

-static int __thread__prepare_access(struct thread *thread)
+struct thread__prepare_access_maps_cb_args {
+ int err;
+ struct maps *maps;
+};
+
+static int thread__prepare_access_maps_cb(struct map *map, void *data)
{
bool initialized = false;
- int err = 0;
- struct maps *maps = thread__maps(thread);
- struct map_rb_node *rb_node;
-
- down_read(maps__lock(maps));
-
- maps__for_each_entry(maps, rb_node) {
- err = unwind__prepare_access(thread__maps(thread), rb_node->map, &initialized);
- if (err || initialized)
- break;
- }
+ struct thread__prepare_access_maps_cb_args *args = data;

- up_read(maps__lock(maps));
+ args->err = unwind__prepare_access(args->maps, map, &initialized);

- return err;
+ return (args->err || initialized) ? 1 : 0;
}

static int thread__prepare_access(struct thread *thread)
{
- int err = 0;
+ struct thread__prepare_access_maps_cb_args args = {
+ .err = 0,
+ };

- if (dwarf_callchain_users)
- err = __thread__prepare_access(thread);
+ if (dwarf_callchain_users) {
+ args.maps = thread__maps(thread);
+ maps__for_each_map(thread__maps(thread), thread__prepare_access_maps_cb, &args);
+ }

- return err;
+ return args.err;
}

static int thread__clone_maps(struct thread *thread, struct thread *parent, bool do_maps_clone)
diff --git a/tools/perf/util/unwind-libunwind-local.c b/tools/perf/util/unwind-libunwind-local.c
index c0641882fd2f..228f1565bd0b 100644
--- a/tools/perf/util/unwind-libunwind-local.c
+++ b/tools/perf/util/unwind-libunwind-local.c
@@ -302,12 +302,31 @@ static int unwind_spec_ehframe(struct dso *dso, struct machine *machine,
return 0;
}

+struct read_unwind_spec_eh_frame_maps_cb_args {
+ struct dso *dso;
+ u64 base_addr;
+};
+
+static int read_unwind_spec_eh_frame_maps_cb(struct map *map, void *data)
+{
+
+ struct read_unwind_spec_eh_frame_maps_cb_args *args = data;
+
+ if (map__dso(map) == args->dso && map__start(map) < args->base_addr)
+ args->base_addr = map__start(map);
+
+ return 0;
+}
+
+
static int read_unwind_spec_eh_frame(struct dso *dso, struct unwind_info *ui,
u64 *table_data, u64 *segbase,
u64 *fde_count)
{
- struct map_rb_node *map_node;
- u64 base_addr = UINT64_MAX;
+ struct read_unwind_spec_eh_frame_maps_cb_args args = {
+ .dso = dso,
+ .base_addr = UINT64_MAX,
+ };
int ret, fd;

if (dso->data.eh_frame_hdr_offset == 0) {
@@ -325,16 +344,11 @@ static int read_unwind_spec_eh_frame(struct dso *dso, struct unwind_info *ui,
return -EINVAL;
}

- maps__for_each_entry(thread__maps(ui->thread), map_node) {
- struct map *map = map_node->map;
- u64 start = map__start(map);
+ maps__for_each_map(thread__maps(ui->thread), read_unwind_spec_eh_frame_maps_cb, &args);

- if (map__dso(map) == dso && start < base_addr)
- base_addr = start;
- }
- base_addr -= dso->data.elf_base_addr;
+ args.base_addr -= dso->data.elf_base_addr;
/* Address of .eh_frame_hdr */
- *segbase = base_addr + dso->data.eh_frame_hdr_addr;
+ *segbase = args.base_addr + dso->data.eh_frame_hdr_addr;
ret = unwind_spec_ehframe(dso, ui->machine, dso->data.eh_frame_hdr_offset,
table_data, fde_count);
if (ret)
diff --git a/tools/perf/util/vdso.c b/tools/perf/util/vdso.c
index ae3eee69b659..df8963796187 100644
--- a/tools/perf/util/vdso.c
+++ b/tools/perf/util/vdso.c
@@ -140,23 +140,34 @@ static struct dso *__machine__addnew_vdso(struct machine *machine, const char *s
return dso;
}

+struct machine__thread_dso_type_maps_cb_args {
+ struct machine *machine;
+ enum dso_type dso_type;
+};
+
+static int machine__thread_dso_type_maps_cb(struct map *map, void *data)
+{
+ struct machine__thread_dso_type_maps_cb_args *args = data;
+ struct dso *dso = map__dso(map);
+
+ if (!dso || dso->long_name[0] != '/')
+ return 0;
+
+ args->dso_type = dso__type(dso, args->machine);
+ return (args->dso_type != DSO__TYPE_UNKNOWN) ? 1 : 0;
+}
+
static enum dso_type machine__thread_dso_type(struct machine *machine,
struct thread *thread)
{
- enum dso_type dso_type = DSO__TYPE_UNKNOWN;
- struct map_rb_node *rb_node;
-
- maps__for_each_entry(thread__maps(thread), rb_node) {
- struct dso *dso = map__dso(rb_node->map);
+ struct machine__thread_dso_type_maps_cb_args args = {
+ .machine = machine,
+ .dso_type = DSO__TYPE_UNKNOWN,
+ };

- if (!dso || dso->long_name[0] != '/')
- continue;
- dso_type = dso__type(dso, machine);
- if (dso_type != DSO__TYPE_UNKNOWN)
- break;
- }
+ maps__for_each_map(thread__maps(thread), machine__thread_dso_type_maps_cb, &args);

- return dso_type;
+ return args.dso_type;
}

#if BITS_PER_LONG == 64
--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:38:36

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 49/50] perf threads: Switch from rbtree to hashmap

The rbtree keeps the entries sorted, but the ordering is never
used. Switch to using a hashmap for O(1) rather than O(log n)
find/insert/remove complexity.
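
In sketch form, the find fast path becomes (simplified from the diff
below; the last_match cache and NULL handling are omitted):

        struct threads_table_entry *table = threads__table(threads, tid);
        struct thread *res = NULL;

        down_read(&table->lock);
        /* O(1) hashmap lookup keyed on tid replaces the O(log n) rbtree walk. */
        if (hashmap__find(&table->shard, tid, &res))
                res = thread__get(res); /* the caller gets its own reference */
        up_read(&table->lock);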

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/threads.c | 146 ++++++++++++--------------------------
tools/perf/util/threads.h | 6 +-
2 files changed, 47 insertions(+), 105 deletions(-)

diff --git a/tools/perf/util/threads.c b/tools/perf/util/threads.c
index d984ec939c7b..55923be53180 100644
--- a/tools/perf/util/threads.c
+++ b/tools/perf/util/threads.c
@@ -3,25 +3,30 @@
#include "machine.h"
#include "thread.h"

-struct thread_rb_node {
- struct rb_node rb_node;
- struct thread *thread;
-};
-
static struct threads_table_entry *threads__table(struct threads *threads, pid_t tid)
{
/* Cast it to handle tid == -1 */
return &threads->table[(unsigned int)tid % THREADS__TABLE_SIZE];
}

+static size_t key_hash(long key, void *ctx __maybe_unused)
+{
+ /* The table lookup removes low bit entropy, but this is just ignored here. */
+ return key;
+}
+
+static bool key_equal(long key1, long key2, void *ctx __maybe_unused)
+{
+ return key1 == key2;
+}
+
void threads__init(struct threads *threads)
{
for (int i = 0; i < THREADS__TABLE_SIZE; i++) {
struct threads_table_entry *table = &threads->table[i];

- table->entries = RB_ROOT_CACHED;
+ hashmap__init(&table->shard, key_hash, key_equal, NULL);
init_rwsem(&table->lock);
- table->nr = 0;
table->last_match = NULL;
}
}
@@ -32,6 +37,7 @@ void threads__exit(struct threads *threads)
for (int i = 0; i < THREADS__TABLE_SIZE; i++) {
struct threads_table_entry *table = &threads->table[i];

+ hashmap__clear(&table->shard);
exit_rwsem(&table->lock);
}
}
@@ -44,7 +50,7 @@ size_t threads__nr(struct threads *threads)
struct threads_table_entry *table = &threads->table[i];

down_read(&table->lock);
- nr += table->nr;
+ nr += hashmap__size(&table->shard);
up_read(&table->lock);
}
return nr;
@@ -86,28 +92,13 @@ static void threads_table_entry__set_last_match(struct threads_table_entry *tabl
struct thread *threads__find(struct threads *threads, pid_t tid)
{
struct threads_table_entry *table = threads__table(threads, tid);
- struct rb_node **p;
- struct thread *res = NULL;
+ struct thread *res;

down_read(&table->lock);
res = __threads_table_entry__get_last_match(table, tid);
- if (res)
- return res;
-
- p = &table->entries.rb_root.rb_node;
- while (*p != NULL) {
- struct rb_node *parent = *p;
- struct thread *th = rb_entry(parent, struct thread_rb_node, rb_node)->thread;
-
- if (thread__tid(th) == tid) {
- res = thread__get(th);
- break;
- }
-
- if (tid < thread__tid(th))
- p = &(*p)->rb_left;
- else
- p = &(*p)->rb_right;
+ if (!res) {
+ if (hashmap__find(&table->shard, tid, &res))
+ res = thread__get(res);
}
up_read(&table->lock);
if (res)
@@ -118,49 +109,25 @@ struct thread *threads__find(struct threads *threads, pid_t tid)
struct thread *threads__findnew(struct threads *threads, pid_t pid, pid_t tid, bool *created)
{
struct threads_table_entry *table = threads__table(threads, tid);
- struct rb_node **p;
- struct rb_node *parent = NULL;
struct thread *res = NULL;
- struct thread_rb_node *nd;
- bool leftmost = true;

*created = false;
down_write(&table->lock);
- p = &table->entries.rb_root.rb_node;
- while (*p != NULL) {
- struct thread *th;
-
- parent = *p;
- th = rb_entry(parent, struct thread_rb_node, rb_node)->thread;
-
- if (thread__tid(th) == tid) {
- __threads_table_entry__set_last_match(table, th);
- res = thread__get(th);
- goto out_unlock;
- }
-
- if (tid < thread__tid(th))
- p = &(*p)->rb_left;
- else {
- leftmost = false;
- p = &(*p)->rb_right;
- }
- }
- nd = malloc(sizeof(*nd));
- if (nd == NULL)
- goto out_unlock;
res = thread__new(pid, tid);
- if (!res)
- free(nd);
- else {
- *created = true;
- nd->thread = thread__get(res);
- rb_link_node(&nd->rb_node, parent, p);
- rb_insert_color_cached(&nd->rb_node, &table->entries, leftmost);
- ++table->nr;
- __threads_table_entry__set_last_match(table, res);
+ if (res) {
+ if (hashmap__add(&table->shard, tid, res)) {
+ /* Add failed. Assume a race so find other entry. */
+ thread__put(res);
+ res = NULL;
+ if (hashmap__find(&table->shard, tid, &res))
+ res = thread__get(res);
+ } else {
+ res = thread__get(res);
+ *created = true;
+ }
+ if (res)
+ __threads_table_entry__set_last_match(table, res);
}
-out_unlock:
up_write(&table->lock);
return res;
}
@@ -169,57 +136,32 @@ void threads__remove_all_threads(struct threads *threads)
{
for (int i = 0; i < THREADS__TABLE_SIZE; i++) {
struct threads_table_entry *table = &threads->table[i];
- struct rb_node *nd;
+ struct hashmap_entry *cur, *tmp;
+ size_t bkt;

down_write(&table->lock);
__threads_table_entry__set_last_match(table, NULL);
- nd = rb_first_cached(&table->entries);
- while (nd) {
- struct thread_rb_node *trb = rb_entry(nd, struct thread_rb_node, rb_node);
-
- nd = rb_next(nd);
- thread__put(trb->thread);
- rb_erase_cached(&trb->rb_node, &table->entries);
- RB_CLEAR_NODE(&trb->rb_node);
- --table->nr;
+ hashmap__for_each_entry_safe((&table->shard), cur, tmp, bkt) {
+ struct thread *old_value;

- free(trb);
+ hashmap__delete(&table->shard, cur->key, /*old_key=*/NULL, &old_value);
+ thread__put(old_value);
}
- assert(table->nr == 0);
up_write(&table->lock);
}
}

void threads__remove(struct threads *threads, struct thread *thread)
{
- struct rb_node **p;
struct threads_table_entry *table = threads__table(threads, thread__tid(thread));
- pid_t tid = thread__tid(thread);
+ struct thread *old_value;

down_write(&table->lock);
if (table->last_match && RC_CHK_EQUAL(table->last_match, thread))
__threads_table_entry__set_last_match(table, NULL);

- p = &table->entries.rb_root.rb_node;
- while (*p != NULL) {
- struct rb_node *parent = *p;
- struct thread_rb_node *nd = rb_entry(parent, struct thread_rb_node, rb_node);
- struct thread *th = nd->thread;
-
- if (RC_CHK_EQUAL(th, thread)) {
- thread__put(nd->thread);
- rb_erase_cached(&nd->rb_node, &table->entries);
- RB_CLEAR_NODE(&nd->rb_node);
- --table->nr;
- free(nd);
- break;
- }
-
- if (tid < thread__tid(th))
- p = &(*p)->rb_left;
- else
- p = &(*p)->rb_right;
- }
+ hashmap__delete(&table->shard, thread__tid(thread), /*old_key=*/NULL, &old_value);
+ thread__put(old_value);
up_write(&table->lock);
}

@@ -229,11 +171,11 @@ int threads__for_each_thread(struct threads *threads,
{
for (int i = 0; i < THREADS__TABLE_SIZE; i++) {
struct threads_table_entry *table = &threads->table[i];
- struct rb_node *nd;
+ struct hashmap_entry *cur;
+ size_t bkt;

- for (nd = rb_first_cached(&table->entries); nd; nd = rb_next(nd)) {
- struct thread_rb_node *trb = rb_entry(nd, struct thread_rb_node, rb_node);
- int rc = fn(trb->thread, data);
+ hashmap__for_each_entry((&table->shard), cur, bkt) {
+ int rc = fn((struct thread *)cur->pvalue, data);

if (rc != 0)
return rc;
diff --git a/tools/perf/util/threads.h b/tools/perf/util/threads.h
index ed67de627578..d03bd91a7769 100644
--- a/tools/perf/util/threads.h
+++ b/tools/perf/util/threads.h
@@ -2,7 +2,7 @@
#ifndef __PERF_THREADS_H
#define __PERF_THREADS_H

-#include <linux/rbtree.h>
+#include "hashmap.h"
#include "rwsem.h"

struct thread;
@@ -11,9 +11,9 @@ struct thread;
#define THREADS__TABLE_SIZE (1 << THREADS__TABLE_BITS)

struct threads_table_entry {
- struct rb_root_cached entries;
+ /* Key is tid, value is struct thread. */
+ struct hashmap shard;
struct rw_semaphore lock;
- unsigned int nr;
struct thread *last_match;
};

--
2.42.0.758.gaed0368e0e-goog

2023-10-24 22:38:42

by Ian Rogers

[permalink] [raw]
Subject: [PATCH v3 26/50] perf maps: Move symbol maps functions to maps.c

Move the find and certain other symbol maps__* functions to maps.c for
better abstraction.

Signed-off-by: Ian Rogers <[email protected]>
---
tools/perf/util/maps.c | 238 +++++++++++++++++++++++++++++++++++++
tools/perf/util/maps.h | 12 ++
tools/perf/util/symbol.c | 248 ---------------------------------------
tools/perf/util/symbol.h | 1 -
4 files changed, 250 insertions(+), 249 deletions(-)

diff --git a/tools/perf/util/maps.c b/tools/perf/util/maps.c
index 233438c95b53..9a011aed4b75 100644
--- a/tools/perf/util/maps.c
+++ b/tools/perf/util/maps.c
@@ -475,3 +475,241 @@ struct map_rb_node *map_rb_node__next(struct map_rb_node *node)

return rb_entry(next, struct map_rb_node, rb_node);
}
+
+static int map__strcmp(const void *a, const void *b)
+{
+ const struct map *map_a = *(const struct map **)a;
+ const struct map *map_b = *(const struct map **)b;
+ const struct dso *dso_a = map__dso(map_a);
+ const struct dso *dso_b = map__dso(map_b);
+ int ret = strcmp(dso_a->short_name, dso_b->short_name);
+
+ if (ret == 0 && map_a != map_b) {
+ /*
+ * Ensure distinct but name equal maps have an order in part to
+ * aid reference counting.
+ */
+ ret = (int)map__start(map_a) - (int)map__start(map_b);
+ if (ret == 0)
+ ret = (int)((intptr_t)map_a - (intptr_t)map_b);
+ }
+
+ return ret;
+}
+
+static int map__strcmp_name(const void *name, const void *b)
+{
+ const struct dso *dso = map__dso(*(const struct map **)b);
+
+ return strcmp(name, dso->short_name);
+}
+
+void __maps__sort_by_name(struct maps *maps)
+{
+ qsort(maps__maps_by_name(maps), maps__nr_maps(maps), sizeof(struct map *), map__strcmp);
+}
+
+static int map__groups__sort_by_name_from_rbtree(struct maps *maps)
+{
+ struct map_rb_node *rb_node;
+ struct map **maps_by_name = realloc(maps__maps_by_name(maps),
+ maps__nr_maps(maps) * sizeof(struct map *));
+ int i = 0;
+
+ if (maps_by_name == NULL)
+ return -1;
+
+ up_read(maps__lock(maps));
+ down_write(maps__lock(maps));
+
+ RC_CHK_ACCESS(maps)->maps_by_name = maps_by_name;
+ RC_CHK_ACCESS(maps)->nr_maps_allocated = maps__nr_maps(maps);
+
+ maps__for_each_entry(maps, rb_node)
+ maps_by_name[i++] = map__get(rb_node->map);
+
+ __maps__sort_by_name(maps);
+
+ up_write(maps__lock(maps));
+ down_read(maps__lock(maps));
+
+ return 0;
+}
+
+static struct map *__maps__find_by_name(struct maps *maps, const char *name)
+{
+ struct map **mapp;
+
+ if (maps__maps_by_name(maps) == NULL &&
+ map__groups__sort_by_name_from_rbtree(maps))
+ return NULL;
+
+ mapp = bsearch(name, maps__maps_by_name(maps), maps__nr_maps(maps),
+ sizeof(*mapp), map__strcmp_name);
+ if (mapp)
+ return *mapp;
+ return NULL;
+}
+
+struct map *maps__find_by_name(struct maps *maps, const char *name)
+{
+ struct map_rb_node *rb_node;
+ struct map *map;
+
+ down_read(maps__lock(maps));
+
+
+ if (RC_CHK_ACCESS(maps)->last_search_by_name) {
+ const struct dso *dso = map__dso(RC_CHK_ACCESS(maps)->last_search_by_name);
+
+ if (strcmp(dso->short_name, name) == 0) {
+ map = RC_CHK_ACCESS(maps)->last_search_by_name;
+ goto out_unlock;
+ }
+ }
+ /*
+ * If we have maps->maps_by_name, then the name isn't in the rbtree,
+ * as maps->maps_by_name mirrors the rbtree when lookups by name are
+ * made.
+ */
+ map = __maps__find_by_name(maps, name);
+ if (map || maps__maps_by_name(maps) != NULL)
+ goto out_unlock;
+
+ /* Fallback to traversing the rbtree... */
+ maps__for_each_entry(maps, rb_node) {
+ struct dso *dso;
+
+ map = rb_node->map;
+ dso = map__dso(map);
+ if (strcmp(dso->short_name, name) == 0) {
+ RC_CHK_ACCESS(maps)->last_search_by_name = map;
+ goto out_unlock;
+ }
+ }
+ map = NULL;
+
+out_unlock:
+ up_read(maps__lock(maps));
+ return map;
+}
+
+void maps__fixup_end(struct maps *maps)
+{
+ struct map_rb_node *prev = NULL, *curr;
+
+ down_write(maps__lock(maps));
+
+ maps__for_each_entry(maps, curr) {
+ if (prev != NULL && !map__end(prev->map))
+ map__set_end(prev->map, map__start(curr->map));
+
+ prev = curr;
+ }
+
+ /*
+ * We still haven't the actual symbols, so guess the
+ * last map final address.
+ */
+ if (curr && !map__end(curr->map))
+ map__set_end(curr->map, ~0ULL);
+
+ up_write(maps__lock(maps));
+}
+
+/*
+ * Merges map into maps by splitting the new map within the existing map
+ * regions.
+ */
+int maps__merge_in(struct maps *kmaps, struct map *new_map)
+{
+ struct map_rb_node *rb_node;
+ LIST_HEAD(merged);
+ int err = 0;
+
+ maps__for_each_entry(kmaps, rb_node) {
+ struct map *old_map = rb_node->map;
+
+ /* no overload with this one */
+ if (map__end(new_map) < map__start(old_map) ||
+ map__start(new_map) >= map__end(old_map))
+ continue;
+
+ if (map__start(new_map) < map__start(old_map)) {
+ /*
+ * |new......
+ * |old....
+ */
+ if (map__end(new_map) < map__end(old_map)) {
+ /*
+ * |new......| -> |new..|
+ * |old....| -> |old....|
+ */
+ map__set_end(new_map, map__start(old_map));
+ } else {
+ /*
+ * |new.............| -> |new..| |new..|
+ * |old....| -> |old....|
+ */
+ struct map_list_node *m = map_list_node__new();
+
+ if (!m) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ m->map = map__clone(new_map);
+ if (!m->map) {
+ free(m);
+ err = -ENOMEM;
+ goto out;
+ }
+
+ map__set_end(m->map, map__start(old_map));
+ list_add_tail(&m->node, &merged);
+ map__add_pgoff(new_map, map__end(old_map) - map__start(new_map));
+ map__set_start(new_map, map__end(old_map));
+ }
+ } else {
+ /*
+ * |new......
+ * |old....
+ */
+ if (map__end(new_map) < map__end(old_map)) {
+ /*
+ * |new..| -> x
+ * |old.........| -> |old.........|
+ */
+ map__put(new_map);
+ new_map = NULL;
+ break;
+ } else {
+ /*
+ * |new......| -> |new...|
+ * |old....| -> |old....|
+ */
+ map__add_pgoff(new_map, map__end(old_map) - map__start(new_map));
+ map__set_start(new_map, map__end(old_map));
+ }
+ }
+ }
+
+out:
+ while (!list_empty(&merged)) {
+ struct map_list_node *old_node;
+
+ old_node = list_entry(merged.next, struct map_list_node, node);
+ list_del_init(&old_node->node);
+ if (!err)
+ err = maps__insert(kmaps, old_node->map);
+ map__put(old_node->map);
+ free(old_node);
+ }
+
+ if (new_map) {
+ if (!err)
+ err = maps__insert(kmaps, new_map);
+ map__put(new_map);
+ }
+ return err;
+}
diff --git a/tools/perf/util/maps.h b/tools/perf/util/maps.h
index 83144e0645ed..a689149be8c4 100644
--- a/tools/perf/util/maps.h
+++ b/tools/perf/util/maps.h
@@ -21,6 +21,16 @@ struct map_rb_node {
struct map *map;
};

+struct map_list_node {
+ struct list_head node;
+ struct map *map;
+};
+
+static inline struct map_list_node *map_list_node__new(void)
+{
+ return malloc(sizeof(struct map_list_node));
+}
+
struct map_rb_node *maps__first(struct maps *maps);
struct map_rb_node *map_rb_node__next(struct map_rb_node *node);
struct map_rb_node *maps__find_node(struct maps *maps, struct map *map);
@@ -133,4 +143,6 @@ int maps__merge_in(struct maps *kmaps, struct map *new_map);

void __maps__sort_by_name(struct maps *maps);

+void maps__fixup_end(struct maps *maps);
+
#endif // __PERF_MAPS_H
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 314c0263bf3c..1cc42b8d8afb 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -48,11 +48,6 @@ static bool symbol__is_idle(const char *name);
int vmlinux_path__nr_entries;
char **vmlinux_path;

-struct map_list_node {
- struct list_head node;
- struct map *map;
-};
-
struct symbol_conf symbol_conf = {
.nanosecs = false,
.use_modules = true,
@@ -90,11 +85,6 @@ static enum dso_binary_type binary_type_symtab[] = {

#define DSO_BINARY_TYPE__SYMTAB_CNT ARRAY_SIZE(binary_type_symtab)

-static struct map_list_node *map_list_node__new(void)
-{
- return malloc(sizeof(struct map_list_node));
-}
-
static bool symbol_type__filter(char symbol_type)
{
symbol_type = toupper(symbol_type);
@@ -270,29 +260,6 @@ void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms)
curr->end = roundup(curr->start, 4096) + 4096;
}

-void maps__fixup_end(struct maps *maps)
-{
- struct map_rb_node *prev = NULL, *curr;
-
- down_write(maps__lock(maps));
-
- maps__for_each_entry(maps, curr) {
- if (prev != NULL && !map__end(prev->map))
- map__set_end(prev->map, map__start(curr->map));
-
- prev = curr;
- }
-
- /*
- * We still haven't the actual symbols, so guess the
- * last map final address.
- */
- if (curr && !map__end(curr->map))
- map__set_end(curr->map, ~0ULL);
-
- up_write(maps__lock(maps));
-}
-
struct symbol *symbol__new(u64 start, u64 len, u8 binding, u8 type, const char *name)
{
size_t namelen = strlen(name) + 1;
@@ -1270,103 +1237,6 @@ static int kcore_mapfn(u64 start, u64 len, u64 pgoff, void *data)
return 0;
}

-/*
- * Merges map into maps by splitting the new map within the existing map
- * regions.
- */
-int maps__merge_in(struct maps *kmaps, struct map *new_map)
-{
- struct map_rb_node *rb_node;
- LIST_HEAD(merged);
- int err = 0;
-
- maps__for_each_entry(kmaps, rb_node) {
- struct map *old_map = rb_node->map;
-
- /* no overload with this one */
- if (map__end(new_map) < map__start(old_map) ||
- map__start(new_map) >= map__end(old_map))
- continue;
-
- if (map__start(new_map) < map__start(old_map)) {
- /*
- * |new......
- * |old....
- */
- if (map__end(new_map) < map__end(old_map)) {
- /*
- * |new......| -> |new..|
- * |old....| -> |old....|
- */
- map__set_end(new_map, map__start(old_map));
- } else {
- /*
- * |new.............| -> |new..| |new..|
- * |old....| -> |old....|
- */
- struct map_list_node *m = map_list_node__new();
-
- if (!m) {
- err = -ENOMEM;
- goto out;
- }
-
- m->map = map__clone(new_map);
- if (!m->map) {
- free(m);
- err = -ENOMEM;
- goto out;
- }
-
- map__set_end(m->map, map__start(old_map));
- list_add_tail(&m->node, &merged);
- map__add_pgoff(new_map, map__end(old_map) - map__start(new_map));
- map__set_start(new_map, map__end(old_map));
- }
- } else {
- /*
- * |new......
- * |old....
- */
- if (map__end(new_map) < map__end(old_map)) {
- /*
- * |new..| -> x
- * |old.........| -> |old.........|
- */
- map__put(new_map);
- new_map = NULL;
- break;
- } else {
- /*
- * |new......| -> |new...|
- * |old....| -> |old....|
- */
- map__add_pgoff(new_map, map__end(old_map) - map__start(new_map));
- map__set_start(new_map, map__end(old_map));
- }
- }
- }
-
-out:
- while (!list_empty(&merged)) {
- struct map_list_node *old_node;
-
- old_node = list_entry(merged.next, struct map_list_node, node);
- list_del_init(&old_node->node);
- if (!err)
- err = maps__insert(kmaps, old_node->map);
- map__put(old_node->map);
- free(old_node);
- }
-
- if (new_map) {
- if (!err)
- err = maps__insert(kmaps, new_map);
- map__put(new_map);
- }
- return err;
-}
-
static int dso__load_kcore(struct dso *dso, struct map *map,
const char *kallsyms_filename)
{
@@ -2065,124 +1935,6 @@ int dso__load(struct dso *dso, struct map *map)
return ret;
}

-static int map__strcmp(const void *a, const void *b)
-{
- const struct map *map_a = *(const struct map **)a;
- const struct map *map_b = *(const struct map **)b;
- const struct dso *dso_a = map__dso(map_a);
- const struct dso *dso_b = map__dso(map_b);
- int ret = strcmp(dso_a->short_name, dso_b->short_name);
-
- if (ret == 0 && map_a != map_b) {
- /*
- * Ensure distinct but name equal maps have an order in part to
- * aid reference counting.
- */
- ret = (int)map__start(map_a) - (int)map__start(map_b);
- if (ret == 0)
- ret = (int)((intptr_t)map_a - (intptr_t)map_b);
- }
-
- return ret;
-}
-
-static int map__strcmp_name(const void *name, const void *b)
-{
- const struct dso *dso = map__dso(*(const struct map **)b);
-
- return strcmp(name, dso->short_name);
-}
-
-void __maps__sort_by_name(struct maps *maps)
-{
- qsort(maps__maps_by_name(maps), maps__nr_maps(maps), sizeof(struct map *), map__strcmp);
-}
-
-static int map__groups__sort_by_name_from_rbtree(struct maps *maps)
-{
- struct map_rb_node *rb_node;
- struct map **maps_by_name = realloc(maps__maps_by_name(maps),
- maps__nr_maps(maps) * sizeof(struct map *));
- int i = 0;
-
- if (maps_by_name == NULL)
- return -1;
-
- up_read(maps__lock(maps));
- down_write(maps__lock(maps));
-
- RC_CHK_ACCESS(maps)->maps_by_name = maps_by_name;
- RC_CHK_ACCESS(maps)->nr_maps_allocated = maps__nr_maps(maps);
-
- maps__for_each_entry(maps, rb_node)
- maps_by_name[i++] = map__get(rb_node->map);
-
- __maps__sort_by_name(maps);
-
- up_write(maps__lock(maps));
- down_read(maps__lock(maps));
-
- return 0;
-}
-
-static struct map *__maps__find_by_name(struct maps *maps, const char *name)
-{
- struct map **mapp;
-
- if (maps__maps_by_name(maps) == NULL &&
- map__groups__sort_by_name_from_rbtree(maps))
- return NULL;
-
- mapp = bsearch(name, maps__maps_by_name(maps), maps__nr_maps(maps),
- sizeof(*mapp), map__strcmp_name);
- if (mapp)
- return *mapp;
- return NULL;
-}
-
-struct map *maps__find_by_name(struct maps *maps, const char *name)
-{
- struct map_rb_node *rb_node;
- struct map *map;
-
- down_read(maps__lock(maps));
-
-
- if (RC_CHK_ACCESS(maps)->last_search_by_name) {
- const struct dso *dso = map__dso(RC_CHK_ACCESS(maps)->last_search_by_name);
-
- if (strcmp(dso->short_name, name) == 0) {
- map = RC_CHK_ACCESS(maps)->last_search_by_name;
- goto out_unlock;
- }
- }
- /*
- * If we have maps->maps_by_name, then the name isn't in the rbtree,
- * as maps->maps_by_name mirrors the rbtree when lookups by name are
- * made.
- */
- map = __maps__find_by_name(maps, name);
- if (map || maps__maps_by_name(maps) != NULL)
- goto out_unlock;
-
- /* Fallback to traversing the rbtree... */
- maps__for_each_entry(maps, rb_node) {
- struct dso *dso;
-
- map = rb_node->map;
- dso = map__dso(map);
- if (strcmp(dso->short_name, name) == 0) {
- RC_CHK_ACCESS(maps)->last_search_by_name = map;
- goto out_unlock;
- }
- }
- map = NULL;
-
-out_unlock:
- up_read(maps__lock(maps));
- return map;
-}
-
int dso__load_vmlinux(struct dso *dso, struct map *map,
const char *vmlinux, bool vmlinux_allocated)
{
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index af87c46b3f89..071837ddce2a 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -189,7 +189,6 @@ void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
void symbols__fixup_duplicate(struct rb_root_cached *symbols);
void symbols__fixup_end(struct rb_root_cached *symbols, bool is_kallsyms);
-void maps__fixup_end(struct maps *maps);

typedef int (*mapfn_t)(u64 start, u64 len, u64 pgoff, void *data);
int file__read_maps(int fd, bool exe, mapfn_t mapfn, void *data,
--
2.42.0.758.gaed0368e0e-goog

2023-10-25 02:39:08

by Yang Jihong

[permalink] [raw]
Subject: Re: [PATCH v3 13/50] libperf: Lazily allocate mmap event copy

Hello,

On 2023/10/25 6:23, Ian Rogers wrote:
> The event copy in the mmap is used as storage for a read event. Not
> all users of mmaps read the events (perf record, for example, does
> not), so switch to allocating the copy on first read rather than
> embedding it within the perf_mmap.
>
> Signed-off-by: Ian Rogers <[email protected]>
> ---
> tools/lib/perf/include/internal/mmap.h | 2 +-
> tools/lib/perf/mmap.c | 9 +++++++++
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/tools/lib/perf/include/internal/mmap.h b/tools/lib/perf/include/internal/mmap.h
> index 5a062af8e9d8..b11aaf5ed645 100644
> --- a/tools/lib/perf/include/internal/mmap.h
> +++ b/tools/lib/perf/include/internal/mmap.h
> @@ -33,7 +33,7 @@ struct perf_mmap {
> bool overwrite;
> u64 flush;
> libperf_unmap_cb_t unmap_cb;
> - char event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
> + void *event_copy;
> struct perf_mmap *next;
> };
>
> diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> index 2184814b37dd..91ae46aac378 100644
> --- a/tools/lib/perf/mmap.c
> +++ b/tools/lib/perf/mmap.c
> @@ -51,6 +51,8 @@ int perf_mmap__mmap(struct perf_mmap *map, struct perf_mmap_param *mp,
>
> void perf_mmap__munmap(struct perf_mmap *map)
> {
> + free(map->event_copy);
> + map->event_copy = NULL;
> if (map && map->base != NULL) {
> munmap(map->base, perf_mmap__mmap_len(map));
> map->base = NULL;
> @@ -226,6 +228,13 @@ static union perf_event *perf_mmap__read(struct perf_mmap *map,
> unsigned int len = min(sizeof(*event), size), cpy;
> void *dst = map->event_copy;
>
> + if (!dst) {
> + dst = malloc(PERF_SAMPLE_MAX_SIZE);
`event` is used as the return parameter of the libperf API
perf_mmap__read_event().
Would `zalloc` be better here, to avoid returning dirty data?

Thanks,
Yang

2023-10-25 03:28:28

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH v3 13/50] libperf: Lazily allocate mmap event copy

On Tue, Oct 24, 2023 at 7:39 PM Yang Jihong <[email protected]> wrote:
>
> Hello,
>
> On 2023/10/25 6:23, Ian Rogers wrote:
> > The event copy in the mmap is used as storage for a read event. Not
> > all users of mmaps read the events (perf record, for example, does
> > not), so switch to allocating the copy on first read rather than
> > embedding it within the perf_mmap.
> >
> > Signed-off-by: Ian Rogers <[email protected]>
> > ---
> > tools/lib/perf/include/internal/mmap.h | 2 +-
> > tools/lib/perf/mmap.c | 9 +++++++++
> > 2 files changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/lib/perf/include/internal/mmap.h b/tools/lib/perf/include/internal/mmap.h
> > index 5a062af8e9d8..b11aaf5ed645 100644
> > --- a/tools/lib/perf/include/internal/mmap.h
> > +++ b/tools/lib/perf/include/internal/mmap.h
> > @@ -33,7 +33,7 @@ struct perf_mmap {
> > bool overwrite;
> > u64 flush;
> > libperf_unmap_cb_t unmap_cb;
> > - char event_copy[PERF_SAMPLE_MAX_SIZE] __aligned(8);
> > + void *event_copy;
> > struct perf_mmap *next;
> > };
> >
> > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> > index 2184814b37dd..91ae46aac378 100644
> > --- a/tools/lib/perf/mmap.c
> > +++ b/tools/lib/perf/mmap.c
> > @@ -51,6 +51,8 @@ int perf_mmap__mmap(struct perf_mmap *map, struct perf_mmap_param *mp,
> >
> > void perf_mmap__munmap(struct perf_mmap *map)
> > {
> > + free(map->event_copy);
> > + map->event_copy = NULL;
> > if (map && map->base != NULL) {
> > munmap(map->base, perf_mmap__mmap_len(map));
> > map->base = NULL;
> > @@ -226,6 +228,13 @@ static union perf_event *perf_mmap__read(struct perf_mmap *map,
> > unsigned int len = min(sizeof(*event), size), cpy;
> > void *dst = map->event_copy;
> >
> > + if (!dst) {
> > + dst = malloc(PERF_SAMPLE_MAX_SIZE);
> `event` is used as the return parameter of the libperf API
> perf_mmap__read_event().
> Would `zalloc` be better here, to avoid returning dirty data?

We could, but I'd prefer not to zero unnecessarily. We've had problems
before with uninitialized memory being copied to the perf.data file,
such as:
http://lore.kernel.org/lkml/[email protected]
With the sanitizers' help we can avoid the problem and do less work,
which I think is best.

Thanks,
Ian

> Thanks,
> Yang

2023-10-25 03:45:02

by Yang Jihong

[permalink] [raw]
Subject: Re: [PATCH v3 20/50] perf record: Be lazier in allocating lost samples buffer

Hello,

On 2023/10/25 6:23, Ian Rogers wrote:
> Wait until a lost sample occurs to allocate the lost samples buffer;
> often the buffer isn't necessary. This saves a 64kb allocation and
> 5.3kb of peak memory consumption.
>
> Signed-off-by: Ian Rogers <[email protected]>
> ---
> tools/perf/builtin-record.c | 29 +++++++++++++++++++----------
> 1 file changed, 19 insertions(+), 10 deletions(-)
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 9b4f3805ca92..b6c8c1371b39 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1924,21 +1924,13 @@ static void __record__save_lost_samples(struct record *rec, struct evsel *evsel,
> static void record__read_lost_samples(struct record *rec)
> {
> struct perf_session *session = rec->session;
> - struct perf_record_lost_samples *lost;
> + struct perf_record_lost_samples *lost = NULL;
> struct evsel *evsel;
>
> /* there was an error during record__open */
> if (session->evlist == NULL)
> return;
>
> - lost = zalloc(PERF_SAMPLE_MAX_SIZE);
> - if (lost == NULL) {
> - pr_debug("Memory allocation failed\n");
> - return;
> - }
> -
> - lost->header.type = PERF_RECORD_LOST_SAMPLES;
> -
> evlist__for_each_entry(session->evlist, evsel) {
> struct xyarray *xy = evsel->core.sample_id;
> u64 lost_count;
> @@ -1961,6 +1953,14 @@ static void record__read_lost_samples(struct record *rec)
> }
>
> if (count.lost) {
> + if (!lost) {
> + lost = zalloc(PERF_SAMPLE_MAX_SIZE);
> + if (!lost) {
> + pr_debug("Memory allocation failed\n");
> + return;
> + }
> + lost->header.type = PERF_RECORD_LOST_SAMPLES;
> + }
> __record__save_lost_samples(rec, evsel, lost,
> x, y, count.lost, 0);
> }
> @@ -1968,9 +1968,18 @@ static void record__read_lost_samples(struct record *rec)
> }
>
> lost_count = perf_bpf_filter__lost_count(evsel);
> - if (lost_count)
> + if (lost_count) {
> + if (!lost) {
> + lost = zalloc(PERF_SAMPLE_MAX_SIZE);
> + if (!lost) {
> + pr_debug("Memory allocation failed\n");
> + return;
> + }
> + lost->header.type = PERF_RECORD_LOST_SAMPLES;
> + }
> __record__save_lost_samples(rec, evsel, lost, 0, 0, lost_count,
> PERF_RECORD_MISC_LOST_SAMPLES_BPF);
> + }
> }

Could the zalloc for `lost` be moved into __record__save_lost_samples()?
That would simplify the code, as in the diff below.


diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index dcf288a4fb9a..8d2eb746031a 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1888,14 +1888,25 @@ record__switch_output(struct record *rec, bool at_exit)
 }
 
 static void __record__save_lost_samples(struct record *rec, struct evsel *evsel,
-                                        struct perf_record_lost_samples *lost,
+                                        struct perf_record_lost_samples **plost,
                                         int cpu_idx, int thread_idx, u64 lost_count,
                                         u16 misc_flag)
 {
         struct perf_sample_id *sid;
         struct perf_sample sample = {};
+        struct perf_record_lost_samples *lost = *plost;
         int id_hdr_size;
 
+        if (!lost) {
+                lost = zalloc(PERF_SAMPLE_MAX_SIZE);
+                if (!lost) {
+                        pr_debug("Memory allocation failed\n");
+                        return;
+                }
+                lost->header.type = PERF_RECORD_LOST_SAMPLES;
+                *plost = lost;
+        }
+
         lost->lost = lost_count;
         if (evsel->core.ids) {
                 sid = xyarray__entry(evsel->core.sample_id, cpu_idx, thread_idx);
@@ -1912,21 +1923,13 @@ static void __record__save_lost_samples(struct record *rec, struct evsel *evsel,
 static void record__read_lost_samples(struct record *rec)
 {
         struct perf_session *session = rec->session;
-        struct perf_record_lost_samples *lost;
+        struct perf_record_lost_samples *lost = NULL;
         struct evsel *evsel;
 
         /* there was an error during record__open */
         if (session->evlist == NULL)
                 return;
 
-        lost = zalloc(PERF_SAMPLE_MAX_SIZE);
-        if (lost == NULL) {
-                pr_debug("Memory allocation failed\n");
-                return;
-        }
-
-        lost->header.type = PERF_RECORD_LOST_SAMPLES;
-
         evlist__for_each_entry(session->evlist, evsel) {
                 struct xyarray *xy = evsel->core.sample_id;
                 u64 lost_count;
@@ -1949,7 +1952,7 @@ static void record__read_lost_samples(struct record *rec)
                 }
 
                 if (count.lost) {
-                        __record__save_lost_samples(rec, evsel, lost,
+                        __record__save_lost_samples(rec, evsel, &lost,
                                                     x, y, count.lost, 0);
                 }
         }
@@ -1957,11 +1960,12 @@ static void record__read_lost_samples(struct record *rec)
 
                 lost_count = perf_bpf_filter__lost_count(evsel);
                 if (lost_count)
-                        __record__save_lost_samples(rec, evsel, lost, 0, 0, lost_count,
+                        __record__save_lost_samples(rec, evsel, &lost, 0, 0, lost_count,
                                                     PERF_RECORD_MISC_LOST_SAMPLES_BPF);
         }
 out:
-        free(lost);
+        if (lost)
+                free(lost);
 }
 
 static volatile sig_atomic_t workload_exec_errno;


Thanks,
Yang

2023-10-25 17:01:38

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH v3 20/50] perf record: Be lazier in allocating lost samples buffer

On Tue, Oct 24, 2023 at 8:44 PM Yang Jihong <[email protected]> wrote:
>
> Hello,
>
> On 2023/10/25 6:23, Ian Rogers wrote:
> > Wait until a lost sample occurs to allocate the lost samples buffer;
> > often the buffer isn't necessary. This saves a 64kb allocation and
> > 5.3kb of peak memory consumption.
> >
> > Signed-off-by: Ian Rogers <[email protected]>
> > ---
> > tools/perf/builtin-record.c | 29 +++++++++++++++++++----------
> > 1 file changed, 19 insertions(+), 10 deletions(-)
> >
> > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > index 9b4f3805ca92..b6c8c1371b39 100644
> > --- a/tools/perf/builtin-record.c
> > +++ b/tools/perf/builtin-record.c
> > @@ -1924,21 +1924,13 @@ static void __record__save_lost_samples(struct record *rec, struct evsel *evsel,
> > static void record__read_lost_samples(struct record *rec)
> > {
> > struct perf_session *session = rec->session;
> > - struct perf_record_lost_samples *lost;
> > + struct perf_record_lost_samples *lost = NULL;
> > struct evsel *evsel;
> >
> > /* there was an error during record__open */
> > if (session->evlist == NULL)
> > return;
> >
> > - lost = zalloc(PERF_SAMPLE_MAX_SIZE);
> > - if (lost == NULL) {
> > - pr_debug("Memory allocation failed\n");
> > - return;
> > - }
> > -
> > - lost->header.type = PERF_RECORD_LOST_SAMPLES;
> > -
> > evlist__for_each_entry(session->evlist, evsel) {
> > struct xyarray *xy = evsel->core.sample_id;
> > u64 lost_count;
> > @@ -1961,6 +1953,14 @@ static void record__read_lost_samples(struct record *rec)
> > }
> >
> > if (count.lost) {
> > + if (!lost) {
> > + lost = zalloc(PERF_SAMPLE_MAX_SIZE);
> > + if (!lost) {
> > + pr_debug("Memory allocation failed\n");
> > + return;
> > + }
> > + lost->header.type = PERF_RECORD_LOST_SAMPLES;
> > + }
> > __record__save_lost_samples(rec, evsel, lost,
> > x, y, count.lost, 0);
> > }
> > @@ -1968,9 +1968,18 @@ static void record__read_lost_samples(struct record *rec)
> > }
> >
> > lost_count = perf_bpf_filter__lost_count(evsel);
> > - if (lost_count)
> > + if (lost_count) {
> > + if (!lost) {
> > + lost = zalloc(PERF_SAMPLE_MAX_SIZE);
> > + if (!lost) {
> > + pr_debug("Memory allocation failed\n");
> > + return;
> > + }
> > + lost->header.type = PERF_RECORD_LOST_SAMPLES;
> > + }
> > __record__save_lost_samples(rec, evsel, lost, 0, 0, lost_count,
> > PERF_RECORD_MISC_LOST_SAMPLES_BPF);
> > + }
> > }
>
> Could the zalloc for `lost` be moved into __record__save_lost_samples()?
> That would simplify the code, as in the diff below.

Hmm.. seems marginal. This change makes the code not return in
record__read_lost_samples when the memory allocation fails, so I'm
50/50 on it.

Thanks,
Ian

>
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index dcf288a4fb9a..8d2eb746031a 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1888,14 +1888,25 @@ record__switch_output(struct record *rec, bool at_exit)
>  }
> 
>  static void __record__save_lost_samples(struct record *rec, struct evsel *evsel,
> -                                        struct perf_record_lost_samples *lost,
> +                                        struct perf_record_lost_samples **plost,
>                                          int cpu_idx, int thread_idx, u64 lost_count,
>                                          u16 misc_flag)
>  {
>          struct perf_sample_id *sid;
>          struct perf_sample sample = {};
> +        struct perf_record_lost_samples *lost = *plost;
>          int id_hdr_size;
> 
> +        if (!lost) {
> +                lost = zalloc(PERF_SAMPLE_MAX_SIZE);
> +                if (!lost) {
> +                        pr_debug("Memory allocation failed\n");
> +                        return;
> +                }
> +                lost->header.type = PERF_RECORD_LOST_SAMPLES;
> +                *plost = lost;
> +        }
> +
>          lost->lost = lost_count;
>          if (evsel->core.ids) {
>                  sid = xyarray__entry(evsel->core.sample_id, cpu_idx, thread_idx);
> @@ -1912,21 +1923,13 @@ static void __record__save_lost_samples(struct record *rec, struct evsel *evsel,
>  static void record__read_lost_samples(struct record *rec)
>  {
>          struct perf_session *session = rec->session;
> -        struct perf_record_lost_samples *lost;
> +        struct perf_record_lost_samples *lost = NULL;
>          struct evsel *evsel;
> 
>          /* there was an error during record__open */
>          if (session->evlist == NULL)
>                  return;
> 
> -        lost = zalloc(PERF_SAMPLE_MAX_SIZE);
> -        if (lost == NULL) {
> -                pr_debug("Memory allocation failed\n");
> -                return;
> -        }
> -
> -        lost->header.type = PERF_RECORD_LOST_SAMPLES;
> -
>          evlist__for_each_entry(session->evlist, evsel) {
>                  struct xyarray *xy = evsel->core.sample_id;
>                  u64 lost_count;
> @@ -1949,7 +1952,7 @@ static void record__read_lost_samples(struct record *rec)
>                  }
> 
>                  if (count.lost) {
> -                        __record__save_lost_samples(rec, evsel, lost,
> +                        __record__save_lost_samples(rec, evsel, &lost,
>                                                      x, y, count.lost, 0);
>                  }
>          }
> @@ -1957,11 +1960,12 @@ static void record__read_lost_samples(struct record *rec)
> 
>                  lost_count = perf_bpf_filter__lost_count(evsel);
>                  if (lost_count)
> -                        __record__save_lost_samples(rec, evsel, lost, 0, 0, lost_count,
> +                        __record__save_lost_samples(rec, evsel, &lost, 0, 0, lost_count,
>                                                      PERF_RECORD_MISC_LOST_SAMPLES_BPF);
>          }
> out:
> -        free(lost);
> +        if (lost)
> +                free(lost);
>  }
> 
>  static volatile sig_atomic_t workload_exec_errno;
>
>
> Thanks,
> Yang
>

2023-10-25 18:26:46

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v3 12/50] perf record: Lazy load kernel symbols

Hi Ian,

On Tue, Oct 24, 2023 at 3:24 PM Ian Rogers <[email protected]> wrote:
>
> Commit 5b7ba82a7591 ("perf symbols: Load kernel maps before using")
> changed it so that loading a kernel dso would cause the symbols for
> the dso to be eagerly loaded. For perf record this is overhead as the
> symbols won't be used. Add a symbol_conf to control the behavior and
> disable it for perf record and perf inject.

I'm curious if it can simply move to lazy loading unconditionally.
In most cases, the code calls machine__resolve() which calls
thread__find_map() and map__find_symbol() to load symbols.

So I think it's unnecessary to do it in the thread__find_map().
If it needs a symbol, it should call map__find_symbol() first
and it'll load the symbol table.
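
In sketch form (simplified; map__find_symbol() ends up calling
map__load() on first use, so the table is read on demand):

        thread__find_map(thread, cpumode, addr, &al); /* no eager dso__load() */
        ...
        sym = map__find_symbol(al.map, al.addr);      /* symbols loaded here */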

Adrian, what's special in inject or Intel-PT on this?

Thanks,
Namhyung


>
> Signed-off-by: Ian Rogers <[email protected]>
> ---
> tools/perf/builtin-inject.c | 6 ++++++
> tools/perf/builtin-record.c | 2 ++
> tools/perf/util/event.c | 4 ++--
> tools/perf/util/symbol_conf.h | 3 ++-
> 4 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index c8cf2fdd9cff..eb3ef5c24b66 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -2265,6 +2265,12 @@ int cmd_inject(int argc, const char **argv)
> "perf inject [<options>]",
> NULL
> };
> +
> + if (!inject.itrace_synth_opts.set) {
> + /* Disable eager loading of kernel symbols that adds overhead to perf inject. */
> + symbol_conf.lazy_load_kernel_maps = true;
> + }
> +
> #ifndef HAVE_JITDUMP
> set_option_nobuild(options, 'j', "jit", "NO_LIBELF=1", true);
> #endif
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index dcf288a4fb9a..8ec818568662 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -3989,6 +3989,8 @@ int cmd_record(int argc, const char **argv)
> # undef set_nobuild
> #endif
>
> + /* Disable eager loading of kernel symbols that adds overhead to perf record. */
> + symbol_conf.lazy_load_kernel_maps = true;
> rec->opts.affinity = PERF_AFFINITY_SYS;
>
> rec->evlist = evlist__new();
> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
> index 923c0fb15122..68f45e9e63b6 100644
> --- a/tools/perf/util/event.c
> +++ b/tools/perf/util/event.c
> @@ -617,13 +617,13 @@ struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr,
> if (cpumode == PERF_RECORD_MISC_KERNEL && perf_host) {
> al->level = 'k';
> maps = machine__kernel_maps(machine);
> - load_map = true;
> + load_map = !symbol_conf.lazy_load_kernel_maps;
> } else if (cpumode == PERF_RECORD_MISC_USER && perf_host) {
> al->level = '.';
> } else if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL && perf_guest) {
> al->level = 'g';
> maps = machine__kernel_maps(machine);
> - load_map = true;
> + load_map = !symbol_conf.lazy_load_kernel_maps;
> } else if (cpumode == PERF_RECORD_MISC_GUEST_USER && perf_guest) {
> al->level = 'u';
> } else {
> diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
> index 0b589570d1d0..2b2fb9e224b0 100644
> --- a/tools/perf/util/symbol_conf.h
> +++ b/tools/perf/util/symbol_conf.h
> @@ -42,7 +42,8 @@ struct symbol_conf {
> inline_name,
> disable_add2line_warn,
> buildid_mmap2,
> - guest_code;
> + guest_code,
> + lazy_load_kernel_maps;
> const char *vmlinux_name,
> *kallsyms_name,
> *source_prefix,
> --
> 2.42.0.758.gaed0368e0e-goog
>

2023-10-25 18:35:28

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH v3 12/50] perf record: Lazy load kernel symbols

On 25/10/23 21:25, Namhyung Kim wrote:
> Hi Ian,
>
> On Tue, Oct 24, 2023 at 3:24 PM Ian Rogers <[email protected]> wrote:
>>
>> Commit 5b7ba82a7591 ("perf symbols: Load kernel maps before using")
>> changed it so that loading a kernel dso would cause the symbols for
>> the dso to be eagerly loaded. For perf record this is overhead as the
>> symbols won't be used. Add a symbol_conf to control the behavior and
>> disable it for perf record and perf inject.
>
> I'm curious if it can simply move to lazy loading unconditionally.
> In most cases, the code calls machine__resolve() which calls
> thread__find_map() and map__find_symbol() to load symbols.
>
> So I think it's unnecessary to do it in the thread__find_map().
> If it needs a symbol, it should call map__find_symbol() first
> and it'll load the symbol table.
>
> Adrian, what's special in inject or Intel-PT on this?

Was a long time ago. Apart from what the commit says below,
I think there might also be other changes to the kernel maps
that happen at loading, but I would have to have a look.

commit 5b7ba82a75915e739709d0ace4bb559cb280db09
Author: Adrian Hunter <[email protected]>
Date: Wed Aug 7 14:38:46 2013 +0300

perf symbols: Load kernel maps before using

In order to use kernel maps to read object code, those maps must be
adjusted to map to the dso file offset. Because lazy-initialization is
used, that is not done until symbols are loaded. However the maps are
first used by thread__find_addr_map() before symbols are loaded. So
this patch changes thread__find_addr() to "load" kernel maps before
using them.



>
> Thanks,
> Namhyung
>
>
>>
>> Signed-off-by: Ian Rogers <[email protected]>
>> ---
>> tools/perf/builtin-inject.c | 6 ++++++
>> tools/perf/builtin-record.c | 2 ++
>> tools/perf/util/event.c | 4 ++--
>> tools/perf/util/symbol_conf.h | 3 ++-
>> 4 files changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
>> index c8cf2fdd9cff..eb3ef5c24b66 100644
>> --- a/tools/perf/builtin-inject.c
>> +++ b/tools/perf/builtin-inject.c
>> @@ -2265,6 +2265,12 @@ int cmd_inject(int argc, const char **argv)
>> "perf inject [<options>]",
>> NULL
>> };
>> +
>> + if (!inject.itrace_synth_opts.set) {
>> + /* Disable eager loading of kernel symbols that adds overhead to perf inject. */
>> + symbol_conf.lazy_load_kernel_maps = true;
>> + }
>> +
>> #ifndef HAVE_JITDUMP
>> set_option_nobuild(options, 'j', "jit", "NO_LIBELF=1", true);
>> #endif
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index dcf288a4fb9a..8ec818568662 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -3989,6 +3989,8 @@ int cmd_record(int argc, const char **argv)
>> # undef set_nobuild
>> #endif
>>
>> + /* Disable eager loading of kernel symbols that adds overhead to perf record. */
>> + symbol_conf.lazy_load_kernel_maps = true;
>> rec->opts.affinity = PERF_AFFINITY_SYS;
>>
>> rec->evlist = evlist__new();
>> diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
>> index 923c0fb15122..68f45e9e63b6 100644
>> --- a/tools/perf/util/event.c
>> +++ b/tools/perf/util/event.c
>> @@ -617,13 +617,13 @@ struct map *thread__find_map(struct thread *thread, u8 cpumode, u64 addr,
>> if (cpumode == PERF_RECORD_MISC_KERNEL && perf_host) {
>> al->level = 'k';
>> maps = machine__kernel_maps(machine);
>> - load_map = true;
>> + load_map = !symbol_conf.lazy_load_kernel_maps;
>> } else if (cpumode == PERF_RECORD_MISC_USER && perf_host) {
>> al->level = '.';
>> } else if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL && perf_guest) {
>> al->level = 'g';
>> maps = machine__kernel_maps(machine);
>> - load_map = true;
>> + load_map = !symbol_conf.lazy_load_kernel_maps;
>> } else if (cpumode == PERF_RECORD_MISC_GUEST_USER && perf_guest) {
>> al->level = 'u';
>> } else {
>> diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
>> index 0b589570d1d0..2b2fb9e224b0 100644
>> --- a/tools/perf/util/symbol_conf.h
>> +++ b/tools/perf/util/symbol_conf.h
>> @@ -42,7 +42,8 @@ struct symbol_conf {
>> inline_name,
>> disable_add2line_warn,
>> buildid_mmap2,
>> - guest_code;
>> + guest_code,
>> + lazy_load_kernel_maps;
>> const char *vmlinux_name,
>> *kallsyms_name,
>> *source_prefix,
>> --
>> 2.42.0.758.gaed0368e0e-goog
>>

2023-10-25 18:43:48

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v3 18/50] tools lib api: Add io_dir an allocation free readdir alternative

On Tue, Oct 24, 2023 at 3:24 PM Ian Rogers <[email protected]> wrote:
>
> glibc's opendir allocates a minimum of 32kb; when called recursively
> for a directory tree the memory consumption can add up - nearly 300kb
> during perf start-up when processing modules. Add a stack allocated
> variant of readdir sized a little more than 1kb.
>
> Signed-off-by: Ian Rogers <[email protected]>
> ---
> tools/lib/api/Makefile | 2 +-
> tools/lib/api/io_dir.h | 72 ++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 73 insertions(+), 1 deletion(-)
> create mode 100644 tools/lib/api/io_dir.h
>
> diff --git a/tools/lib/api/Makefile b/tools/lib/api/Makefile
> index 044860ac1ed1..186aa407de8c 100644
> --- a/tools/lib/api/Makefile
> +++ b/tools/lib/api/Makefile
> @@ -99,7 +99,7 @@ install_lib: $(LIBFILE)
> $(call do_install_mkdir,$(libdir_SQ)); \
> cp -fpR $(LIBFILE) $(DESTDIR)$(libdir_SQ)
>
> -HDRS := cpu.h debug.h io.h
> +HDRS := cpu.h debug.h io.h io_dir.h
> FD_HDRS := fd/array.h
> FS_HDRS := fs/fs.h fs/tracing_path.h
> INSTALL_HDRS_PFX := $(DESTDIR)$(prefix)/include/api
> diff --git a/tools/lib/api/io_dir.h b/tools/lib/api/io_dir.h
> new file mode 100644
> index 000000000000..8a70c0718df5
> --- /dev/null
> +++ b/tools/lib/api/io_dir.h
> @@ -0,0 +1,72 @@
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> +/*
> + * Lightweight directory reading library.
> + */
> +#ifndef __API_IO_DIR__
> +#define __API_IO_DIR__
> +
> +#include <dirent.h>
> +#include <fcntl.h>
> +#include <stdlib.h>
> +#include <unistd.h>
> +#include <sys/stat.h>
> +
> +struct io_dirent64 {
> + ino64_t d_ino; /* 64-bit inode number */
> + off64_t d_off; /* 64-bit offset to next structure */
> + unsigned short d_reclen; /* Size of this dirent */
> + unsigned char d_type; /* File type */
> + char d_name[NAME_MAX + 1]; /* Filename (null-terminated) */
> +};
> +
> +struct io_dir {
> + int dirfd;
> + ssize_t available_bytes;
> + struct io_dirent64 *next;
> + struct io_dirent64 buff[4];
> +};
> +
> +static inline void io_dir__init(struct io_dir *iod, int dirfd)
> +{
> + iod->dirfd = dirfd;
> + iod->available_bytes = 0;
> +}
> +
> +static inline void io_dir__rewinddir(struct io_dir *iod)
> +{
> + lseek(iod->dirfd, 0, SEEK_SET);
> + iod->available_bytes = 0;
> +}
> +
> +static inline struct io_dirent64 *io_dir__readdir(struct io_dir *iod)
> +{
> + struct io_dirent64 *entry;
> +
> + if (iod->available_bytes <= 0) {
> + ssize_t rc = getdents64(iod->dirfd, iod->buff, sizeof(iod->buff));
> +
> + if (rc <= 0)
> + return NULL;
> + iod->available_bytes = rc;
> + iod->next = iod->buff;
> + }
> + entry = iod->next;
> + iod->next = (struct io_dirent64 *)((char *)entry + entry->d_reclen);

Any chance of unaligned access?


> + iod->available_bytes -= entry->d_reclen;
> + return entry;
> +}
> +
> +static inline bool io_dir__is_dir(const struct io_dir *iod, const struct io_dirent64 *dent)
> +{
> + if (dent->d_type == DT_UNKNOWN) {
> + struct stat st;
> +
> + if (fstatat(iod->dirfd, dent->d_name, &st, /*flags=*/0))
> + return false;
> +
> + return S_ISDIR(st.st_mode);

You may want to save the type if it's a DIR.

Thanks,
Namhyung


> + }
> + return dent->d_type == DT_DIR;
> +}
> +
> +#endif
> --
> 2.42.0.758.gaed0368e0e-goog
>
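
For reference, a minimal usage sketch of the API above (the directory
path and the printing are illustrative only, not from the patch):

        int fd = open("/sys/module", O_RDONLY | O_DIRECTORY);
        struct io_dir dir;
        struct io_dirent64 *ent;

        if (fd >= 0) {
                io_dir__init(&dir, fd);
                while ((ent = io_dir__readdir(&dir)) != NULL) {
                        if (io_dir__is_dir(&dir, ent))
                                puts(ent->d_name);
                }
                close(fd);
        }

All iterator state lives in the on-stack struct io_dir (a little over
1kb), so no heap allocation happens at all.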

2023-10-25 19:03:07

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v3 20/50] perf record: Be lazier in allocating lost samples buffer

On Tue, Oct 24, 2023 at 3:25 PM Ian Rogers <[email protected]> wrote:
>
> Wait until a lost sample occurs to allocate the lost samples buffer,
> often the buffer isn't necessary. This saves a 64kb allocation and
> 5.3kb of peak memory consumption.
>
> Signed-off-by: Ian Rogers <[email protected]>
> ---
> tools/perf/builtin-record.c | 29 +++++++++++++++++++----------
> 1 file changed, 19 insertions(+), 10 deletions(-)
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 9b4f3805ca92..b6c8c1371b39 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1924,21 +1924,13 @@ static void __record__save_lost_samples(struct record *rec, struct evsel *evsel,
> static void record__read_lost_samples(struct record *rec)
> {
> struct perf_session *session = rec->session;
> - struct perf_record_lost_samples *lost;
> + struct perf_record_lost_samples *lost = NULL;
> struct evsel *evsel;
>
> /* there was an error during record__open */
> if (session->evlist == NULL)
> return;
>
> - lost = zalloc(PERF_SAMPLE_MAX_SIZE);

Sorry for naively choosing the max size. :-p
It could be sizeof(*lost) + machine->id_hdr_size.
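
In sketch form (assuming the machine is reached via
session->machines.host):

        lost = zalloc(sizeof(*lost) + session->machines.host.id_hdr_size);
        if (lost == NULL)
                return;
        lost->header.type = PERF_RECORD_LOST_SAMPLES;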

Thanks,
Namhyung


> - if (lost == NULL) {
> - pr_debug("Memory allocation failed\n");
> - return;
> - }
> -
> - lost->header.type = PERF_RECORD_LOST_SAMPLES;
> -
> evlist__for_each_entry(session->evlist, evsel) {
> struct xyarray *xy = evsel->core.sample_id;
> u64 lost_count;
> @@ -1961,6 +1953,14 @@ static void record__read_lost_samples(struct record *rec)
> }
>
> if (count.lost) {
> + if (!lost) {
> + lost = zalloc(PERF_SAMPLE_MAX_SIZE);
> + if (!lost) {
> + pr_debug("Memory allocation failed\n");
> + return;
> + }
> + lost->header.type = PERF_RECORD_LOST_SAMPLES;
> + }
> __record__save_lost_samples(rec, evsel, lost,
> x, y, count.lost, 0);
> }
> @@ -1968,9 +1968,18 @@ static void record__read_lost_samples(struct record *rec)
> }
>
> lost_count = perf_bpf_filter__lost_count(evsel);
> - if (lost_count)
> + if (lost_count) {
> + if (!lost) {
> + lost = zalloc(PERF_SAMPLE_MAX_SIZE);
> + if (!lost) {
> + pr_debug("Memory allocation failed\n");
> + return;
> + }
> + lost->header.type = PERF_RECORD_LOST_SAMPLES;
> + }
> __record__save_lost_samples(rec, evsel, lost, 0, 0, lost_count,
> PERF_RECORD_MISC_LOST_SAMPLES_BPF);
> + }
> }
> out:
> free(lost);
> --
> 2.42.0.758.gaed0368e0e-goog
>

2023-10-25 19:04:45

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v3 20/50] perf record: Be lazier in allocating lost samples buffer

On Wed, Oct 25, 2023 at 10:01 AM Ian Rogers <[email protected]> wrote:
>
> On Tue, Oct 24, 2023 at 8:44 PM Yang Jihong <[email protected]> wrote:
> >
> > Hello,
> >
> > On 2023/10/25 6:23, Ian Rogers wrote:
> > > Wait until a lost sample occurs to allocate the lost samples buffer;
> > > often the buffer isn't necessary. This saves a 64kb allocation and
> > > 5.3kb of peak memory consumption.
> > >
> > > Signed-off-by: Ian Rogers <[email protected]>
> > > ---
> > > tools/perf/builtin-record.c | 29 +++++++++++++++++++----------
> > > 1 file changed, 19 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > > index 9b4f3805ca92..b6c8c1371b39 100644
> > > --- a/tools/perf/builtin-record.c
> > > +++ b/tools/perf/builtin-record.c
> > > @@ -1924,21 +1924,13 @@ static void __record__save_lost_samples(struct record *rec, struct evsel *evsel,
> > > static void record__read_lost_samples(struct record *rec)
> > > {
> > > struct perf_session *session = rec->session;
> > > - struct perf_record_lost_samples *lost;
> > > + struct perf_record_lost_samples *lost = NULL;
> > > struct evsel *evsel;
> > >
> > > /* there was an error during record__open */
> > > if (session->evlist == NULL)
> > > return;
> > >
> > > - lost = zalloc(PERF_SAMPLE_MAX_SIZE);
> > > - if (lost == NULL) {
> > > - pr_debug("Memory allocation failed\n");
> > > - return;
> > > - }
> > > -
> > > - lost->header.type = PERF_RECORD_LOST_SAMPLES;
> > > -
> > > evlist__for_each_entry(session->evlist, evsel) {
> > > struct xyarray *xy = evsel->core.sample_id;
> > > u64 lost_count;
> > > @@ -1961,6 +1953,14 @@ static void record__read_lost_samples(struct record *rec)
> > > }
> > >
> > > if (count.lost) {
> > > + if (!lost) {
> > > + lost = zalloc(PERF_SAMPLE_MAX_SIZE);
> > > + if (!lost) {
> > > + pr_debug("Memory allocation failed\n");
> > > + return;
> > > + }
> > > + lost->header.type = PERF_RECORD_LOST_SAMPLES;
> > > + }
> > > __record__save_lost_samples(rec, evsel, lost,
> > > x, y, count.lost, 0);
> > > }
> > > @@ -1968,9 +1968,18 @@ static void record__read_lost_samples(struct record *rec)
> > > }
> > >
> > > lost_count = perf_bpf_filter__lost_count(evsel);
> > > - if (lost_count)
> > > + if (lost_count) {
> > > + if (!lost) {
> > > + lost = zalloc(PERF_SAMPLE_MAX_SIZE);
> > > + if (!lost) {
> > > + pr_debug("Memory allocation failed\n");
> > > + return;
> > > + }
> > > + lost->header.type = PERF_RECORD_LOST_SAMPLES;
> > > + }
> > > __record__save_lost_samples(rec, evsel, lost, 0, 0, lost_count,
> > > PERF_RECORD_MISC_LOST_SAMPLES_BPF);
> > > + }
> > > }
> >
> > Could the zalloc for `lost` be moved into __record__save_lost_samples()?
> > That would simplify the code, as in the diff below.
>
> Hmm, it seems marginal. That change means record__read_lost_samples
> would no longer return early when the memory allocation fails, so I'm
> 50/50 on it.

Maybe you can make it return the failure.

> >
> >
> > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > index dcf288a4fb9a..8d2eb746031a 100644
> > --- a/tools/perf/builtin-record.c
> > +++ b/tools/perf/builtin-record.c
[SNIP]
> > out:
> > - free(lost);
> > + if (lost)
> > + free(lost);

This is unnecessary as free() can handle NULL pointers.

Thanks,
Namhyung


> > }
> >
> > static volatile sig_atomic_t workload_exec_errno;
> >
> >
> > Thanks,
> > Yang
> >
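
A minimal sketch of the refactor being discussed - moving the allocation
into __record__save_lost_samples and propagating the failure. The
parameter list follows the call sites in the patch above; the
double-pointer and the int return are the hypothetical changes, not
what was applied:

	/* Sketch only: allocate the buffer on first use and report failure. */
	static int __record__save_lost_samples(struct record *rec, struct evsel *evsel,
					       struct perf_record_lost_samples **lost,
					       int cpu_idx, int thread_idx,
					       u64 lost_count, u16 misc_flag)
	{
		if (!*lost) {
			*lost = zalloc(PERF_SAMPLE_MAX_SIZE);
			if (!*lost)
				return -ENOMEM; /* caller decides whether to bail out */
			(*lost)->header.type = PERF_RECORD_LOST_SAMPLES;
		}
		/* ... fill in the ids and write the record as before ... */
		return 0;
	}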

2023-10-25 22:16:16

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH v3 18/50] tools lib api: Add io_dir an allocation free readdir alternative

On Wed, Oct 25, 2023 at 11:43 AM Namhyung Kim <[email protected]> wrote:
>
> On Tue, Oct 24, 2023 at 3:24 PM Ian Rogers <[email protected]> wrote:
> >
> > glibc's opendir allocates a minimum of 32kb; when called recursively
> > over a directory tree the memory consumption can add up - nearly 300kb
> > during perf start-up when processing modules. Add a stack-allocated
> > variant of readdir sized a little more than 1kb.
> >
> > Signed-off-by: Ian Rogers <[email protected]>
> > ---
> > tools/lib/api/Makefile | 2 +-
> > tools/lib/api/io_dir.h | 72 ++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 73 insertions(+), 1 deletion(-)
> > create mode 100644 tools/lib/api/io_dir.h
> >
> > diff --git a/tools/lib/api/Makefile b/tools/lib/api/Makefile
> > index 044860ac1ed1..186aa407de8c 100644
> > --- a/tools/lib/api/Makefile
> > +++ b/tools/lib/api/Makefile
> > @@ -99,7 +99,7 @@ install_lib: $(LIBFILE)
> > $(call do_install_mkdir,$(libdir_SQ)); \
> > cp -fpR $(LIBFILE) $(DESTDIR)$(libdir_SQ)
> >
> > -HDRS := cpu.h debug.h io.h
> > +HDRS := cpu.h debug.h io.h io_dir.h
> > FD_HDRS := fd/array.h
> > FS_HDRS := fs/fs.h fs/tracing_path.h
> > INSTALL_HDRS_PFX := $(DESTDIR)$(prefix)/include/api
> > diff --git a/tools/lib/api/io_dir.h b/tools/lib/api/io_dir.h
> > new file mode 100644
> > index 000000000000..8a70c0718df5
> > --- /dev/null
> > +++ b/tools/lib/api/io_dir.h
> > @@ -0,0 +1,72 @@
> > +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> > +/*
> > + * Lightweight directory reading library.
> > + */
> > +#ifndef __API_IO_DIR__
> > +#define __API_IO_DIR__
> > +
> > +#include <dirent.h>
> > +#include <fcntl.h>
> > +#include <stdlib.h>
> > +#include <unistd.h>
> > +#include <sys/stat.h>
> > +
> > +struct io_dirent64 {
> > + ino64_t d_ino; /* 64-bit inode number */
> > + off64_t d_off; /* 64-bit offset to next structure */
> > + unsigned short d_reclen; /* Size of this dirent */
> > + unsigned char d_type; /* File type */
> > + char d_name[NAME_MAX + 1]; /* Filename (null-terminated) */
> > +};
> > +
> > +struct io_dir {
> > + int dirfd;
> > + ssize_t available_bytes;
> > + struct io_dirent64 *next;
> > + struct io_dirent64 buff[4];
> > +};
> > +
> > +static inline void io_dir__init(struct io_dir *iod, int dirfd)
> > +{
> > + iod->dirfd = dirfd;
> > + iod->available_bytes = 0;
> > +}
> > +
> > +static inline void io_dir__rewinddir(struct io_dir *iod)
> > +{
> > + lseek(iod->dirfd, 0, SEEK_SET);
> > + iod->available_bytes = 0;
> > +}
> > +
> > +static inline struct io_dirent64 *io_dir__readdir(struct io_dir *iod)
> > +{
> > + struct io_dirent64 *entry;
> > +
> > + if (iod->available_bytes <= 0) {
> > + ssize_t rc = getdents64(iod->dirfd, iod->buff, sizeof(iod->buff));
> > +
> > + if (rc <= 0)
> > + return NULL;
> > + iod->available_bytes = rc;
> > + iod->next = iod->buff;
> > + }
> > + entry = iod->next;
> > + iod->next = (struct io_dirent64 *)((char *)entry + entry->d_reclen);
>
> Any chance of unaligned access?

It's a funny one. struct io_dirent64 has a fixed size, but d_reclen can
be smaller than that size because the full 256 bytes may not be used
for the name. Since the name is an array of chars, misalignment is
possible in principle, but libc doesn't worry about unaligned access
here, so my take is that the kernel pads d_reclen so entries are always
8-byte aligned.
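
To make that assumption explicit, one could wrap the read with a
defensive check - a hypothetical io_dir__readdir_checked helper, not
part of the patch:

	#include <assert.h>
	#include <stdint.h>

	static inline struct io_dirent64 *io_dir__readdir_checked(struct io_dir *iod)
	{
		struct io_dirent64 *entry = io_dir__readdir(iod);

		/* The kernel is assumed to pad d_reclen so that every record
		 * starts on an 8-byte boundary; catch it in debug builds if not. */
		assert(entry == NULL || (((uintptr_t)entry & 7) == 0 &&
					 (entry->d_reclen & 7) == 0));
		return entry;
	}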

>
> > + iod->available_bytes -= entry->d_reclen;
> > + return entry;
> > +}
> > +
> > +static inline bool io_dir__is_dir(const struct io_dir *iod, const struct io_dirent64 *dent)
> > +{
> > + if (dent->d_type == DT_UNKNOWN) {
> > + struct stat st;
> > +
> > + if (fstatat(iod->dirfd, dent->d_name, &st, /*flags=*/0))
> > + return false;
> > +
> > + return S_ISDIR(st.st_mode);
>
> You may want to save the type if it's a DIR.

Sure. FWIW, we shouldn't normally take this code path, as DT_UNKNOWN is
only returned by some fairly obscure file systems.
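
A sketch of that caching, assuming the dent parameter is made non-const
so the resolved type can be stored back:

	static inline bool io_dir__is_dir(const struct io_dir *iod, struct io_dirent64 *dent)
	{
		if (dent->d_type == DT_UNKNOWN) {
			struct stat st;

			if (fstatat(iod->dirfd, dent->d_name, &st, /*flags=*/0))
				return false;

			if (S_ISDIR(st.st_mode)) {
				/* Cache the result so later calls skip the fstatat. */
				dent->d_type = DT_DIR;
				return true;
			}
			return false;
		}
		return dent->d_type == DT_DIR;
	}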

Thanks,
Ian

> Thanks,
> Namhyung
>
>
> > + }
> > + return dent->d_type == DT_DIR;
> > +}
> > +
> > +#endif
> > --
> > 2.42.0.758.gaed0368e0e-goog
> >
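
For context, a minimal caller of the io_dir API from the patch might
look like the sketch below; the include path and the list_dir helper
are illustrative assumptions, not code from the series:

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>
	#include <api/io_dir.h> /* assumed install path for the header above */

	static void list_dir(const char *path)
	{
		struct io_dir iod;
		struct io_dirent64 *dent;
		int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);

		if (fd < 0)
			return;

		/* No heap allocation: the 4-entry buffer lives in iod on the stack. */
		io_dir__init(&iod, fd);
		while ((dent = io_dir__readdir(&iod)) != NULL)
			printf("%s%s\n", dent->d_name, io_dir__is_dir(&iod, dent) ? "/" : "");
		close(fd);
	}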

2023-10-26 17:11:46

by Namhyung Kim

[permalink] [raw]
Subject: Re: (subset) [PATCH v3 00/50] Improvements to memory use

On Tue, 24 Oct 2023 15:23:03 -0700, Ian Rogers wrote:
> Fix memory leaks detected by address/leak sanitizer affecting LBR
> call-graphs, perf mem and BPF offcpu.
>
> Make branch_type_stat in callchain_list optional as it is large and
> not always necessary - in particular it isn't used by perf top.
>
> Make the allocations of zstd streams, kernel symbols and event copies
> lazier in order to save memory in cases like perf record.
>
> [...]

Applied the first 11 patches to perf-tool-next, thanks!