2023-03-13 20:48:35

by Namhyung Kim

Subject: [PATCH 0/4] perf lock contention: Improve lock symbol display (v1)

Hello,

This patchset improves the symbolization of locks for -l/--lock-addr mode.
As of now it only shows global lock symbols present in the kallsyms. But
we can add some more lock symbols by traversing pointers in the BPF program.

For example, mmap_lock can be reached from the mm_struct of the current task
(task_struct->mm->mmap_lock) and we can compare the address of the given lock
with it. Similarly I've added 'siglock' for current->sighand->siglock.
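
To make the comparison concrete, here is a minimal userspace sketch in C.
It is not the BPF code from the series: the struct definitions are cut-down,
hypothetical stand-ins for the kernel types and classify_lock() is a name
made up for this illustration. Only the LCD_F_* bit positions are taken from
the patches. The real check also looks at the lock type first and handles the
old 'mmap_sem' field name, both of which are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel types; only the members used here. */
struct rw_semaphore { long count; };
struct spinlock { int raw; };
struct mm_struct { int map_count; struct rw_semaphore mmap_lock; };
struct sighand_struct { struct spinlock siglock; };
struct task_struct { struct mm_struct *mm; struct sighand_struct *sighand; };

/* Same bit positions as in lock_data.h of this series. */
#define LCD_F_MMAP_LOCK    (1U << 31)
#define LCD_F_SIGHAND_LOCK (1U << 30)

/*
 * Compare the contended lock address with the locks reachable from the
 * current task. Returns the tag bit of the matching lock, or 0 if none.
 */
static uint32_t classify_lock(const struct task_struct *curr, uintptr_t lock)
{
	assert(curr != NULL);

	/* kernel threads have no mm, so check the pointer first */
	if (curr->mm && (uintptr_t)&curr->mm->mmap_lock == lock)
		return LCD_F_MMAP_LOCK;
	if (curr->sighand && (uintptr_t)&curr->sighand->siglock == lock)
		return LCD_F_SIGHAND_LOCK;
	return 0;
}
```

The returned bit can be OR-ed into the per-lock flags, which is what patch 1
does with the result of check_lock_type().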

On the other hand, we can traverse some semi-global locks such as per-cpu,
per-device, per-filesystem locks and so on. I've added 'rq_lock' for each
cpu's runqueue lock.

It cannot cover all types of locks in the system but it'd be fairly useful
if we can add many of the often-contended locks. I tried to add futex locks
but it failed to find the __futex_data symbol from BTF. I'm not sure why but
I guess it's because the struct doesn't have a tag name.

Those locks are added just because they got caught during my test.
It'd be nice if you suggest which locks to add and how to do that. :)
I'm wondering if there's a way to track file-based locks (like epoll, etc).

Finally I also added a lock type name after the symbols (if any) so that we
can get some idea even though it has no symbol. The example output looks
like below:

$ sudo ./perf lock con -abl -- sleep 1
 contended   total wait     max wait     avg wait            address   symbol

        44      6.13 ms    284.49 us    139.28 us   ffffffff92e06080   tasklist_lock (rwlock)
       159    983.38 us     12.38 us      6.18 us   ffff8cc717c90000   siglock (spinlock)
        10    679.90 us    153.35 us     67.99 us   ffff8cdc2872aaf8   mmap_lock (rwsem)
         9    558.11 us    180.67 us     62.01 us   ffff8cd647914038   mmap_lock (rwsem)
        78    228.56 us      7.82 us      2.93 us   ffff8cc700061c00   (spinlock)
         5     41.60 us     16.93 us      8.32 us   ffffd853acb41468   (spinlock)
        10     37.24 us      5.87 us      3.72 us   ffff8cd560b5c200   siglock (spinlock)
         4     11.17 us      3.97 us      2.79 us   ffff8d053ddf0c80   rq_lock (spinlock)
         1      7.86 us      7.86 us      7.86 us   ffff8cd64791404c   (spinlock)
         1      4.13 us      4.13 us      4.13 us   ffff8d053d930c80   rq_lock (spinlock)
         7      3.98 us      1.67 us       568 ns   ffff8ccb92479440   (mutex)
         2      2.62 us      2.33 us      1.31 us   ffff8cc702e6ede0   (rwlock)

The tasklist_lock is global so it's from the kallsyms. But others like
siglock, mmap_lock and rq_lock are from the BPF.

You can get the code at the 'perf/lock-symbol-v1' branch in

git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung

Namhyung Kim (4):
perf lock contention: Track and show mmap_lock with address
perf lock contention: Track and show siglock with address
perf lock contention: Show per-cpu rq_lock with address
perf lock contention: Show lock type with address

tools/perf/builtin-lock.c | 46 +++++++----
tools/perf/util/bpf_lock_contention.c | 35 ++++++++-
.../perf/util/bpf_skel/lock_contention.bpf.c | 77 +++++++++++++++++++
tools/perf/util/bpf_skel/lock_data.h | 14 ++++
4 files changed, 152 insertions(+), 20 deletions(-)


base-commit: b8fa3e3833c14151a47ebebbc5427dcfe94bb407
--
2.40.0.rc1.284.g88254d51c5-goog



2023-03-13 20:48:41

by Namhyung Kim

Subject: [PATCH 1/4] perf lock contention: Track and show mmap_lock with address

Sometimes there are severe contentions on the mmap_lock and we want to
see it in the -l/--lock-addr output. However it cannot symbolize
the mmap_lock because it's allocated dynamically without symbols.

Stephane and Hao gave me an idea separately to display mmap_lock by
following the current->mm pointer. I added a flag to mark mmap_lock
after comparing the lock address so that it can show them differently.
With this change it can show mmap_lock like below:

$ sudo ./perf lock con -abl -- sleep 10
 contended   total wait     max wait     avg wait            address   symbol
 ...
     16344    312.30 ms      2.22 ms     19.11 us   ffff8cc702595640
     17686    310.08 ms      1.49 ms     17.53 us   ffff8cc7025952c0
         3     84.14 ms     45.79 ms     28.05 ms   ffff8cc78114c478   mmap_lock
      3557     76.80 ms     68.75 us     21.59 us   ffff8cc77ca3af58
         1     68.27 ms     68.27 ms     68.27 ms   ffff8cda745dfd70
         9     54.53 ms      7.96 ms      6.06 ms   ffff8cc7642a48b8   mmap_lock
     14629     44.01 ms     60.00 us      3.01 us   ffff8cc7625f9ca0
      3481     42.63 ms    140.71 us     12.24 us   ffffffff937906ac   vmap_area_lock
     16194     38.73 ms     42.15 us      2.39 us   ffff8cd397cbc560
        11     38.44 ms     10.39 ms      3.49 ms   ffff8ccd6d12fbb8   mmap_lock
         1      5.43 ms      5.43 ms      5.43 ms   ffff8cd70018f0d8
      1674      5.38 ms    422.93 us      3.21 us   ffffffff92e06080   tasklist_lock
       581      4.51 ms    130.68 us      7.75 us   ffff8cc9b1259058
         5      3.52 ms      1.27 ms    703.23 us   ffff8cc754510070
       112      3.47 ms     56.47 us     31.02 us   ffff8ccee38b3120
       381      3.31 ms     73.44 us      8.69 us   ffffffff93790690   purge_vmap_area_lock
       255      3.19 ms     36.35 us     12.49 us   ffff8d053ce30c80

Note that mmap_lock was renamed some time ago and it needs to support
old kernels with a different name 'mmap_sem'.

Suggested-by: Stephane Eranian <[email protected]>
Suggested-by: Hao Luo <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-lock.c | 2 +-
.../perf/util/bpf_skel/lock_contention.bpf.c | 41 +++++++++++++++++++
tools/perf/util/bpf_skel/lock_data.h | 6 +++
3 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index 054997edd98b..c62f4d9363a6 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -1663,7 +1663,7 @@ static void print_contention_result(struct lock_contention *con)
break;
case LOCK_AGGR_ADDR:
pr_info(" %016llx %s\n", (unsigned long long)st->addr,
- st->name ? : "");
+ (st->flags & LCD_F_MMAP_LOCK) ? "mmap_lock" : st->name);
break;
default:
break;
diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
index e6007eaeda1a..f092a78ae2b5 100644
--- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
+++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
@@ -92,6 +92,14 @@ struct rw_semaphore___new {
atomic_long_t owner;
} __attribute__((preserve_access_index));

+struct mm_struct___old {
+ struct rw_semaphore mmap_sem;
+} __attribute__((preserve_access_index));
+
+struct mm_struct___new {
+ struct rw_semaphore mmap_lock;
+} __attribute__((preserve_access_index));
+
/* control flags */
int enabled;
int has_cpu;
@@ -204,6 +212,36 @@ static inline struct task_struct *get_lock_owner(__u64 lock, __u32 flags)
return task;
}

+static inline __u32 check_lock_type(__u64 lock, __u32 flags)
+{
+ struct task_struct *curr;
+ struct mm_struct___old *mm_old;
+ struct mm_struct___new *mm_new;
+
+ switch (flags) {
+ case LCB_F_READ: /* rwsem */
+ case LCB_F_WRITE:
+ curr = bpf_get_current_task_btf();
+ if (curr->mm == NULL)
+ break;
+ mm_new = (void *)curr->mm;
+ if (bpf_core_field_exists(mm_new->mmap_lock)) {
+ if (&mm_new->mmap_lock == (void *)lock)
+ return LCD_F_MMAP_LOCK;
+ break;
+ }
+ mm_old = (void *)curr->mm;
+ if (bpf_core_field_exists(mm_old->mmap_sem)) {
+ if (&mm_old->mmap_sem == (void *)lock)
+ return LCD_F_MMAP_LOCK;
+ }
+ break;
+ default:
+ break;
+ }
+ return 0;
+}
+
SEC("tp_btf/contention_begin")
int contention_begin(u64 *ctx)
{
@@ -314,6 +352,9 @@ int contention_end(u64 *ctx)
.flags = pelem->flags,
};

+ if (aggr_mode == LOCK_AGGR_ADDR)
+ first.flags |= check_lock_type(pelem->lock, pelem->flags);
+
bpf_map_update_elem(&lock_stat, &key, &first, BPF_NOEXIST);
bpf_map_delete_elem(&tstamp, &pid);
return 0;
diff --git a/tools/perf/util/bpf_skel/lock_data.h b/tools/perf/util/bpf_skel/lock_data.h
index 3d35fd4407ac..789f20833798 100644
--- a/tools/perf/util/bpf_skel/lock_data.h
+++ b/tools/perf/util/bpf_skel/lock_data.h
@@ -15,6 +15,12 @@ struct contention_task_data {
char comm[TASK_COMM_LEN];
};

+/*
+ * Upper bits of the flags in the contention_data are used to identify
+ * some well-known locks which do not have symbols (non-global locks).
+ */
+#define LCD_F_MMAP_LOCK (1U << 31)
+
struct contention_data {
u64 total_time;
u64 min_time;
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-13 20:49:06

by Namhyung Kim

Subject: [PATCH 4/4] perf lock contention: Show lock type with address

Show lock type names after the symbol of locks if any. This can be
useful especially when it doesn't show the lock symbols.

The indentation before the lock type parenthesis is to recognize lock
symbols more easily.
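
As a rough standalone illustration of the lookup, the C sketch below maps
the flags to a type name after masking off the upper tag bits. type_name()
and the table name are made up for this example (the patch itself adds
get_type_name() to builtin-lock.c). The LCB_F_* values other than SPIN and
READ do not appear in this series and are written here as I believe they are
defined in include/trace/events/lock.h, so please treat them as an
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Lock type bits; values beyond SPIN and READ are assumed, see above. */
#define LCB_F_SPIN	(1U << 0)
#define LCB_F_READ	(1U << 1)
#define LCB_F_WRITE	(1U << 2)
#define LCB_F_RT	(1U << 3)
#define LCB_F_PERCPU	(1U << 4)
#define LCB_F_MUTEX	(1U << 5)

/* From lock_data.h in this series. */
#define LCB_F_MAX_FLAGS		(1U << 7)
#define LCD_F_MMAP_LOCK		(1U << 31)
#define LCD_F_SIGHAND_LOCK	(1U << 30)

/* The tag bits live above the type bits, so masking cannot lose a type. */
static_assert(((LCD_F_MMAP_LOCK | LCD_F_SIGHAND_LOCK) &
	       (LCB_F_MAX_FLAGS - 1)) == 0,
	      "tag bits must not overlap type bits");

static const struct {
	unsigned int flags;
	const char *name;
} lock_type_names[] = {
	{ 0,				"semaphore" },
	{ LCB_F_SPIN,			"spinlock" },
	{ LCB_F_SPIN | LCB_F_READ,	"rwlock" },
	{ LCB_F_SPIN | LCB_F_WRITE,	"rwlock" },
	{ LCB_F_READ,			"rwsem" },
	{ LCB_F_WRITE,			"rwsem" },
	{ LCB_F_RT,			"rt-mutex" },
	{ LCB_F_RT | LCB_F_READ,	"rwlock-rt" },
	{ LCB_F_RT | LCB_F_WRITE,	"rwlock-rt" },
	{ LCB_F_PERCPU | LCB_F_READ,	"percpu-rwsem" },
	{ LCB_F_PERCPU | LCB_F_WRITE,	"percpu-rwsem" },
	{ LCB_F_MUTEX,			"mutex" },
	{ LCB_F_MUTEX | LCB_F_SPIN,	"mutex" },
};

/* Return the display name for the lock type encoded in the low flag bits. */
static const char *type_name(unsigned int flags)
{
	/* drop LCD_F_* tags so that a tagged lock still matches its type */
	flags &= LCB_F_MAX_FLAGS - 1;

	for (size_t i = 0; i < sizeof(lock_type_names) / sizeof(lock_type_names[0]); i++) {
		if (lock_type_names[i].flags == flags)
			return lock_type_names[i].name;
	}
	return "unknown";
}
```

Without the mask, a lock tagged with LCD_F_MMAP_LOCK would never match a
table entry and would be printed as "unknown".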

$ sudo ./perf lock con -abl -- sleep 1
 contended   total wait     max wait     avg wait            address   symbol

        44      6.13 ms    284.49 us    139.28 us   ffffffff92e06080   tasklist_lock (rwlock)
       159    983.38 us     12.38 us      6.18 us   ffff8cc717c90000   siglock (spinlock)
        10    679.90 us    153.35 us     67.99 us   ffff8cdc2872aaf8   mmap_lock (rwsem)
         9    558.11 us    180.67 us     62.01 us   ffff8cd647914038   mmap_lock (rwsem)
        78    228.56 us      7.82 us      2.93 us   ffff8cc700061c00   (spinlock)
         5     41.60 us     16.93 us      8.32 us   ffffd853acb41468   (spinlock)
        10     37.24 us      5.87 us      3.72 us   ffff8cd560b5c200   siglock (spinlock)
         4     11.17 us      3.97 us      2.79 us   ffff8d053ddf0c80   rq_lock (spinlock)
         1      7.86 us      7.86 us      7.86 us   ffff8cd64791404c   (spinlock)
         1      4.13 us      4.13 us      4.13 us   ffff8d053d930c80   rq_lock (spinlock)
         7      3.98 us      1.67 us       568 ns   ffff8ccb92479440   (mutex)
         2      2.62 us      2.33 us      1.31 us   ffff8cc702e6ede0   (rwlock)

Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-lock.c | 45 ++++++++++++++++++----------
tools/perf/util/bpf_skel/lock_data.h | 2 ++
2 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index c710a5d46638..a845c0ce5dc8 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -1548,27 +1548,41 @@ static void sort_result(void)

static const struct {
unsigned int flags;
+ const char *str;
const char *name;
} lock_type_table[] = {
- { 0, "semaphore" },
- { LCB_F_SPIN, "spinlock" },
- { LCB_F_SPIN | LCB_F_READ, "rwlock:R" },
- { LCB_F_SPIN | LCB_F_WRITE, "rwlock:W"},
- { LCB_F_READ, "rwsem:R" },
- { LCB_F_WRITE, "rwsem:W" },
- { LCB_F_RT, "rtmutex" },
- { LCB_F_RT | LCB_F_READ, "rwlock-rt:R" },
- { LCB_F_RT | LCB_F_WRITE, "rwlock-rt:W"},
- { LCB_F_PERCPU | LCB_F_READ, "pcpu-sem:R" },
- { LCB_F_PERCPU | LCB_F_WRITE, "pcpu-sem:W" },
- { LCB_F_MUTEX, "mutex" },
- { LCB_F_MUTEX | LCB_F_SPIN, "mutex" },
+ { 0, "semaphore", "semaphore" },
+ { LCB_F_SPIN, "spinlock", "spinlock" },
+ { LCB_F_SPIN | LCB_F_READ, "rwlock:R", "rwlock" },
+ { LCB_F_SPIN | LCB_F_WRITE, "rwlock:W", "rwlock" },
+ { LCB_F_READ, "rwsem:R", "rwsem" },
+ { LCB_F_WRITE, "rwsem:W", "rwsem" },
+ { LCB_F_RT, "rtmutex", "rt-mutex" },
+ { LCB_F_RT | LCB_F_READ, "rwlock-rt:R", "rwlock-rt" },
+ { LCB_F_RT | LCB_F_WRITE, "rwlock-rt:W", "rwlock-rt" },
+ { LCB_F_PERCPU | LCB_F_READ, "pcpu-sem:R", "percpu-rwsem" },
+ { LCB_F_PERCPU | LCB_F_WRITE, "pcpu-sem:W", "percpu-rwsem" },
+ { LCB_F_MUTEX, "mutex", "mutex" },
+ { LCB_F_MUTEX | LCB_F_SPIN, "mutex", "mutex" },
/* alias for get_type_flag() */
- { LCB_F_MUTEX | LCB_F_SPIN, "mutex-spin" },
+ { LCB_F_MUTEX | LCB_F_SPIN, "mutex-spin", "mutex" },
};

static const char *get_type_str(unsigned int flags)
{
+ flags &= LCB_F_MAX_FLAGS - 1;
+
+ for (unsigned int i = 0; i < ARRAY_SIZE(lock_type_table); i++) {
+ if (lock_type_table[i].flags == flags)
+ return lock_type_table[i].str;
+ }
+ return "unknown";
+}
+
+static const char *get_type_name(unsigned int flags)
+{
+ flags &= LCB_F_MAX_FLAGS - 1;
+
for (unsigned int i = 0; i < ARRAY_SIZE(lock_type_table); i++) {
if (lock_type_table[i].flags == flags)
return lock_type_table[i].name;
@@ -1662,7 +1676,8 @@ static void print_contention_result(struct lock_contention *con)
pid, pid == -1 ? "Unknown" : thread__comm_str(t));
break;
case LOCK_AGGR_ADDR:
- pr_info(" %016llx %s\n", (unsigned long long)st->addr, st->name);
+ pr_info(" %016llx %s (%s)\n", (unsigned long long)st->addr,
+ st->name, get_type_name(st->flags));
break;
default:
break;
diff --git a/tools/perf/util/bpf_skel/lock_data.h b/tools/perf/util/bpf_skel/lock_data.h
index e59366f2dba3..1ba61cb4d480 100644
--- a/tools/perf/util/bpf_skel/lock_data.h
+++ b/tools/perf/util/bpf_skel/lock_data.h
@@ -22,6 +22,8 @@ struct contention_task_data {
#define LCD_F_MMAP_LOCK (1U << 31)
#define LCD_F_SIGHAND_LOCK (1U << 30)

+#define LCB_F_MAX_FLAGS (1U << 7)
+
struct contention_data {
u64 total_time;
u64 min_time;
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-13 20:49:06

by Namhyung Kim

Subject: [PATCH 3/4] perf lock contention: Show per-cpu rq_lock with address

Using the BPF_PROG_RUN mechanism, we can run a raw_tp BPF program to
collect some semi-global locks like per-cpu locks. Let's add runqueue
locks using bpf_per_cpu_ptr() helper.
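
The collection step and the later name lookup can be modelled in plain C as
below. This is only a userspace sketch under a few assumptions: the flat
array stands in for the lock_syms BPF hash map, 'struct rq' is a hypothetical
one-member stand-in, MAX_CPUS is shrunk to keep the example small, and the
kallsyms lookup is reduced to a string passed in by the caller. The function
names are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CPUS 8	/* small bound for the sketch; the patch uses 1024 */

#define LCD_F_MMAP_LOCK    (1U << 31)
#define LCD_F_SIGHAND_LOCK (1U << 30)

enum lock_class_sym { LOCK_CLASS_NONE, LOCK_CLASS_RQLOCK };

/* Hypothetical stand-in for the kernel's per-cpu runqueue. */
struct rq { int lock; };

/* Hypothetical stand-in for the lock_syms map: address -> lock class. */
struct lock_sym { uint64_t addr; uint32_t cls; };
struct lock_sym_table { struct lock_sym ent[MAX_CPUS]; size_t nr; };

/*
 * Record the lock address of each CPU's runqueue. per_cpu_rq must have
 * MAX_CPUS slots; the walk stops at the first missing CPU, like the
 * bpf_per_cpu_ptr() loop in the BPF program.
 */
static void collect_rq_locks(struct lock_sym_table *tbl,
			     struct rq *const *per_cpu_rq)
{
	assert(tbl != NULL && per_cpu_rq != NULL);

	tbl->nr = 0;
	for (size_t cpu = 0; cpu < MAX_CPUS; cpu++) {
		const struct rq *rq = per_cpu_rq[cpu];

		if (rq == NULL)
			break;
		tbl->ent[tbl->nr].addr = (uint64_t)(uintptr_t)&rq->lock;
		tbl->ent[tbl->nr].cls = LOCK_CLASS_RQLOCK;
		tbl->nr++;
	}
}

/*
 * Resolve a name in the same order as the patch: per-process tag bits,
 * then a kernel symbol (ksym, NULL if none), then the collected table.
 */
static const char *lock_name(const struct lock_sym_table *tbl, uint64_t addr,
			     uint32_t flags, const char *ksym)
{
	if (flags & LCD_F_MMAP_LOCK)
		return "mmap_lock";
	if (flags & LCD_F_SIGHAND_LOCK)
		return "siglock";
	if (ksym)
		return ksym;

	for (size_t i = 0; i < tbl->nr; i++) {
		if (tbl->ent[i].addr == addr && tbl->ent[i].cls == LOCK_CLASS_RQLOCK)
			return "rq_lock";
	}
	return "";
}
```

Collecting the addresses once and looking them up afterwards keeps the
per-event BPF path unchanged, since the runqueue locks do not move while
the tool is running.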

$ sudo ./perf lock con -abl -- sleep 1
 contended   total wait     max wait     avg wait            address   symbol

       248      3.25 ms     32.23 us     13.10 us   ffff8cc75cfd2940   siglock
        60    217.91 us      9.69 us      3.63 us   ffff8cc700061c00
         8     70.23 us     13.86 us      8.78 us   ffff8cc703629484
         4     56.32 us     35.81 us     14.08 us   ffff8cc78b66f778   mmap_lock
         4     16.70 us      5.18 us      4.18 us   ffff8cc7036a0684
         3      4.99 us      2.65 us      1.66 us   ffff8d053da30c80   rq_lock
         2      3.44 us      2.28 us      1.72 us   ffff8d053dcf0c80   rq_lock
         9      2.51 us       371 ns       278 ns   ffff8ccb92479440
         2      2.11 us      1.24 us      1.06 us   ffff8d053db30c80   rq_lock
         2      2.06 us      1.69 us      1.03 us   ffff8d053d970c80   rq_lock

Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/bpf_lock_contention.c | 27 ++++++++++++++--
.../perf/util/bpf_skel/lock_contention.bpf.c | 31 +++++++++++++++++++
tools/perf/util/bpf_skel/lock_data.h | 5 +++
3 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
index 51631af3b4d6..235fc7150545 100644
--- a/tools/perf/util/bpf_lock_contention.c
+++ b/tools/perf/util/bpf_lock_contention.c
@@ -151,6 +151,8 @@ int lock_contention_prepare(struct lock_contention *con)
skel->bss->needs_callstack = con->save_callstack;
skel->bss->lock_owner = con->owner;

+ bpf_program__set_autoload(skel->progs.collect_lock_syms, false);
+
lock_contention_bpf__attach(skel);
return 0;
}
@@ -198,14 +200,26 @@ static const char *lock_contention_get_name(struct lock_contention *con,
}

if (con->aggr_mode == LOCK_AGGR_ADDR) {
+ int lock_fd = bpf_map__fd(skel->maps.lock_syms);
+
+ /* per-process locks set upper bits of the flags */
if (flags & LCD_F_MMAP_LOCK)
return "mmap_lock";
if (flags & LCD_F_SIGHAND_LOCK)
return "siglock";
+
+ /* global locks with symbols */
sym = machine__find_kernel_symbol(machine, key->lock_addr, &kmap);
if (sym)
- name = sym->name;
- return name;
+ return sym->name;
+
+ /* try semi-global locks collected separately */
+ if (!bpf_map_lookup_elem(lock_fd, &key->lock_addr, &flags)) {
+ if (flags == LOCK_CLASS_RQLOCK)
+ return "rq_lock";
+ }
+
+ return "";
}

/* LOCK_AGGR_CALLER: skip lock internal functions */
@@ -258,6 +272,15 @@ int lock_contention_read(struct lock_contention *con)
thread__set_comm(idle, "swapper", /*timestamp=*/0);
}

+ if (con->aggr_mode == LOCK_AGGR_ADDR) {
+ DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
+ .flags = BPF_F_TEST_RUN_ON_CPU,
+ );
+ int prog_fd = bpf_program__fd(skel->progs.collect_lock_syms);
+
+ bpf_prog_test_run_opts(prog_fd, &opts);
+ }
+
/* make sure it loads the kernel map */
map__load(maps__first(machine->kmaps));

diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
index 4ba34caf84eb..2d50c4395733 100644
--- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
+++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
@@ -10,6 +10,9 @@
/* default buffer size */
#define MAX_ENTRIES 10240

+/* for collect_lock_syms(). 4096 was rejected by the verifier */
+#define MAX_CPUS 1024
+
/* lock contention flags from include/trace/events/lock.h */
#define LCB_F_SPIN (1U << 0)
#define LCB_F_READ (1U << 1)
@@ -56,6 +59,13 @@ struct {
__uint(max_entries, MAX_ENTRIES);
} task_data SEC(".maps");

+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(key_size, sizeof(__u64));
+ __uint(value_size, sizeof(__u32));
+ __uint(max_entries, 16384);
+} lock_syms SEC(".maps");
+
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__uint(key_size, sizeof(__u32));
@@ -378,4 +388,25 @@ int contention_end(u64 *ctx)
return 0;
}

+extern struct rq runqueues __ksym;
+
+SEC("raw_tp/bpf_test_finish")
+int BPF_PROG(collect_lock_syms)
+{
+ __u64 lock_addr;
+ __u32 lock_flag;
+
+ for (int i = 0; i < MAX_CPUS; i++) {
+ struct rq *rq = bpf_per_cpu_ptr(&runqueues, i);
+
+ if (rq == NULL)
+ break;
+
+ lock_addr = (__u64)&rq->__lock;
+ lock_flag = LOCK_CLASS_RQLOCK;
+ bpf_map_update_elem(&lock_syms, &lock_addr, &lock_flag, BPF_ANY);
+ }
+ return 0;
+}
+
char LICENSE[] SEC("license") = "Dual BSD/GPL";
diff --git a/tools/perf/util/bpf_skel/lock_data.h b/tools/perf/util/bpf_skel/lock_data.h
index 5ed1a0955015..e59366f2dba3 100644
--- a/tools/perf/util/bpf_skel/lock_data.h
+++ b/tools/perf/util/bpf_skel/lock_data.h
@@ -36,4 +36,9 @@ enum lock_aggr_mode {
LOCK_AGGR_CALLER,
};

+enum lock_class_sym {
+ LOCK_CLASS_NONE,
+ LOCK_CLASS_RQLOCK,
+};
+
#endif /* UTIL_BPF_SKEL_LOCK_DATA_H */
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-13 20:49:06

by Namhyung Kim

Subject: [PATCH 2/4] perf lock contention: Track and show siglock with address

Likewise, we can display siglock by following the pointer like
current->sighand->siglock.

$ sudo ./perf lock con -abl -- sleep 1
 contended   total wait     max wait     avg wait            address   symbol

        16      2.18 ms    305.35 us    136.34 us   ffffffff92e06080   tasklist_lock
        28    521.78 us     31.16 us     18.63 us   ffff8cc703783ec4
         7    119.03 us     23.55 us     17.00 us   ffff8ccb92479440
        15     88.29 us     10.06 us      5.89 us   ffff8cd560b5f380   siglock
         7     37.67 us      9.16 us      5.38 us   ffff8d053daf0c80
         5      8.81 us      4.92 us      1.76 us   ffff8d053d6b0c80

Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-lock.c | 3 +--
tools/perf/util/bpf_lock_contention.c | 8 ++++++--
tools/perf/util/bpf_skel/lock_contention.bpf.c | 5 +++++
tools/perf/util/bpf_skel/lock_data.h | 3 ++-
4 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index c62f4d9363a6..c710a5d46638 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -1662,8 +1662,7 @@ static void print_contention_result(struct lock_contention *con)
pid, pid == -1 ? "Unknown" : thread__comm_str(t));
break;
case LOCK_AGGR_ADDR:
- pr_info(" %016llx %s\n", (unsigned long long)st->addr,
- (st->flags & LCD_F_MMAP_LOCK) ? "mmap_lock" : st->name);
+ pr_info(" %016llx %s\n", (unsigned long long)st->addr, st->name);
break;
default:
break;
diff --git a/tools/perf/util/bpf_lock_contention.c b/tools/perf/util/bpf_lock_contention.c
index fadcacb9d501..51631af3b4d6 100644
--- a/tools/perf/util/bpf_lock_contention.c
+++ b/tools/perf/util/bpf_lock_contention.c
@@ -169,7 +169,7 @@ int lock_contention_stop(void)

static const char *lock_contention_get_name(struct lock_contention *con,
struct contention_key *key,
- u64 *stack_trace)
+ u64 *stack_trace, u32 flags)
{
int idx = 0;
u64 addr;
@@ -198,6 +198,10 @@ static const char *lock_contention_get_name(struct lock_contention *con,
}

if (con->aggr_mode == LOCK_AGGR_ADDR) {
+ if (flags & LCD_F_MMAP_LOCK)
+ return "mmap_lock";
+ if (flags & LCD_F_SIGHAND_LOCK)
+ return "siglock";
sym = machine__find_kernel_symbol(machine, key->lock_addr, &kmap);
if (sym)
name = sym->name;
@@ -301,7 +305,7 @@ int lock_contention_read(struct lock_contention *con)
goto next;
}

- name = lock_contention_get_name(con, &key, stack_trace);
+ name = lock_contention_get_name(con, &key, stack_trace, data.flags);
st = lock_stat_findnew(ls_key, name, data.flags);
if (st == NULL)
break;
diff --git a/tools/perf/util/bpf_skel/lock_contention.bpf.c b/tools/perf/util/bpf_skel/lock_contention.bpf.c
index f092a78ae2b5..4ba34caf84eb 100644
--- a/tools/perf/util/bpf_skel/lock_contention.bpf.c
+++ b/tools/perf/util/bpf_skel/lock_contention.bpf.c
@@ -236,6 +236,11 @@ static inline __u32 check_lock_type(__u64 lock, __u32 flags)
return LCD_F_MMAP_LOCK;
}
break;
+ case LCB_F_SPIN: /* spinlock */
+ curr = bpf_get_current_task_btf();
+ if (&curr->sighand->siglock == (void *)lock)
+ return LCD_F_SIGHAND_LOCK;
+ break;
default:
break;
}
diff --git a/tools/perf/util/bpf_skel/lock_data.h b/tools/perf/util/bpf_skel/lock_data.h
index 789f20833798..5ed1a0955015 100644
--- a/tools/perf/util/bpf_skel/lock_data.h
+++ b/tools/perf/util/bpf_skel/lock_data.h
@@ -19,7 +19,8 @@ struct contention_task_data {
* Upper bits of the flags in the contention_data are used to identify
* some well-known locks which do not have symbols (non-global locks).
*/
-#define LCD_F_MMAP_LOCK (1U << 31)
+#define LCD_F_MMAP_LOCK (1U << 31)
+#define LCD_F_SIGHAND_LOCK (1U << 30)

struct contention_data {
u64 total_time;
--
2.40.0.rc1.284.g88254d51c5-goog


2023-03-13 21:46:03

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 0/4] perf lock contention: Improve lock symbol display (v1)

Em Mon, Mar 13, 2023 at 01:48:21PM -0700, Namhyung Kim escreveu:
> Hello,
>
> This patchset improves the symbolization of locks for -l/--lock-addr mode.
> As of now it only shows global lock symbols present in the kallsyms. But
> we can add some more lock symbols by traversing pointers in the BPF program.
>
> For example, mmap_lock can be reached from the mm_struct of the current task
> (task_struct->mm->mmap_lock) and we can compare the address of the given lock
> with it. Similarly I've added 'siglock' for current->sighand->siglock.
>
> On the other hand, we can traverse some semi-global locks such as per-cpu,
> per-device, per-filesystem locks and so on. I've added 'rq_lock' for each
> cpu's runqueue lock.
>
> It cannot cover all types of locks in the system but it'd be fairly useful
> if we can add many of the often-contended locks. I tried to add futex locks
> but it failed to find the __futex_data symbol from BTF. I'm not sure why but
> I guess it's because the struct doesn't have a tag name.
>
> Those locks are added just because they got caught during my test.
> It'd be nice if you suggest which locks to add and how to do that. :)
> I'm wondering if there's a way to track file-based locks (like epoll, etc).
>
> Finally I also added a lock type name after the symbols (if any) so that we
> can get some idea even though it has no symbol. The example output looks
> like below:
>
> $ sudo ./perf lock con -abl -- sleep 1
> contended total wait max wait avg wait address symbol
>
> 44 6.13 ms 284.49 us 139.28 us ffffffff92e06080 tasklist_lock (rwlock)
> 159 983.38 us 12.38 us 6.18 us ffff8cc717c90000 siglock (spinlock)
> 10 679.90 us 153.35 us 67.99 us ffff8cdc2872aaf8 mmap_lock (rwsem)
> 9 558.11 us 180.67 us 62.01 us ffff8cd647914038 mmap_lock (rwsem)
> 78 228.56 us 7.82 us 2.93 us ffff8cc700061c00 (spinlock)
> 5 41.60 us 16.93 us 8.32 us ffffd853acb41468 (spinlock)
> 10 37.24 us 5.87 us 3.72 us ffff8cd560b5c200 siglock (spinlock)
> 4 11.17 us 3.97 us 2.79 us ffff8d053ddf0c80 rq_lock (spinlock)
> 1 7.86 us 7.86 us 7.86 us ffff8cd64791404c (spinlock)
> 1 4.13 us 4.13 us 4.13 us ffff8d053d930c80 rq_lock (spinlock)
> 7 3.98 us 1.67 us 568 ns ffff8ccb92479440 (mutex)
> 2 2.62 us 2.33 us 1.31 us ffff8cc702e6ede0 (rwlock)
>
> The tasklist_lock is global so it's from the kallsyms. But others like
> siglock, mmap_lock and rq_lock are from the BPF.

Beautiful :-)

And the csets are _so_ small and demonstrate techniques that should be
used in more and more tools.

Applied, testing.

- Arnaldo

> You can get the code at the 'perf/lock-symbol-v1' branch in
>
> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
> Thanks,
> Namhyung
>
> Namhyung Kim (4):
> perf lock contention: Track and show mmap_lock with address
> perf lock contention: Track and show siglock with address
> perf lock contention: Show per-cpu rq_lock with address
> perf lock contention: Show lock type with address
>
> tools/perf/builtin-lock.c | 46 +++++++----
> tools/perf/util/bpf_lock_contention.c | 35 ++++++++-
> .../perf/util/bpf_skel/lock_contention.bpf.c | 77 +++++++++++++++++++
> tools/perf/util/bpf_skel/lock_data.h | 14 ++++
> 4 files changed, 152 insertions(+), 20 deletions(-)
>
>
> base-commit: b8fa3e3833c14151a47ebebbc5427dcfe94bb407
> --
> 2.40.0.rc1.284.g88254d51c5-goog
>

--

- Arnaldo

2023-03-14 12:31:25

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH 0/4] perf lock contention: Improve lock symbol display (v1)

Em Mon, Mar 13, 2023 at 06:45:53PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Mar 13, 2023 at 01:48:21PM -0700, Namhyung Kim escreveu:
> > Hello,
> >
> > This patchset improves the symbolization of locks for -l/--lock-addr mode.
> > As of now it only shows global lock symbols present in the kallsyms. But
> > we can add some more lock symbols by traversing pointers in the BPF program.
> >
> > For example, mmap_lock can be reached from the mm_struct of the current task
> > (task_struct->mm->mmap_lock) and we can compare the address of the given lock
> > with it. Similarly I've added 'siglock' for current->sighand->siglock.

Hey, we can go a bit further by using something like pahole's
--expand_types and --expand_pointers and iterating over a type's members,
looking for locks, like:

⬢[acme@toolbox pahole]$ pahole task_struct | grep spinlock_t
spinlock_t alloc_lock; /* 3280 4 */
raw_spinlock_t pi_lock; /* 3284 4 */
seqcount_spinlock_t mems_allowed_seq; /* 3616 4 */
⬢[acme@toolbox pahole]$

Expanding pointers will find mmap_lock:

⬢[acme@toolbox pahole]$ pahole --expand_pointers -C task_struct | grep -B10 mmap_lock
} *pgd;
atomic_t membarrier_state;
atomic_t mm_users;
atomic_t mm_count;

/* XXX 4 bytes hole, try to pack */

atomic_long_t pgtables_bytes;
int map_count;
spinlock_t page_table_lock;
struct rw_semaphore mmap_lock;
^C
⬢[acme@toolbox pahole]$


It's just too much expansion to see task_struct->mm, but it is there, of
course:

⬢[acme@toolbox pahole]$ pahole mm_struct | grep mmap_lock
struct rw_semaphore mmap_lock; /* 120 40 */
⬢[acme@toolbox pahole]$

Also:

⬢[acme@toolbox pahole]$ pahole --contains rw_semaphore
address_space
signal_struct
key
inode
super_block
quota_info
user_namespace
blocking_notifier_head
backing_dev_info
anon_vma
tty_struct
cpufreq_policy
tcf_block
ipc_ids
autogroup
kvm_arch
posix_clock
listener_list
uprobe
kernfs_root
configfs_fragment
ext4_inode_info
ext4_group_info
btrfs_fs_info
extent_buffer
btrfs_dev_replace
btrfs_space_info
btrfs_inode
btrfs_block_group
tpm_chip
ib_device
ib_xrcd
blk_crypto_profile
controller
led_classdev
cppc_pcc_data
dm_snapshot
⬢[acme@toolbox pahole]$

And:

⬢[acme@toolbox pahole]$ pahole --find_pointers_to mm_struct
task_struct: mm
task_struct: active_mm
vm_area_struct: vm_mm
flush_tlb_info: mm
signal_struct: oom_mm
tlb_state: loaded_mm
linux_binprm: mm
mmu_gather: mm
trace_event_raw_xen_mmu_ptep_modify_prot: mm
trace_event_raw_xen_mmu_alloc_ptpage: mm
trace_event_raw_xen_mmu_pgd: mm
trace_event_raw_xen_mmu_flush_tlb_multi: mm
trace_event_raw_hyperv_mmu_flush_tlb_multi: mm
mmu_notifier: mm
mmu_notifier_range: mm
sgx_encl_mm: mm
rq: prev_mm
kvm: mm
cpuset_migrate_mm_work: mm
mmap_unlock_irq_work: mm
delayed_uprobe: mm
map_info: mm
trace_event_raw_mmap_lock: mm
trace_event_raw_mmap_lock_acquire_returned: mm
mm_walk: mm
make_exclusive_args: mm
mmu_interval_notifier: mm
mm_slot: mm
rmap_item: mm
trace_event_raw_mm_khugepaged_scan_pmd: mm
trace_event_raw_mm_collapse_huge_page: mm
trace_event_raw_mm_collapse_huge_page_swapin: mm
mm_slot: mm
move_charge_struct: mm
userfaultfd_ctx: mm
proc_maps_private: mm
remap_pfn: mm
intel_svm: mm
binder_alloc: vma_vm_mm
⬢[acme@toolbox pahole]$

- Arnaldo


> > On the other hand, we can traverse some semi-global locks such as per-cpu,
> > per-device, per-filesystem locks and so on. I've added 'rq_lock' for each
> > cpu's runqueue lock.
> >
> > It cannot cover all types of locks in the system but it'd be fairly useful
> > if we can add many of the often-contended locks. I tried to add futex locks
> > but it failed to find the __futex_data symbol from BTF. I'm not sure why but
> > I guess it's because the struct doesn't have a tag name.
> >
> > Those locks are added just because they got caught during my test.
> > It'd be nice if you suggest which locks to add and how to do that. :)
> > I'm wondering if there's a way to track file-based locks (like epoll, etc).
> >
> > Finally I also added a lock type name after the symbols (if any) so that we
> > can get some idea even though it has no symbol. The example output looks
> > like below:
> >
> > $ sudo ./perf lock con -abl -- sleep 1
> > contended total wait max wait avg wait address symbol
> >
> > 44 6.13 ms 284.49 us 139.28 us ffffffff92e06080 tasklist_lock (rwlock)
> > 159 983.38 us 12.38 us 6.18 us ffff8cc717c90000 siglock (spinlock)
> > 10 679.90 us 153.35 us 67.99 us ffff8cdc2872aaf8 mmap_lock (rwsem)
> > 9 558.11 us 180.67 us 62.01 us ffff8cd647914038 mmap_lock (rwsem)
> > 78 228.56 us 7.82 us 2.93 us ffff8cc700061c00 (spinlock)
> > 5 41.60 us 16.93 us 8.32 us ffffd853acb41468 (spinlock)
> > 10 37.24 us 5.87 us 3.72 us ffff8cd560b5c200 siglock (spinlock)
> > 4 11.17 us 3.97 us 2.79 us ffff8d053ddf0c80 rq_lock (spinlock)
> > 1 7.86 us 7.86 us 7.86 us ffff8cd64791404c (spinlock)
> > 1 4.13 us 4.13 us 4.13 us ffff8d053d930c80 rq_lock (spinlock)
> > 7 3.98 us 1.67 us 568 ns ffff8ccb92479440 (mutex)
> > 2 2.62 us 2.33 us 1.31 us ffff8cc702e6ede0 (rwlock)
> >
> > The tasklist_lock is global so it's from the kallsyms. But others like
> > siglock, mmap_lock and rq_lock are from the BPF.
>
> Beautiful :-)
>
> And the csets are _so_ small and demonstrate techniques that should be
> used in more and more tools.
>
> Applied, testing.
>
> - Arnaldo
>
> > You can get the code at the 'perf/lock-symbol-v1' branch in
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> >
> > Thanks,
> > Namhyung
> >
> > Namhyung Kim (4):
> > perf lock contention: Track and show mmap_lock with address
> > perf lock contention: Track and show siglock with address
> > perf lock contention: Show per-cpu rq_lock with address
> > perf lock contention: Show lock type with address
> >
> > tools/perf/builtin-lock.c | 46 +++++++----
> > tools/perf/util/bpf_lock_contention.c | 35 ++++++++-
> > .../perf/util/bpf_skel/lock_contention.bpf.c | 77 +++++++++++++++++++
> > tools/perf/util/bpf_skel/lock_data.h | 14 ++++
> > 4 files changed, 152 insertions(+), 20 deletions(-)
> >
> >
> > base-commit: b8fa3e3833c14151a47ebebbc5427dcfe94bb407
> > --
> > 2.40.0.rc1.284.g88254d51c5-goog
> >
>
> --
>
> - Arnaldo

--

- Arnaldo

2023-03-14 17:55:31

by Namhyung Kim

Subject: Re: [PATCH 0/4] perf lock contention: Improve lock symbol display (v1)

Hi Arnaldo,

On Tue, Mar 14, 2023 at 5:23 AM Arnaldo Carvalho de Melo
<[email protected]> wrote:
>
> Em Mon, Mar 13, 2023 at 06:45:53PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Mon, Mar 13, 2023 at 01:48:21PM -0700, Namhyung Kim escreveu:
> > > Hello,
> > >
> > > This patchset improves the symbolization of locks for -l/--lock-addr mode.
> > > As of now it only shows global lock symbols present in the kallsyms. But
> > > we can add some more lock symbols by traversing pointers in the BPF program.
> > >
> > > For example, mmap_lock can be reached from the mm_struct of the current task
> > > (task_struct->mm->mmap_lock) and we can compare the address of the given lock
> > > with it. Similarly I've added 'siglock' for current->sighand->siglock.
>
> Hey, we can go a bit further by using something like pahole's
> --expand_types and --expand_pointers and iterating over a type's members,
> looking for locks, like:
>
> ⬢[acme@toolbox pahole]$ pahole task_struct | grep spinlock_t
> spinlock_t alloc_lock; /* 3280 4 */
> raw_spinlock_t pi_lock; /* 3284 4 */
> seqcount_spinlock_t mems_allowed_seq; /* 3616 4 */
> ⬢[acme@toolbox pahole]$
>
> Expanding pointers will find mmap_lock:
>
> ⬢[acme@toolbox pahole]$ pahole --expand_pointers -C task_struct | grep -B10 mmap_lock
> } *pgd;
> atomic_t membarrier_state;
> atomic_t mm_users;
> atomic_t mm_count;
>
> /* XXX 4 bytes hole, try to pack */
>
> atomic_long_t pgtables_bytes;
> int map_count;
> spinlock_t page_table_lock;
> struct rw_semaphore mmap_lock;
> ^C
> ⬢[acme@toolbox pahole]$
>
>
> It's just too much expansion to see task_struct->mm, but it is there, of
> course:
>
> ⬢[acme@toolbox pahole]$ pahole mm_struct | grep mmap_lock
> struct rw_semaphore mmap_lock; /* 120 40 */
> ⬢[acme@toolbox pahole]$
>
> Also:
>
> ⬢[acme@toolbox pahole]$ pahole --contains rw_semaphore
> address_space
> signal_struct
> key
> inode
> super_block
> quota_info
> user_namespace
> blocking_notifier_head
> backing_dev_info
> anon_vma
> tty_struct
> cpufreq_policy
> tcf_block
> ipc_ids
> autogroup
> kvm_arch
> posix_clock
> listener_list
> uprobe
> kernfs_root
> configfs_fragment
> ext4_inode_info
> ext4_group_info
> btrfs_fs_info
> extent_buffer
> btrfs_dev_replace
> btrfs_space_info
> btrfs_inode
> btrfs_block_group
> tpm_chip
> ib_device
> ib_xrcd
> blk_crypto_profile
> controller
> led_classdev
> cppc_pcc_data
> dm_snapshot
> ⬢[acme@toolbox pahole]$
>
> And:
>
> ⬢[acme@toolbox pahole]$ pahole --find_pointers_to mm_struct
> task_struct: mm
> task_struct: active_mm
> vm_area_struct: vm_mm
> flush_tlb_info: mm
> signal_struct: oom_mm
> tlb_state: loaded_mm
> linux_binprm: mm
> mmu_gather: mm
> trace_event_raw_xen_mmu_ptep_modify_prot: mm
> trace_event_raw_xen_mmu_alloc_ptpage: mm
> trace_event_raw_xen_mmu_pgd: mm
> trace_event_raw_xen_mmu_flush_tlb_multi: mm
> trace_event_raw_hyperv_mmu_flush_tlb_multi: mm
> mmu_notifier: mm
> mmu_notifier_range: mm
> sgx_encl_mm: mm
> rq: prev_mm
> kvm: mm
> cpuset_migrate_mm_work: mm
> mmap_unlock_irq_work: mm
> delayed_uprobe: mm
> map_info: mm
> trace_event_raw_mmap_lock: mm
> trace_event_raw_mmap_lock_acquire_returned: mm
> mm_walk: mm
> make_exclusive_args: mm
> mmu_interval_notifier: mm
> mm_slot: mm
> rmap_item: mm
> trace_event_raw_mm_khugepaged_scan_pmd: mm
> trace_event_raw_mm_collapse_huge_page: mm
> trace_event_raw_mm_collapse_huge_page_swapin: mm
> mm_slot: mm
> move_charge_struct: mm
> userfaultfd_ctx: mm
> proc_maps_private: mm
> remap_pfn: mm
> intel_svm: mm
> binder_alloc: vma_vm_mm
> ⬢[acme@toolbox pahole]$

This looks really cool!

I'm especially interested in adding super_block and kernfs_root.
Let me see how I can add them.

Thanks,
Namhyung