2020-06-29 18:54:40

by Song Liu

Subject: [PATCH v4 bpf-next 0/4] bpf: introduce bpf_get_task_stack()

This set introduces a new helper bpf_get_task_stack(). The primary use case
is to dump all /proc/*/stack to seq_file via bpf_iter__task.
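
For example, once the iterator program from patch 4/4 is loaded, the
stacks can be read like a regular file (a sketch; the bpftool command
and pin path here are illustrative, not part of this set):

  # bpftool iter pin ./bpf_iter_task_stack.o /sys/fs/bpf/task_stacks
  # cat /sys/fs/bpf/task_stacks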

A few different approaches have been explored and compared:

1. A simple wrapper around stack_trace_save_tsk(), as v1 [1].

This approach introduces a new signature, which is different from the
existing helper bpf_get_stack(). Therefore, it is not ideal.

2. Extend get_perf_callchain() to support "task" as argument.

This approach reuses most of bpf_get_stack(). However, extending
get_perf_callchain() requires non-trivial changes to architecture-
specific code, which is error prone.

3. The current approach (since v2) leverages most of the existing
bpf_get_stack(), and uses stack_trace_save_tsk() to handle the
architecture-specific logic.

[1] https://lore.kernel.org/netdev/[email protected]/
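
For reference, the helper's signature as seen by BPF programs (from the
UAPI comment added in patch 2/4):

  long bpf_get_task_stack(struct task_struct *task, void *buf,
                          u32 size, u64 flags);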

Changes v3 => v4:
1. Simplify the selftests with bpf_iter.h. (Yonghong)
2. Add example output to commit log of 4/4. (Yonghong)

Changes v2 => v3:
1. Rebase on top of bpf-next. (Yonghong)
2. Sanitize get_callchain_entry(). (Peter)
3. Use has_callchain_buf for bpf_get_task_stack. (Andrii)
4. Other small cleanups. (Yonghong, Andrii)

Changes v1 => v2:
1. Reuse most of bpf_get_stack() logic. (Andrii)
2. Fix unsigned long vs. u64 mismatch for 32-bit systems. (Yonghong)
3. Add %pB support in bpf_trace_printk(). (Daniel)
4. Fix the buffer size argument to be in bytes.

Song Liu (4):
perf: expose get/put_callchain_entry()
bpf: introduce helper bpf_get_task_stack()
bpf: allow %pB in bpf_seq_printf() and bpf_trace_printk()
selftests/bpf: add bpf_iter test with bpf_get_task_stack()

include/linux/bpf.h | 1 +
include/linux/perf_event.h | 2 +
include/uapi/linux/bpf.h | 36 ++++++++-
kernel/bpf/stackmap.c | 75 ++++++++++++++++++-
kernel/bpf/verifier.c | 4 +-
kernel/events/callchain.c | 13 ++--
kernel/trace/bpf_trace.c | 12 ++-
scripts/bpf_helpers_doc.py | 2 +
tools/include/uapi/linux/bpf.h | 36 ++++++++-
.../selftests/bpf/prog_tests/bpf_iter.c | 17 +++++
.../selftests/bpf/progs/bpf_iter_task_stack.c | 37 +++++++++
11 files changed, 220 insertions(+), 15 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c

--
2.24.1


2020-06-29 18:57:06

by Song Liu

Subject: [PATCH v4 bpf-next 1/4] perf: expose get/put_callchain_entry()

Sanitize and expose get/put_callchain_entry(). This will be used by the
BPF stack map.
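
A sketch of the pairing the BPF stack map code will use (based on
get_callchain_entry_for_task() in patch 2/4):

  struct perf_callchain_entry *entry;
  int rctx;

  entry = get_callchain_entry(&rctx);
  if (!entry)
          return NULL;
  /* ... fill entry->ip[], e.g. with stack_trace_save_tsk() ... */
  put_callchain_entry(rctx);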

Suggested-by: Peter Zijlstra <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Song Liu <[email protected]>
---
include/linux/perf_event.h | 2 ++
kernel/events/callchain.c | 13 ++++++-------
2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b4bb32082342c..00ab5efa38334 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1244,6 +1244,8 @@ get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
extern struct perf_callchain_entry *perf_callchain(struct perf_event *event, struct pt_regs *regs);
extern int get_callchain_buffers(int max_stack);
extern void put_callchain_buffers(void);
+extern struct perf_callchain_entry *get_callchain_entry(int *rctx);
+extern void put_callchain_entry(int rctx);

extern int sysctl_perf_event_max_stack;
extern int sysctl_perf_event_max_contexts_per_stack;
diff --git a/kernel/events/callchain.c b/kernel/events/callchain.c
index 334d48b16c36d..c6ce894e4ce94 100644
--- a/kernel/events/callchain.c
+++ b/kernel/events/callchain.c
@@ -149,7 +149,7 @@ void put_callchain_buffers(void)
}
}

-static struct perf_callchain_entry *get_callchain_entry(int *rctx)
+struct perf_callchain_entry *get_callchain_entry(int *rctx)
{
int cpu;
struct callchain_cpus_entries *entries;
@@ -159,8 +159,10 @@ static struct perf_callchain_entry *get_callchain_entry(int *rctx)
return NULL;

entries = rcu_dereference(callchain_cpus_entries);
- if (!entries)
+ if (!entries) {
+ put_recursion_context(this_cpu_ptr(callchain_recursion), *rctx);
return NULL;
+ }

cpu = smp_processor_id();

@@ -168,7 +170,7 @@ static struct perf_callchain_entry *get_callchain_entry(int *rctx)
(*rctx * perf_callchain_entry__sizeof()));
}

-static void
+void
put_callchain_entry(int rctx)
{
put_recursion_context(this_cpu_ptr(callchain_recursion), rctx);
@@ -183,11 +185,8 @@ get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
int rctx;

entry = get_callchain_entry(&rctx);
- if (rctx == -1)
- return NULL;
-
if (!entry)
- goto exit_put;
+ return NULL;

ctx.entry = entry;
ctx.max_stack = max_stack;
--
2.24.1

2020-06-29 19:02:28

by Song Liu

Subject: [PATCH v4 bpf-next 4/4] selftests/bpf: add bpf_iter test with bpf_get_task_stack()

The new test is similar to other bpf_iter tests. It dumps all
/proc/<pid>/stack to a seq_file. Here is some example output:

pid: 2873 num_entries: 3
[<0>] worker_thread+0xc6/0x380
[<0>] kthread+0x135/0x150
[<0>] ret_from_fork+0x22/0x30

pid: 2874 num_entries: 9
[<0>] __bpf_get_stack+0x15e/0x250
[<0>] bpf_prog_22a400774977bb30_dump_task_stack+0x4a/0xb3c
[<0>] bpf_iter_run_prog+0x81/0x170
[<0>] __task_seq_show+0x58/0x80
[<0>] bpf_seq_read+0x1c3/0x3b0
[<0>] vfs_read+0x9e/0x170
[<0>] ksys_read+0xa7/0xe0
[<0>] do_syscall_64+0x4c/0xa0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Note: To print the output, it is necessary to modify the selftest.
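
(A sketch of such a modification: do_dummy_read() currently discards the
seq_file contents; echoing them instead prints the dump, e.g. reusing
its existing iter_fd, buf, and len variables:

  while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
          fwrite(buf, 1, len, stdout);
)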

Signed-off-by: Song Liu <[email protected]>
---
.../selftests/bpf/prog_tests/bpf_iter.c | 17 +++++++++
.../selftests/bpf/progs/bpf_iter_task_stack.c | 37 +++++++++++++++++++
2 files changed, 54 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index 1e2e0fced6e81..fed42755416db 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -5,6 +5,7 @@
#include "bpf_iter_netlink.skel.h"
#include "bpf_iter_bpf_map.skel.h"
#include "bpf_iter_task.skel.h"
+#include "bpf_iter_task_stack.skel.h"
#include "bpf_iter_task_file.skel.h"
#include "bpf_iter_tcp4.skel.h"
#include "bpf_iter_tcp6.skel.h"
@@ -110,6 +111,20 @@ static void test_task(void)
bpf_iter_task__destroy(skel);
}

+static void test_task_stack(void)
+{
+ struct bpf_iter_task_stack *skel;
+
+ skel = bpf_iter_task_stack__open_and_load();
+ if (CHECK(!skel, "bpf_iter_task_stack__open_and_load",
+ "skeleton open_and_load failed\n"))
+ return;
+
+ do_dummy_read(skel->progs.dump_task_stack);
+
+ bpf_iter_task_stack__destroy(skel);
+}
+
static void test_task_file(void)
{
struct bpf_iter_task_file *skel;
@@ -452,6 +467,8 @@ void test_bpf_iter(void)
test_bpf_map();
if (test__start_subtest("task"))
test_task();
+ if (test__start_subtest("task_stack"))
+ test_task_stack();
if (test__start_subtest("task_file"))
test_task_file();
if (test__start_subtest("tcp4"))
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c b/tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c
new file mode 100644
index 0000000000000..e40d32a2ed93d
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#include "bpf_iter.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+#define MAX_STACK_TRACE_DEPTH 64
+unsigned long entries[MAX_STACK_TRACE_DEPTH];
+#define SIZE_OF_ULONG (sizeof(unsigned long))
+
+SEC("iter/task")
+int dump_task_stack(struct bpf_iter__task *ctx)
+{
+ struct seq_file *seq = ctx->meta->seq;
+ struct task_struct *task = ctx->task;
+ long i, retlen;
+
+ if (task == (void *)0)
+ return 0;
+
+ retlen = bpf_get_task_stack(task, entries,
+ MAX_STACK_TRACE_DEPTH * SIZE_OF_ULONG, 0);
+ if (retlen < 0)
+ return 0;
+
+ BPF_SEQ_PRINTF(seq, "pid: %8u num_entries: %8u\n", task->pid,
+ retlen / SIZE_OF_ULONG);
+ for (i = 0; i < MAX_STACK_TRACE_DEPTH; i++) {
+ if (retlen > i * SIZE_OF_ULONG)
+ BPF_SEQ_PRINTF(seq, "[<0>] %pB\n", (void *)entries[i]);
+ }
+ BPF_SEQ_PRINTF(seq, "\n");
+
+ return 0;
+}
--
2.24.1

2020-06-29 19:18:32

by Yonghong Song

Subject: Re: [PATCH v4 bpf-next 4/4] selftests/bpf: add bpf_iter test with bpf_get_task_stack()



On 6/29/20 9:56 AM, Song Liu wrote:
>
>
>> On Jun 29, 2020, at 8:06 AM, Yonghong Song <[email protected]> wrote:
>>
>>
>>
>> On 6/28/20 10:55 PM, Song Liu wrote:
>>> The new test is similar to other bpf_iter tests. It dumps all
>>> /proc/<pid>/stack to a seq_file. Here is some example output:
>>> pid: 2873 num_entries: 3
>>> [<0>] worker_thread+0xc6/0x380
>>> [<0>] kthread+0x135/0x150
>>> [<0>] ret_from_fork+0x22/0x30
>>> pid: 2874 num_entries: 9
>>> [<0>] __bpf_get_stack+0x15e/0x250
>>> [<0>] bpf_prog_22a400774977bb30_dump_task_stack+0x4a/0xb3c
>>> [<0>] bpf_iter_run_prog+0x81/0x170
>>> [<0>] __task_seq_show+0x58/0x80
>>> [<0>] bpf_seq_read+0x1c3/0x3b0
>>> [<0>] vfs_read+0x9e/0x170
>>> [<0>] ksys_read+0xa7/0xe0
>>> [<0>] do_syscall_64+0x4c/0xa0
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> Note: To print the output, it is necessary to modify the selftest.
>>
>> I do not know what this sentence means. It seems confusing
>> and probably not needed.
>
> It means current do_dummy_read() doesn't check/print the contents of the
> seq_file:
>
> /* not check contents, but ensure read() ends without error */
> while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
> ;

I see. Thanks. It would be great if the commit message were more
explicit about what 'modify' means.

>>
>>> Signed-off-by: Song Liu <[email protected]>
>>
>> Acked-by: Yonghong Song <[email protected]>
>
> Thanks!
>
> [...]
>

2020-06-29 19:28:49

by Song Liu

Subject: [PATCH v4 bpf-next 2/4] bpf: introduce helper bpf_get_task_stack()

Introduce helper bpf_get_task_stack(), which dumps the stack trace of a
given task. This is different from bpf_get_stack(), which gets the stack
trace of the current task. One potential use case of
bpf_get_task_stack() is to call it from bpf_iter__task and dump all
/proc/<pid>/stack to a seq_file.

bpf_get_task_stack() uses stack_trace_save_tsk() instead of
get_perf_callchain() for the kernel stack. The benefit of this choice is
that stack_trace_save_tsk() doesn't require changes in arch/. The
downside is that stack_trace_save_tsk() dumps the stack trace to an
unsigned long array, so for 32-bit systems we need to translate it to a
u64 array.
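
For reference, the stacktrace API stores entries as unsigned long (its
declaration, from include/linux/stacktrace.h):

  unsigned int stack_trace_save_tsk(struct task_struct *task,
                                    unsigned long *store,
                                    unsigned int size,
                                    unsigned int skipnr);

so on 32-bit systems the entries are widened to u64 in place, walking
from the last entry backwards so that no temporary buffer is needed.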

Acked-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Song Liu <[email protected]>
---
include/linux/bpf.h | 1 +
include/uapi/linux/bpf.h | 36 +++++++++++++++-
kernel/bpf/stackmap.c | 75 ++++++++++++++++++++++++++++++++--
kernel/bpf/verifier.c | 4 +-
kernel/trace/bpf_trace.c | 2 +
scripts/bpf_helpers_doc.py | 2 +
tools/include/uapi/linux/bpf.h | 36 +++++++++++++++-
7 files changed, 150 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3d2ade703a357..0cd7f6884c5cd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1627,6 +1627,7 @@ extern const struct bpf_func_proto bpf_get_current_uid_gid_proto;
extern const struct bpf_func_proto bpf_get_current_comm_proto;
extern const struct bpf_func_proto bpf_get_stackid_proto;
extern const struct bpf_func_proto bpf_get_stack_proto;
+extern const struct bpf_func_proto bpf_get_task_stack_proto;
extern const struct bpf_func_proto bpf_sock_map_update_proto;
extern const struct bpf_func_proto bpf_sock_hash_update_proto;
extern const struct bpf_func_proto bpf_get_current_cgroup_id_proto;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0cb8ec9488168..54106ea667211 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3285,6 +3285,39 @@ union bpf_attr {
* Dynamically cast a *sk* pointer to a *udp6_sock* pointer.
* Return
* *sk* if casting is valid, or NULL otherwise.
+ *
+ * long bpf_get_task_stack(struct task_struct *task, void *buf, u32 size, u64 flags)
+ * Description
+ * Return a user or a kernel stack in bpf program provided buffer.
+ * To achieve this, the helper needs *task*, which is a valid
+ * pointer to struct task_struct. To store the stacktrace, the
+ * bpf program provides *buf* with a nonnegative *size*.
+ *
+ * The last argument, *flags*, holds the number of stack frames to
+ * skip (from 0 to 255), masked with
+ * **BPF_F_SKIP_FIELD_MASK**. The next bits can be used to set
+ * the following flags:
+ *
+ * **BPF_F_USER_STACK**
+ * Collect a user space stack instead of a kernel stack.
+ * **BPF_F_USER_BUILD_ID**
+ * Collect buildid+offset instead of ips for user stack,
+ * only valid if **BPF_F_USER_STACK** is also specified.
+ *
+ * **bpf_get_task_stack**\ () can collect up to
+ * **PERF_MAX_STACK_DEPTH** both kernel and user frames, subject
+ * to a sufficiently large buffer size. Note that
+ * this limit can be controlled with the **sysctl** program, and
+ * that it should be manually increased in order to profile long
+ * user stacks (such as stacks for Java programs). To do so, use:
+ *
+ * ::
+ *
+ * # sysctl kernel.perf_event_max_stack=<new value>
+ * Return
+ * A non-negative value equal to or less than *size* on success,
+ * or a negative error in case of failure.
+ *
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -3427,7 +3460,8 @@ union bpf_attr {
FN(skc_to_tcp_sock), \
FN(skc_to_tcp_timewait_sock), \
FN(skc_to_tcp_request_sock), \
- FN(skc_to_udp6_sock),
+ FN(skc_to_udp6_sock), \
+ FN(get_task_stack),

/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 27dc9b1b08a52..0ba66b29ef227 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -348,6 +348,40 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
}
}

+static struct perf_callchain_entry *
+get_callchain_entry_for_task(struct task_struct *task, u32 init_nr)
+{
+ struct perf_callchain_entry *entry;
+ int rctx;
+
+ entry = get_callchain_entry(&rctx);
+
+ if (!entry)
+ return NULL;
+
+ entry->nr = init_nr +
+ stack_trace_save_tsk(task, (unsigned long *)(entry->ip + init_nr),
+ sysctl_perf_event_max_stack - init_nr, 0);
+
+ /* stack_trace_save_tsk() works on unsigned long array, while
+ * perf_callchain_entry uses u64 array. For 32-bit systems, it is
+ * necessary to fix this mismatch.
+ */
+ if (__BITS_PER_LONG != 64) {
+ unsigned long *from = (unsigned long *) entry->ip;
+ u64 *to = entry->ip;
+ int i;
+
+ /* copy data from the end to avoid using extra buffer */
+ for (i = entry->nr - 1; i >= (int)init_nr; i--)
+ to[i] = (u64)(from[i]);
+ }
+
+ put_callchain_entry(rctx);
+
+ return entry;
+}
+
BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
u64, flags)
{
@@ -448,8 +482,8 @@ const struct bpf_func_proto bpf_get_stackid_proto = {
.arg3_type = ARG_ANYTHING,
};

-BPF_CALL_4(bpf_get_stack, struct pt_regs *, regs, void *, buf, u32, size,
- u64, flags)
+static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
+ void *buf, u32 size, u64 flags)
{
u32 init_nr, trace_nr, copy_len, elem_size, num_elem;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
@@ -471,13 +505,22 @@ BPF_CALL_4(bpf_get_stack, struct pt_regs *, regs, void *, buf, u32, size,
if (unlikely(size % elem_size))
goto clear;

+ /* cannot get valid user stack for task without user_mode regs */
+ if (task && user && !user_mode(regs))
+ goto err_fault;
+
num_elem = size / elem_size;
if (sysctl_perf_event_max_stack < num_elem)
init_nr = 0;
else
init_nr = sysctl_perf_event_max_stack - num_elem;
+
+ if (kernel && task)
+ trace = get_callchain_entry_for_task(task, init_nr);
+ else
trace = get_perf_callchain(regs, init_nr, kernel, user,
- sysctl_perf_event_max_stack, false, false);
+ sysctl_perf_event_max_stack,
+ false, false);
if (unlikely(!trace))
goto err_fault;

@@ -505,6 +548,12 @@ BPF_CALL_4(bpf_get_stack, struct pt_regs *, regs, void *, buf, u32, size,
return err;
}

+BPF_CALL_4(bpf_get_stack, struct pt_regs *, regs, void *, buf, u32, size,
+ u64, flags)
+{
+ return __bpf_get_stack(regs, NULL, buf, size, flags);
+}
+
const struct bpf_func_proto bpf_get_stack_proto = {
.func = bpf_get_stack,
.gpl_only = true,
@@ -515,6 +564,26 @@ const struct bpf_func_proto bpf_get_stack_proto = {
.arg4_type = ARG_ANYTHING,
};

+BPF_CALL_4(bpf_get_task_stack, struct task_struct *, task, void *, buf,
+ u32, size, u64, flags)
+{
+ struct pt_regs *regs = task_pt_regs(task);
+
+ return __bpf_get_stack(regs, task, buf, size, flags);
+}
+
+static int bpf_get_task_stack_btf_ids[5];
+const struct bpf_func_proto bpf_get_task_stack_proto = {
+ .func = bpf_get_task_stack,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_BTF_ID,
+ .arg2_type = ARG_PTR_TO_UNINIT_MEM,
+ .arg3_type = ARG_CONST_SIZE_OR_ZERO,
+ .arg4_type = ARG_ANYTHING,
+ .btf_id = bpf_get_task_stack_btf_ids,
+};
+
/* Called from eBPF program */
static void *stack_map_lookup_elem(struct bpf_map *map, void *key)
{
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 7de98906ddf4a..b608185e1ffd5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4864,7 +4864,9 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
if (err)
return err;

- if (func_id == BPF_FUNC_get_stack && !env->prog->has_callchain_buf) {
+ if ((func_id == BPF_FUNC_get_stack ||
+ func_id == BPF_FUNC_get_task_stack) &&
+ !env->prog->has_callchain_buf) {
const char *err_str;

#ifdef CONFIG_PERF_EVENTS
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 5d59dda5f6615..977ba3b6f6c64 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1137,6 +1137,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_ringbuf_query_proto;
case BPF_FUNC_jiffies64:
return &bpf_jiffies64_proto;
+ case BPF_FUNC_get_task_stack:
+ return &bpf_get_task_stack_proto;
default:
return NULL;
}
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index 6bab40ff442e8..dd12e3b18aae3 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -426,6 +426,7 @@ class PrinterHelpers(Printer):
'struct tcp_timewait_sock',
'struct tcp_request_sock',
'struct udp6_sock',
+ 'struct task_struct',

'struct __sk_buff',
'struct sk_msg_md',
@@ -468,6 +469,7 @@ class PrinterHelpers(Printer):
'struct tcp_timewait_sock',
'struct tcp_request_sock',
'struct udp6_sock',
+ 'struct task_struct',
}
mapped_types = {
'u8': '__u8',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 0cb8ec9488168..54106ea667211 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3285,6 +3285,39 @@ union bpf_attr {
* Dynamically cast a *sk* pointer to a *udp6_sock* pointer.
* Return
* *sk* if casting is valid, or NULL otherwise.
+ *
+ * long bpf_get_task_stack(struct task_struct *task, void *buf, u32 size, u64 flags)
+ * Description
+ * Return a user or a kernel stack in bpf program provided buffer.
+ * To achieve this, the helper needs *task*, which is a valid
+ * pointer to struct task_struct. To store the stacktrace, the
+ * bpf program provides *buf* with a nonnegative *size*.
+ *
+ * The last argument, *flags*, holds the number of stack frames to
+ * skip (from 0 to 255), masked with
+ * **BPF_F_SKIP_FIELD_MASK**. The next bits can be used to set
+ * the following flags:
+ *
+ * **BPF_F_USER_STACK**
+ * Collect a user space stack instead of a kernel stack.
+ * **BPF_F_USER_BUILD_ID**
+ * Collect buildid+offset instead of ips for user stack,
+ * only valid if **BPF_F_USER_STACK** is also specified.
+ *
+ * **bpf_get_task_stack**\ () can collect up to
+ * **PERF_MAX_STACK_DEPTH** both kernel and user frames, subject
+ * to a sufficiently large buffer size. Note that
+ * this limit can be controlled with the **sysctl** program, and
+ * that it should be manually increased in order to profile long
+ * user stacks (such as stacks for Java programs). To do so, use:
+ *
+ * ::
+ *
+ * # sysctl kernel.perf_event_max_stack=<new value>
+ * Return
+ * A non-negative value equal to or less than *size* on success,
+ * or a negative error in case of failure.
+ *
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -3427,7 +3460,8 @@ union bpf_attr {
FN(skc_to_tcp_sock), \
FN(skc_to_tcp_timewait_sock), \
FN(skc_to_tcp_request_sock), \
- FN(skc_to_udp6_sock),
+ FN(skc_to_udp6_sock), \
+ FN(get_task_stack),

/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
--
2.24.1

2020-06-29 20:21:52

by Andrii Nakryiko

Subject: Re: [PATCH v4 bpf-next 0/4] bpf: introduce bpf_get_task_stack()

On Mon, Jun 29, 2020 at 11:54 AM Song Liu <[email protected]> wrote:
>
> This set introduces a new helper bpf_get_task_stack(). The primary use case
> is to dump all /proc/*/stack to seq_file via bpf_iter__task.
>
> A few different approaches have been explored and compared:
>
> 1. A simple wrapper around stack_trace_save_tsk(), as v1 [1].
>
> This approach introduces a new signature, which is different from the
> existing helper bpf_get_stack(). Therefore, it is not ideal.
>
> 2. Extend get_perf_callchain() to support "task" as argument.
>
> This approach reuses most of bpf_get_stack(). However, extending
> get_perf_callchain() requires non-trivial changes to architecture-
> specific code, which is error prone.
>
> 3. The current approach (since v2) leverages most of the existing
> bpf_get_stack(), and uses stack_trace_save_tsk() to handle the
> architecture-specific logic.
>
> [1] https://lore.kernel.org/netdev/[email protected]/
>
> Changes v3 => v4:
> 1. Simplify the selftests with bpf_iter.h. (Yonghong)
> 2. Add example output to commit log of 4/4. (Yonghong)
>
> Changes v2 => v3:
> 1. Rebase on top of bpf-next. (Yonghong)
> 2. Sanitize get_callchain_entry(). (Peter)
> 3. Use has_callchain_buf for bpf_get_task_stack. (Andrii)
> 4. Other small cleanups. (Yonghong, Andrii)
>
> Changes v1 => v2:
> 1. Reuse most of bpf_get_stack() logic. (Andrii)
> 2. Fix unsigned long vs. u64 mismatch for 32-bit systems. (Yonghong)
> 3. Add %pB support in bpf_trace_printk(). (Daniel)
> 4. Fix the buffer size argument to be in bytes.
>
> Song Liu (4):
> perf: expose get/put_callchain_entry()
> bpf: introduce helper bpf_get_task_stack()
> bpf: allow %pB in bpf_seq_printf() and bpf_trace_printk()
> selftests/bpf: add bpf_iter test with bpf_get_task_stack()
>
> include/linux/bpf.h | 1 +
> include/linux/perf_event.h | 2 +
> include/uapi/linux/bpf.h | 36 ++++++++-
> kernel/bpf/stackmap.c | 75 ++++++++++++++++++-
> kernel/bpf/verifier.c | 4 +-
> kernel/events/callchain.c | 13 ++--
> kernel/trace/bpf_trace.c | 12 ++-
> scripts/bpf_helpers_doc.py | 2 +
> tools/include/uapi/linux/bpf.h | 36 ++++++++-
> .../selftests/bpf/prog_tests/bpf_iter.c | 17 +++++
> .../selftests/bpf/progs/bpf_iter_task_stack.c | 37 +++++++++
> 11 files changed, 220 insertions(+), 15 deletions(-)
> create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c
>
> --
> 2.24.1

Thanks for working on this! This will enable a whole new set of tools
and applications.

Acked-by: Andrii Nakryiko <[email protected]>

2020-06-30 04:19:17

by Alexei Starovoitov

Subject: Re: [PATCH v4 bpf-next 2/4] bpf: introduce helper bpf_get_task_stack()

On Sun, Jun 28, 2020 at 10:58 PM Song Liu <[email protected]> wrote:
>
> Introduce helper bpf_get_task_stack(), which dumps the stack trace of
> a given task. This is different from bpf_get_stack(), which gets the
> stack trace of the current task. One potential use case of
> bpf_get_task_stack() is to call it from bpf_iter__task and dump all
> /proc/<pid>/stack to a seq_file.
>
> bpf_get_task_stack() uses stack_trace_save_tsk() instead of
> get_perf_callchain() for the kernel stack. The benefit of this choice
> is that stack_trace_save_tsk() doesn't require changes in arch/. The
> downside is that stack_trace_save_tsk() dumps the stack trace to an
> unsigned long array, so for 32-bit systems we need to translate it to
> a u64 array.
>
> Acked-by: Andrii Nakryiko <[email protected]>
> Signed-off-by: Song Liu <[email protected]>

It doesn't apply:
Applying: bpf: Introduce helper bpf_get_task_stack()
Using index info to reconstruct a base tree...
error: patch failed: kernel/bpf/stackmap.c:471
error: kernel/bpf/stackmap.c: patch does not apply
error: Did you hand edit your patch?
It does not apply to blobs recorded in its index.
Patch failed at 0002 bpf: Introduce helper bpf_get_task_stack()

2020-06-30 06:15:30

by Song Liu

Subject: Re: [PATCH v4 bpf-next 2/4] bpf: introduce helper bpf_get_task_stack()



> On Jun 29, 2020, at 9:18 PM, Alexei Starovoitov <[email protected]> wrote:
>
> On Sun, Jun 28, 2020 at 10:58 PM Song Liu <[email protected]> wrote:
>>
>> Introduce helper bpf_get_task_stack(), which dumps the stack trace of
>> a given task. This is different from bpf_get_stack(), which gets the
>> stack trace of the current task. One potential use case of
>> bpf_get_task_stack() is to call it from bpf_iter__task and dump all
>> /proc/<pid>/stack to a seq_file.
>>
>> bpf_get_task_stack() uses stack_trace_save_tsk() instead of
>> get_perf_callchain() for the kernel stack. The benefit of this choice
>> is that stack_trace_save_tsk() doesn't require changes in arch/. The
>> downside is that stack_trace_save_tsk() dumps the stack trace to an
>> unsigned long array, so for 32-bit systems we need to translate it to
>> a u64 array.
>>
>> Acked-by: Andrii Nakryiko <[email protected]>
>> Signed-off-by: Song Liu <[email protected]>
>
> It doesn't apply:
> Applying: bpf: Introduce helper bpf_get_task_stack()
> Using index info to reconstruct a base tree...
> error: patch failed: kernel/bpf/stackmap.c:471
> error: kernel/bpf/stackmap.c: patch does not apply
> error: Did you hand edit your patch?
> It does not apply to blobs recorded in its index.
> Patch failed at 0002 bpf: Introduce helper bpf_get_task_stack()

Hmm.. seems "git format-patch -b" (--ignore-space-change) breaks it:

# without -b, works fine

$ git format-patch HEAD~1
0001-bpf-introduce-helper-bpf_get_task_stack.patch
$ git reset --hard HEAD~1
HEAD is now at c385fe4fbd7bc perf: expose get/put_callchain_entry()
$ git am ./0001-bpf-introduce-helper-bpf_get_task_stack.patch
Applying: bpf: introduce helper bpf_get_task_stack()


# with -b, doesn't apply :(

$ git format-patch -b HEAD~1
0001-bpf-introduce-helper-bpf_get_task_stack.patch
$ git reset --hard HEAD~1
HEAD is now at c385fe4fbd7bc perf: expose get/put_callchain_entry()
$ git am ./0001-bpf-introduce-helper-bpf_get_task_stack.patch
Applying: bpf: introduce helper bpf_get_task_stack()
error: patch failed: kernel/bpf/stackmap.c:471
error: kernel/bpf/stackmap.c: patch does not apply
Patch failed at 0001 bpf: introduce helper bpf_get_task_stack()
hint: Use 'git am --show-current-patch' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Let me see how to fix it...
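
(Based on the experiment above, the likely fix is to regenerate the
series without -b, so that the whitespace-only hunks stay intact, e.g.:

  $ git format-patch -4 HEAD

and resend.)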