Just one last go around I hope, fixed the preemption thing that Darrick
reported.
v9->v10:
- the kprobe dispatcher now requires us to re-enable preemption if we change the
ip ourselves, so do that.
v8->v9:
- rebased onto the bpf tree.
v7->v8:
- removed the _ASM_KPROBE_ERROR_INJECT since it was not needed.
v6->v7:
- moved the opt-in macro to bpf.h out of kprobes.h.
v5->v6:
- add BPF_ALLOW_ERROR_INJECTION() tagging for functions that will support this
feature. This way only functions that opt-in will be allowed to be
overridden.
- added a btrfs patch to allow error injection for open_ctree() so that the bpf
sample actually works.
v4->v5:
- disallow kprobe_override programs from being put in the prog map array so we
don't tail call into something we didn't check. This allows us to make the
normal path still fast without a bunch of percpu operations.
v3->v4:
- fix a build error found by kbuild test bot (I didn't wait long enough
apparently.)
- Added a warning message as per Daniel's suggestion.
v2->v3:
- added a ->kprobe_override flag to bpf_prog.
- added some sanity checks to disallow attaching bpf progs that have
->kprobe_override set that aren't for ftrace kprobes.
- added the trace_kprobe_ftrace helper to check if the trace_event_call is a
ftrace kprobe.
- renamed bpf_kprobe_state to bpf_kprobe_override, fixed it so we only read this
value in the kprobe path, and thus only write to it if we're overriding or
clearing the override.
v1->v2:
- moved things around to make sure that bpf_override_return could really only be
used for an ftrace kprobe.
- killed the special return values from trace_call_bpf.
- renamed pc_modified to bpf_kprobe_state so bpf_override_return could tell if
it was being called from an ftrace kprobe context.
- reworked the logic in kprobe_perf_func to take advantage of bpf_kprobe_state.
- updated the test as per Alexei's review.
- Original message -
A lot of our error paths are not well tested because we have no good way of
injecting errors generically. Some subsystems (block, memory) have ways to
inject errors, but they are random so it's hard to get reproducible results.
With BPF we can add determinism to our error injection. We can use kprobes and
other things to verify we are injecting errors at the exact case we are trying
to test. This patch gives us the tool to actually do the error injection part.
It is very simple, we just set the return value of the pt_regs we're given to
whatever we provide, and then override the PC with a dummy function that simply
returns.
Right now this only works on x86, but it would be simple enough to expand to
other architectures. Thanks,
Josef
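For readers skimming the thread, the mechanism described above (set the return value in the registers, then point the PC at a function that simply returns) can be modelled in user space roughly as follows. This is only a toy sketch: `struct fake_regs`, `stub_return` and `override_return` are made-up names for illustration, and the real code works on the architecture's struct pt_regs from inside a kprobe handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the two pt_regs fields that matter here (hypothetical). */
struct fake_regs {
	unsigned long ax;	/* return-value register on x86 */
	unsigned long ip;	/* instruction pointer */
};

/* Stand-in for the dummy function that simply returns. */
void stub_return(void)
{
}

/* Set the return value, then point the PC at the stub. */
void override_return(struct fake_regs *regs, unsigned long rc)
{
	assert(regs != NULL);
	regs->ax = rc;
	regs->ip = (unsigned long)(uintptr_t)&stub_return;
}
```

Execution resuming at `regs->ip` would then return straight to the caller with `rc` in the return register, for example -12 (-ENOMEM) as used by the sample later in the series.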
From: Josef Bacik <[email protected]>
This allows us to do error injection with BPF for open_ctree.
Signed-off-by: Josef Bacik <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
---
fs/btrfs/disk-io.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 10a2a579cc7f..02b5f5667754 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -30,6 +30,7 @@
#include <linux/ratelimit.h>
#include <linux/uuid.h>
#include <linux/semaphore.h>
+#include <linux/bpf.h>
#include <asm/unaligned.h>
#include "ctree.h"
#include "disk-io.h"
@@ -3123,6 +3124,7 @@ int open_ctree(struct super_block *sb,
goto fail_block_groups;
goto retry_root_backup;
}
+BPF_ALLOW_ERROR_INJECTION(open_ctree);
static void btrfs_end_buffer_write_sync(struct buffer_head *bh, int uptodate)
{
--
2.7.5
From: Josef Bacik <[email protected]>
Error injection is sloppy and very ad-hoc. BPF could fill this niche
perfectly with its kprobe functionality. We could make sure errors are
only triggered in specific call chains that we care about with very
specific situations. Accomplish this with the bpf_override_return
helper. This will modify the probed function's return value to the
specified value and set the PC to an override function that simply
returns, bypassing the originally probed function. This gives us a nice
clean way to implement systematic error injection for all of our code
paths.
Acked-by: Alexei Starovoitov <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
---
arch/Kconfig | 3 ++
arch/x86/Kconfig | 1 +
arch/x86/include/asm/kprobes.h | 4 +++
arch/x86/include/asm/ptrace.h | 5 ++++
arch/x86/kernel/kprobes/ftrace.c | 14 +++++++++
include/linux/filter.h | 3 +-
include/linux/trace_events.h | 1 +
include/uapi/linux/bpf.h | 7 ++++-
kernel/bpf/core.c | 3 ++
kernel/bpf/verifier.c | 2 ++
kernel/events/core.c | 7 +++++
kernel/trace/Kconfig | 11 +++++++
kernel/trace/bpf_trace.c | 38 ++++++++++++++++++++++++
kernel/trace/trace_kprobe.c | 64 +++++++++++++++++++++++++++++++++++-----
kernel/trace/trace_probe.h | 12 ++++++++
15 files changed, 165 insertions(+), 10 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 400b9e1b2f27..d3f4aaf9cb7a 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -196,6 +196,9 @@ config HAVE_OPTPROBES
config HAVE_KPROBES_ON_FTRACE
bool
+config HAVE_KPROBE_OVERRIDE
+ bool
+
config HAVE_NMI
bool
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8eed3f94bfc7..04d66e6fa447 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -154,6 +154,7 @@ config X86
select HAVE_KERNEL_XZ
select HAVE_KPROBES
select HAVE_KPROBES_ON_FTRACE
+ select HAVE_KPROBE_OVERRIDE
select HAVE_KRETPROBES
select HAVE_KVM
select HAVE_LIVEPATCH if X86_64
diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index 9f2e3102e0bb..36abb23a7a35 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -67,6 +67,10 @@ extern const int kretprobe_blacklist_size;
void arch_remove_kprobe(struct kprobe *p);
asmlinkage void kretprobe_trampoline(void);
+#ifdef CONFIG_KPROBES_ON_FTRACE
+extern void arch_ftrace_kprobe_override_function(struct pt_regs *regs);
+#endif
+
/* Architecture specific copy of original instruction*/
struct arch_specific_insn {
/* copy of the original instruction */
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 14131dd06b29..6de1fd3d0097 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -109,6 +109,11 @@ static inline unsigned long regs_return_value(struct pt_regs *regs)
return regs->ax;
}
+static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
+{
+ regs->ax = rc;
+}
+
/*
* user_mode(regs) determines whether a register set came from user
* mode. On x86_32, this is true if V8086 mode was enabled OR if the
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index 8dc0161cec8f..1ea748d682fd 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -97,3 +97,17 @@ int arch_prepare_kprobe_ftrace(struct kprobe *p)
p->ainsn.boostable = false;
return 0;
}
+
+asmlinkage void override_func(void);
+asm(
+ ".type override_func, @function\n"
+ "override_func:\n"
+ " ret\n"
+ ".size override_func, .-override_func\n"
+);
+
+void arch_ftrace_kprobe_override_function(struct pt_regs *regs)
+{
+ regs->ip = (unsigned long)&override_func;
+}
+NOKPROBE_SYMBOL(arch_ftrace_kprobe_override_function);
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 0062302e1285..5feb441d3dd9 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -458,7 +458,8 @@ struct bpf_prog {
locked:1, /* Program image locked? */
gpl_compatible:1, /* Is filter GPL compatible? */
cb_access:1, /* Is control block accessed? */
- dst_needed:1; /* Do we need dst entry? */
+ dst_needed:1, /* Do we need dst entry? */
+ kprobe_override:1; /* Do we override a kprobe? */
enum bpf_prog_type type; /* Type of BPF program */
u32 len; /* Number of filter blocks */
u32 jited_len; /* Size of jited insns in bytes */
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index af44e7c2d577..5fea451f6e28 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -528,6 +528,7 @@ do { \
struct perf_event;
DECLARE_PER_CPU(struct pt_regs, perf_trace_regs);
+DECLARE_PER_CPU(int, bpf_kprobe_override);
extern int perf_trace_init(struct perf_event *event);
extern void perf_trace_destroy(struct perf_event *event);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 80d62e88590c..595bda120cfb 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -677,6 +677,10 @@ union bpf_attr {
* @buf: buf to fill
* @buf_size: size of the buf
* Return : 0 on success or negative error code
+ *
+ * int bpf_override_return(pt_regs, rc)
+ * @pt_regs: pointer to struct pt_regs
+ * @rc: the return value to set
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -736,7 +740,8 @@ union bpf_attr {
FN(xdp_adjust_meta), \
FN(perf_event_read_value), \
FN(perf_prog_read_value), \
- FN(getsockopt),
+ FN(getsockopt), \
+ FN(override_return),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index b9f8686a84cf..fc5a8ab4239a 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1320,6 +1320,9 @@ EVAL4(PROG_NAME_LIST, 416, 448, 480, 512)
bool bpf_prog_array_compatible(struct bpf_array *array,
const struct bpf_prog *fp)
{
+ if (fp->kprobe_override)
+ return false;
+
if (!array->owner_prog_type) {
/* There's no owner yet where we could check for
* compatibility.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 7afa92e9b409..e807bda7fe29 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4413,6 +4413,8 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
prog->dst_needed = 1;
if (insn->imm == BPF_FUNC_get_prandom_u32)
bpf_user_rnd_init_once();
+ if (insn->imm == BPF_FUNC_override_return)
+ prog->kprobe_override = 1;
if (insn->imm == BPF_FUNC_tail_call) {
/* If we tail call into other programs, we
* cannot make any assumptions since they can
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 16beab4767e1..6e3862bbe9c2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8077,6 +8077,13 @@ static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
return -EINVAL;
}
+ /* Kprobe override only works for kprobes, not uprobes. */
+ if (prog->kprobe_override &&
+ !(event->tp_event->flags & TRACE_EVENT_FL_KPROBE)) {
+ bpf_prog_put(prog);
+ return -EINVAL;
+ }
+
if (is_tracepoint || is_syscall_tp) {
int off = trace_event_get_offsets(event->tp_event);
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index af7dad126c13..3e6fd580fe7f 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -529,6 +529,17 @@ config FUNCTION_PROFILER
If in doubt, say N.
+config BPF_KPROBE_OVERRIDE
+ bool "Enable BPF programs to override a kprobed function"
+ depends on BPF_EVENTS
+ depends on KPROBES_ON_FTRACE
+ depends on HAVE_KPROBE_OVERRIDE
+ depends on DYNAMIC_FTRACE_WITH_REGS
+ default n
+ help
+ Allows BPF to override the execution of a probed function and
+ set a different return value. This is used for error injection.
+
config FTRACE_MCOUNT_RECORD
def_bool y
depends on DYNAMIC_FTRACE
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 27d1f4ffa3de..e4bfdbc5a905 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -13,6 +13,10 @@
#include <linux/filter.h>
#include <linux/uaccess.h>
#include <linux/ctype.h>
+#include <linux/kprobes.h>
+#include <asm/kprobes.h>
+
+#include "trace_probe.h"
#include "trace.h"
u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
@@ -76,6 +80,29 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
}
EXPORT_SYMBOL_GPL(trace_call_bpf);
+#ifdef CONFIG_BPF_KPROBE_OVERRIDE
+BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
+{
+ __this_cpu_write(bpf_kprobe_override, 1);
+ regs_set_return_value(regs, rc);
+ arch_ftrace_kprobe_override_function(regs);
+ return 0;
+}
+#else
+BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
+{
+ return -EINVAL;
+}
+#endif
+
+static const struct bpf_func_proto bpf_override_return_proto = {
+ .func = bpf_override_return,
+ .gpl_only = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_ANYTHING,
+};
+
BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
{
int ret;
@@ -551,6 +578,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
return &bpf_get_stackid_proto;
case BPF_FUNC_perf_event_read_value:
return &bpf_perf_event_read_value_proto;
+ case BPF_FUNC_override_return:
+ return &bpf_override_return_proto;
default:
return tracing_func_proto(func_id);
}
@@ -766,6 +795,15 @@ int perf_event_attach_bpf_prog(struct perf_event *event,
struct bpf_prog_array *new_array;
int ret = -EEXIST;
+ /*
+ * Kprobe override only works for ftrace based kprobes, and only if they
+ * are on the opt-in list.
+ */
+ if (prog->kprobe_override &&
+ (!trace_kprobe_ftrace(event->tp_event) ||
+ !trace_kprobe_error_injectable(event->tp_event)))
+ return -EINVAL;
+
mutex_lock(&bpf_event_mutex);
if (event->prog)
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 492700c5fb4d..91f4b57dab82 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -42,6 +42,7 @@ struct trace_kprobe {
(offsetof(struct trace_kprobe, tp.args) + \
(sizeof(struct probe_arg) * (n)))
+DEFINE_PER_CPU(int, bpf_kprobe_override);
static nokprobe_inline bool trace_kprobe_is_return(struct trace_kprobe *tk)
{
@@ -87,6 +88,27 @@ static nokprobe_inline unsigned long trace_kprobe_nhit(struct trace_kprobe *tk)
return nhit;
}
+int trace_kprobe_ftrace(struct trace_event_call *call)
+{
+ struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
+ return kprobe_ftrace(&tk->rp.kp);
+}
+
+int trace_kprobe_error_injectable(struct trace_event_call *call)
+{
+ struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
+ unsigned long addr;
+
+ if (tk->symbol) {
+ addr = (unsigned long)
+ kallsyms_lookup_name(trace_kprobe_symbol(tk));
+ addr += tk->rp.kp.offset;
+ } else {
+ addr = (unsigned long)tk->rp.kp.addr;
+ }
+ return within_kprobe_error_injection_list(addr);
+}
+
static int register_kprobe_event(struct trace_kprobe *tk);
static int unregister_kprobe_event(struct trace_kprobe *tk);
@@ -1170,7 +1192,7 @@ static int kretprobe_event_define_fields(struct trace_event_call *event_call)
#ifdef CONFIG_PERF_EVENTS
/* Kprobe profile handler */
-static void
+static int
kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
{
struct trace_event_call *call = &tk->tp.call;
@@ -1179,12 +1201,29 @@ kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
int size, __size, dsize;
int rctx;
- if (bpf_prog_array_valid(call) && !trace_call_bpf(call, regs))
- return;
+ if (bpf_prog_array_valid(call)) {
+ int ret;
+
+ ret = trace_call_bpf(call, regs);
+
+ /*
+ * We need to check and see if we modified the pc of the
+ * pt_regs, and if so clear the kprobe and return 1 so that we
+ * don't do the instruction skipping. Also reset our state so
+ * we are clean the next pass through.
+ */
+ if (__this_cpu_read(bpf_kprobe_override)) {
+ __this_cpu_write(bpf_kprobe_override, 0);
+ reset_current_kprobe();
+ return 1;
+ }
+ if (!ret)
+ return 0;
+ }
head = this_cpu_ptr(call->perf_events);
if (hlist_empty(head))
- return;
+ return 0;
dsize = __get_data_size(&tk->tp, regs);
__size = sizeof(*entry) + tk->tp.size + dsize;
@@ -1193,13 +1232,14 @@ kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
entry = perf_trace_buf_alloc(size, NULL, &rctx);
if (!entry)
- return;
+ return 0;
entry->ip = (unsigned long)tk->rp.kp.addr;
memset(&entry[1], 0, dsize);
store_trace_args(sizeof(*entry), &tk->tp, regs, (u8 *)&entry[1], dsize);
perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
head, NULL);
+ return 0;
}
NOKPROBE_SYMBOL(kprobe_perf_func);
@@ -1275,16 +1315,24 @@ static int kprobe_register(struct trace_event_call *event,
static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs)
{
struct trace_kprobe *tk = container_of(kp, struct trace_kprobe, rp.kp);
+ int ret = 0;
raw_cpu_inc(*tk->nhit);
if (tk->tp.flags & TP_FLAG_TRACE)
kprobe_trace_func(tk, regs);
#ifdef CONFIG_PERF_EVENTS
- if (tk->tp.flags & TP_FLAG_PROFILE)
- kprobe_perf_func(tk, regs);
+ if (tk->tp.flags & TP_FLAG_PROFILE) {
+ ret = kprobe_perf_func(tk, regs);
+ /*
+ * The ftrace kprobe handler leaves it up to us to re-enable
+ * preemption here before returning if we've modified the ip.
+ */
+ if (ret)
+ preempt_enable_no_resched();
+ }
#endif
- return 0; /* We don't tweek kernel, so just return 0 */
+ return ret;
}
NOKPROBE_SYMBOL(kprobe_dispatcher);
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index fb66e3eaa192..5e54d748c84c 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -252,6 +252,8 @@ struct symbol_cache;
unsigned long update_symbol_cache(struct symbol_cache *sc);
void free_symbol_cache(struct symbol_cache *sc);
struct symbol_cache *alloc_symbol_cache(const char *sym, long offset);
+int trace_kprobe_ftrace(struct trace_event_call *call);
+int trace_kprobe_error_injectable(struct trace_event_call *call);
#else
/* uprobes do not support symbol fetch methods */
#define fetch_symbol_u8 NULL
@@ -277,6 +279,16 @@ alloc_symbol_cache(const char *sym, long offset)
{
return NULL;
}
+
+static inline int trace_kprobe_ftrace(struct trace_event_call *call)
+{
+ return 0;
+}
+
+static inline int trace_kprobe_error_injectable(struct trace_event_call *call)
+{
+ return 0;
+}
#endif /* CONFIG_KPROBE_EVENTS */
struct probe_arg {
--
2.7.5
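The handshake between the helper and kprobe_perf_func() in the patch above can be modelled in user space along these lines. This is a simplified sketch under stated assumptions, not the kernel code: a single global `override_flag` stands in for the per-CPU bpf_kprobe_override variable, and the `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the per-CPU bpf_kprobe_override flag (one "CPU" only). */
int override_flag;

/* Models the helper: record that the program asked for an override. */
void model_override_return(void)
{
	override_flag = 1;
}

/* A program that does not override anything. */
void model_passive_prog(void)
{
}

/*
 * Models the check in kprobe_perf_func(): run the program, and if it set
 * the flag, clear it and return 1 so the caller knows the ip was changed.
 */
int model_perf_func(void (*prog)(void))
{
	assert(prog != NULL);
	prog();
	if (override_flag) {
		override_flag = 0;
		return 1;
	}
	return 0;
}
```

Clearing the flag in the same place it is read keeps the state clean for the next probe hit, which is the point of the comment in kprobe_perf_func().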
From: Josef Bacik <[email protected]>
This adds a basic test for bpf_override_return to verify it works. We
override the main function for mounting a btrfs fs so it'll return
-ENOMEM and then make sure that trying to mount a btrfs fs will fail.
Acked-by: Alexei Starovoitov <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
---
samples/bpf/Makefile | 4 ++++
samples/bpf/test_override_return.sh | 15 +++++++++++++++
samples/bpf/tracex7_kern.c | 16 ++++++++++++++++
samples/bpf/tracex7_user.c | 28 ++++++++++++++++++++++++++++
tools/include/uapi/linux/bpf.h | 7 ++++++-
tools/testing/selftests/bpf/bpf_helpers.h | 3 ++-
6 files changed, 71 insertions(+), 2 deletions(-)
create mode 100755 samples/bpf/test_override_return.sh
create mode 100644 samples/bpf/tracex7_kern.c
create mode 100644 samples/bpf/tracex7_user.c
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index adeaa1302f34..4fb944a7ecf8 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -12,6 +12,7 @@ hostprogs-y += tracex3
hostprogs-y += tracex4
hostprogs-y += tracex5
hostprogs-y += tracex6
+hostprogs-y += tracex7
hostprogs-y += test_probe_write_user
hostprogs-y += trace_output
hostprogs-y += lathist
@@ -58,6 +59,7 @@ tracex3-objs := bpf_load.o $(LIBBPF) tracex3_user.o
tracex4-objs := bpf_load.o $(LIBBPF) tracex4_user.o
tracex5-objs := bpf_load.o $(LIBBPF) tracex5_user.o
tracex6-objs := bpf_load.o $(LIBBPF) tracex6_user.o
+tracex7-objs := bpf_load.o $(LIBBPF) tracex7_user.o
load_sock_ops-objs := bpf_load.o $(LIBBPF) load_sock_ops.o
test_probe_write_user-objs := bpf_load.o $(LIBBPF) test_probe_write_user_user.o
trace_output-objs := bpf_load.o $(LIBBPF) trace_output_user.o
@@ -101,6 +103,7 @@ always += tracex3_kern.o
always += tracex4_kern.o
always += tracex5_kern.o
always += tracex6_kern.o
+always += tracex7_kern.o
always += sock_flags_kern.o
always += test_probe_write_user_kern.o
always += trace_output_kern.o
@@ -155,6 +158,7 @@ HOSTLOADLIBES_tracex3 += -lelf
HOSTLOADLIBES_tracex4 += -lelf -lrt
HOSTLOADLIBES_tracex5 += -lelf
HOSTLOADLIBES_tracex6 += -lelf
+HOSTLOADLIBES_tracex7 += -lelf
HOSTLOADLIBES_test_cgrp2_sock2 += -lelf
HOSTLOADLIBES_load_sock_ops += -lelf
HOSTLOADLIBES_test_probe_write_user += -lelf
diff --git a/samples/bpf/test_override_return.sh b/samples/bpf/test_override_return.sh
new file mode 100755
index 000000000000..e68b9ee6814b
--- /dev/null
+++ b/samples/bpf/test_override_return.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+
+rm -f testfile.img
+dd if=/dev/zero of=testfile.img bs=1M seek=1000 count=1
+DEVICE=$(losetup --show -f testfile.img)
+mkfs.btrfs -f $DEVICE
+mkdir tmpmnt
+./tracex7 $DEVICE
+if [ $? -eq 0 ]
+then
+ echo "SUCCESS!"
+else
+ echo "FAILED!"
+fi
+losetup -d $DEVICE
diff --git a/samples/bpf/tracex7_kern.c b/samples/bpf/tracex7_kern.c
new file mode 100644
index 000000000000..1ab308a43e0f
--- /dev/null
+++ b/samples/bpf/tracex7_kern.c
@@ -0,0 +1,16 @@
+#include <uapi/linux/ptrace.h>
+#include <uapi/linux/bpf.h>
+#include <linux/version.h>
+#include "bpf_helpers.h"
+
+SEC("kprobe/open_ctree")
+int bpf_prog1(struct pt_regs *ctx)
+{
+ unsigned long rc = -12;
+
+ bpf_override_return(ctx, rc);
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/samples/bpf/tracex7_user.c b/samples/bpf/tracex7_user.c
new file mode 100644
index 000000000000..8a52ac492e8b
--- /dev/null
+++ b/samples/bpf/tracex7_user.c
@@ -0,0 +1,28 @@
+#define _GNU_SOURCE
+
+#include <stdio.h>
+#include <linux/bpf.h>
+#include <unistd.h>
+#include "libbpf.h"
+#include "bpf_load.h"
+
+int main(int argc, char **argv)
+{
+ FILE *f;
+ char filename[256];
+ char command[256];
+ int ret;
+
+ snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+ if (load_bpf_file(filename)) {
+ printf("%s", bpf_log_buf);
+ return 1;
+ }
+
+ snprintf(command, 256, "mount %s tmpmnt/", argv[1]);
+ f = popen(command, "r");
+ ret = pclose(f);
+
+ return ret ? 0 : 1;
+}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4c223ab30293..cf446c25c0ec 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -677,6 +677,10 @@ union bpf_attr {
* @buf: buf to fill
* @buf_size: size of the buf
* Return : 0 on success or negative error code
+ *
+ * int bpf_override_return(pt_regs, rc)
+ * @pt_regs: pointer to struct pt_regs
+ * @rc: the return value to set
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -736,7 +740,8 @@ union bpf_attr {
FN(xdp_adjust_meta), \
FN(perf_event_read_value), \
FN(perf_prog_read_value), \
- FN(getsockopt),
+ FN(getsockopt), \
+ FN(override_return),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index fd9a17fa8a8b..33cb00e46c49 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -82,7 +82,8 @@ static int (*bpf_perf_event_read_value)(void *map, unsigned long long flags,
static int (*bpf_perf_prog_read_value)(void *ctx, void *buf,
unsigned int buf_size) =
(void *) BPF_FUNC_perf_prog_read_value;
-
+static int (*bpf_override_return)(void *ctx, unsigned long rc) =
+ (void *) BPF_FUNC_override_return;
/* llvm builtin functions that eBPF C program may use to
* emit BPF_LD_ABS and BPF_LD_IND instructions
--
2.7.5
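The sample above treats any non-zero pclose() status as "the mount failed". If a stricter check is wanted, the wait status can be decoded first. A minimal sketch, assuming a POSIX system with /bin/sh available; `command_failed` is a hypothetical helper and not part of the patch:

```c
#define _GNU_SOURCE

#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>

/*
 * Run a shell command and report whether it failed.
 * Returns 1 if the command exited non-zero or did not exit normally,
 * 0 if it exited with status 0, and -1 if popen() or pclose() failed.
 */
int command_failed(const char *command)
{
	FILE *f;
	int status;

	assert(command != NULL);

	f = popen(command, "r");
	if (!f)
		return -1;

	status = pclose(f);
	if (status == -1)
		return -1;

	if (WIFEXITED(status))
		return WEXITSTATUS(status) != 0;
	return 1;
}
```

With something like this, the test could tell "mount ran and failed" apart from "the command could not be run at all".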
From: Josef Bacik <[email protected]>
This was instrumental in reproducing a space cache bug.
Signed-off-by: Josef Bacik <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
---
fs/btrfs/free-space-cache.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 4426d1c73e50..fb1382893bfc 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -22,6 +22,7 @@
#include <linux/slab.h>
#include <linux/math64.h>
#include <linux/ratelimit.h>
+#include <linux/bpf.h>
#include "ctree.h"
#include "free-space-cache.h"
#include "transaction.h"
@@ -332,6 +333,7 @@ static int io_ctl_init(struct btrfs_io_ctl *io_ctl, struct inode *inode,
return 0;
}
+BPF_ALLOW_ERROR_INJECTION(io_ctl_init);
static void io_ctl_free(struct btrfs_io_ctl *io_ctl)
{
--
2.7.5
From: Josef Bacik <[email protected]>
Using BPF we can override kprobed functions and return arbitrary
values. Obviously this can be a bit unsafe, so make this feature opt-in
for functions. Simply tag a function with BPF_ALLOW_ERROR_INJECTION() in
order to give BPF access to that function for error injection purposes.
Signed-off-by: Josef Bacik <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
---
include/asm-generic/vmlinux.lds.h | 10 +++
include/linux/bpf.h | 11 +++
include/linux/kprobes.h | 1 +
include/linux/module.h | 5 ++
kernel/kprobes.c | 163 ++++++++++++++++++++++++++++++++++++++
kernel/module.c | 6 +-
6 files changed, 195 insertions(+), 1 deletion(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index ee8b707d9fa9..a2e8582d094a 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -136,6 +136,15 @@
#define KPROBE_BLACKLIST()
#endif
+#ifdef CONFIG_BPF_KPROBE_OVERRIDE
+#define ERROR_INJECT_LIST() . = ALIGN(8); \
+ VMLINUX_SYMBOL(__start_kprobe_error_inject_list) = .; \
+ KEEP(*(_kprobe_error_inject_list)) \
+ VMLINUX_SYMBOL(__stop_kprobe_error_inject_list) = .;
+#else
+#define ERROR_INJECT_LIST()
+#endif
+
#ifdef CONFIG_EVENT_TRACING
#define FTRACE_EVENTS() . = ALIGN(8); \
VMLINUX_SYMBOL(__start_ftrace_events) = .; \
@@ -564,6 +573,7 @@
FTRACE_EVENTS() \
TRACE_SYSCALLS() \
KPROBE_BLACKLIST() \
+ ERROR_INJECT_LIST() \
MEM_DISCARD(init.rodata) \
CLK_OF_TABLES() \
RESERVEDMEM_OF_TABLES() \
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e55e4255a210..7f4d2a953173 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -576,4 +576,15 @@ extern const struct bpf_func_proto bpf_sock_map_update_proto;
void bpf_user_rnd_init_once(void);
u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
+#if defined(__KERNEL__) && !defined(__ASSEMBLY__)
+#ifdef CONFIG_BPF_KPROBE_OVERRIDE
+#define BPF_ALLOW_ERROR_INJECTION(fname) \
+static unsigned long __used \
+ __attribute__((__section__("_kprobe_error_inject_list"))) \
+ _eil_addr_##fname = (unsigned long)fname;
+#else
+#define BPF_ALLOW_ERROR_INJECTION(fname)
+#endif
+#endif
+
#endif /* _LINUX_BPF_H */
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 9440a2fc8893..963fd364f3d6 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -271,6 +271,7 @@ extern bool arch_kprobe_on_func_entry(unsigned long offset);
extern bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);
extern bool within_kprobe_blacklist(unsigned long addr);
+extern bool within_kprobe_error_injection_list(unsigned long addr);
struct kprobe_insn_cache {
struct mutex mutex;
diff --git a/include/linux/module.h b/include/linux/module.h
index c69b49abe877..548fa09fa806 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -475,6 +475,11 @@ struct module {
ctor_fn_t *ctors;
unsigned int num_ctors;
#endif
+
+#ifdef CONFIG_BPF_KPROBE_OVERRIDE
+ unsigned int num_kprobe_ei_funcs;
+ unsigned long *kprobe_ei_funcs;
+#endif
} ____cacheline_aligned __randomize_layout;
#ifndef MODULE_ARCH_INIT
#define MODULE_ARCH_INIT {}
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index da2ccf142358..b4aab48ad258 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -83,6 +83,16 @@ static raw_spinlock_t *kretprobe_table_lock_ptr(unsigned long hash)
return &(kretprobe_table_locks[hash].lock);
}
+/* List of symbols that can be overridden for error injection. */
+static LIST_HEAD(kprobe_error_injection_list);
+static DEFINE_MUTEX(kprobe_ei_mutex);
+struct kprobe_ei_entry {
+ struct list_head list;
+ unsigned long start_addr;
+ unsigned long end_addr;
+ void *priv;
+};
+
/* Blacklist -- list of struct kprobe_blacklist_entry */
static LIST_HEAD(kprobe_blacklist);
@@ -1394,6 +1404,17 @@ bool within_kprobe_blacklist(unsigned long addr)
return false;
}
+bool within_kprobe_error_injection_list(unsigned long addr)
+{
+ struct kprobe_ei_entry *ent;
+
+ list_for_each_entry(ent, &kprobe_error_injection_list, list) {
+ if (addr >= ent->start_addr && addr < ent->end_addr)
+ return true;
+ }
+ return false;
+}
+
/*
* If we have a symbol_name argument, look it up and add the offset field
* to it. This way, we can specify a relative address to a symbol.
@@ -2168,6 +2189,86 @@ static int __init populate_kprobe_blacklist(unsigned long *start,
return 0;
}
+#ifdef CONFIG_BPF_KPROBE_OVERRIDE
+/* Markers of the _kprobe_error_inject_list section */
+extern unsigned long __start_kprobe_error_inject_list[];
+extern unsigned long __stop_kprobe_error_inject_list[];
+
+/*
+ * Lookup and populate the kprobe_error_injection_list.
+ *
+ * For safety reasons we only allow certain functions to be overridden with
+ * bpf_error_injection, so we need to populate the list of the symbols that have
+ * been marked as safe for overriding.
+ */
+static void populate_kprobe_error_injection_list(unsigned long *start,
+ unsigned long *end,
+ void *priv)
+{
+ unsigned long *iter;
+ struct kprobe_ei_entry *ent;
+ unsigned long entry, offset = 0, size = 0;
+
+ mutex_lock(&kprobe_ei_mutex);
+ for (iter = start; iter < end; iter++) {
+ entry = arch_deref_entry_point((void *)*iter);
+
+ if (!kernel_text_address(entry) ||
+ !kallsyms_lookup_size_offset(entry, &size, &offset)) {
+ pr_err("Failed to find error inject entry at %p\n",
+ (void *)entry);
+ continue;
+ }
+
+ ent = kmalloc(sizeof(*ent), GFP_KERNEL);
+ if (!ent)
+ break;
+ ent->start_addr = entry;
+ ent->end_addr = entry + size;
+ ent->priv = priv;
+ INIT_LIST_HEAD(&ent->list);
+ list_add_tail(&ent->list, &kprobe_error_injection_list);
+ }
+ mutex_unlock(&kprobe_ei_mutex);
+}
+
+static void __init populate_kernel_kprobe_ei_list(void)
+{
+ populate_kprobe_error_injection_list(__start_kprobe_error_inject_list,
+ __stop_kprobe_error_inject_list,
+ NULL);
+}
+
+static void module_load_kprobe_ei_list(struct module *mod)
+{
+ if (!mod->num_kprobe_ei_funcs)
+ return;
+ populate_kprobe_error_injection_list(mod->kprobe_ei_funcs,
+ mod->kprobe_ei_funcs +
+ mod->num_kprobe_ei_funcs, mod);
+}
+
+static void module_unload_kprobe_ei_list(struct module *mod)
+{
+ struct kprobe_ei_entry *ent, *n;
+ if (!mod->num_kprobe_ei_funcs)
+ return;
+
+ mutex_lock(&kprobe_ei_mutex);
+ list_for_each_entry_safe(ent, n, &kprobe_error_injection_list, list) {
+ if (ent->priv == mod) {
+ list_del_init(&ent->list);
+ kfree(ent);
+ }
+ }
+ mutex_unlock(&kprobe_ei_mutex);
+}
+#else
+static inline void __init populate_kernel_kprobe_ei_list(void) {}
+static inline void module_load_kprobe_ei_list(struct module *m) {}
+static inline void module_unload_kprobe_ei_list(struct module *m) {}
+#endif
+
/* Module notifier call back, checking kprobes on the module */
static int kprobes_module_callback(struct notifier_block *nb,
unsigned long val, void *data)
@@ -2178,6 +2279,11 @@ static int kprobes_module_callback(struct notifier_block *nb,
unsigned int i;
int checkcore = (val == MODULE_STATE_GOING);
+ if (val == MODULE_STATE_COMING)
+ module_load_kprobe_ei_list(mod);
+ else if (val == MODULE_STATE_GOING)
+ module_unload_kprobe_ei_list(mod);
+
if (val != MODULE_STATE_GOING && val != MODULE_STATE_LIVE)
return NOTIFY_DONE;
@@ -2240,6 +2346,8 @@ static int __init init_kprobes(void)
pr_err("Please take care of using kprobes.\n");
}
+ populate_kernel_kprobe_ei_list();
+
if (kretprobe_blacklist_size) {
/* lookup the function address from its name */
for (i = 0; kretprobe_blacklist[i].name != NULL; i++) {
@@ -2407,6 +2515,56 @@ static const struct file_operations debugfs_kprobe_blacklist_ops = {
.release = seq_release,
};
+/*
+ * kprobes/error_injection_list -- shows which functions can be overridden for
+ * error injection.
+ * */
+static void *kprobe_ei_seq_start(struct seq_file *m, loff_t *pos)
+{
+ mutex_lock(&kprobe_ei_mutex);
+ return seq_list_start(&kprobe_error_injection_list, *pos);
+}
+
+static void kprobe_ei_seq_stop(struct seq_file *m, void *v)
+{
+ mutex_unlock(&kprobe_ei_mutex);
+}
+
+static void *kprobe_ei_seq_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ return seq_list_next(v, &kprobe_error_injection_list, pos);
+}
+
+static int kprobe_ei_seq_show(struct seq_file *m, void *v)
+{
+ char buffer[KSYM_SYMBOL_LEN];
+ struct kprobe_ei_entry *ent =
+ list_entry(v, struct kprobe_ei_entry, list);
+
+ sprint_symbol(buffer, ent->start_addr);
+ seq_printf(m, "%s\n", buffer);
+ return 0;
+}
+
+static const struct seq_operations kprobe_ei_seq_ops = {
+ .start = kprobe_ei_seq_start,
+ .next = kprobe_ei_seq_next,
+ .stop = kprobe_ei_seq_stop,
+ .show = kprobe_ei_seq_show,
+};
+
+static int kprobe_ei_open(struct inode *inode, struct file *filp)
+{
+ return seq_open(filp, &kprobe_ei_seq_ops);
+}
+
+static const struct file_operations debugfs_kprobe_ei_ops = {
+ .open = kprobe_ei_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
static void arm_all_kprobes(void)
{
struct hlist_head *head;
@@ -2548,6 +2706,11 @@ static int __init debugfs_kprobe_init(void)
if (!file)
goto error;
+ file = debugfs_create_file("error_injection_list", 0444, dir, NULL,
+ &debugfs_kprobe_ei_ops);
+ if (!file)
+ goto error;
+
return 0;
error:
diff --git a/kernel/module.c b/kernel/module.c
index dea01ac9cb74..bd695bfdc5c4 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3118,7 +3118,11 @@ static int find_module_sections(struct module *mod, struct load_info *info)
sizeof(*mod->ftrace_callsites),
&mod->num_ftrace_callsites);
#endif
-
+#ifdef CONFIG_BPF_KPROBE_OVERRIDE
+ mod->kprobe_ei_funcs = section_objs(info, "_kprobe_error_inject_list",
+ sizeof(*mod->kprobe_ei_funcs),
+ &mod->num_kprobe_ei_funcs);
+#endif
mod->extable = section_objs(info, "__ex_table",
sizeof(*mod->extable), &mod->num_exentries);
--
2.7.5
On 12/15/17 11:12 AM, Josef Bacik wrote:
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> +BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
> +{
> + __this_cpu_write(bpf_kprobe_override, 1);
> + regs_set_return_value(regs, rc);
> + arch_ftrace_kprobe_override_function(regs);
> + return 0;
> +}
Since you're doing a respin, can you adopt the change I did to make
this helper fail at load time, instead of at runtime, if
CONFIG_BPF_KPROBE_OVERRIDE is not set?
Also, how big is the v9->v10 change?
Maybe do it as a separate patch, since the previous set is already
sitting in bpf-next and there are patches on top?
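For illustration, the load-time variant could look roughly like the
user-space toy below. This is only a sketch under assumptions, not the
actual change: lookup_helper_proto(), load_prog(), struct helper_proto,
the helper IDs and SKETCH_KPROBE_OVERRIDE are all hypothetical stand-ins.
The point is that when the option is compiled out, the helper lookup
returns NULL and the program is rejected when it is loaded, so nothing
has to fail at run time.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel config option. Leave it
 * undefined to model a build without the override feature.
 */
/* #define SKETCH_KPROBE_OVERRIDE 1 */

enum helper_id { HELPER_GET_PID, HELPER_OVERRIDE_RETURN };

struct helper_proto {
	const char *name;
};

static const struct helper_proto get_pid_proto = { "get_pid" };
#ifdef SKETCH_KPROBE_OVERRIDE
static const struct helper_proto override_return_proto = { "override_return" };
#endif

/* Return the prototype for a helper, or NULL if this build lacks it. */
static const struct helper_proto *lookup_helper_proto(enum helper_id id)
{
	switch (id) {
	case HELPER_GET_PID:
		return &get_pid_proto;
#ifdef SKETCH_KPROBE_OVERRIDE
	case HELPER_OVERRIDE_RETURN:
		return &override_return_proto;
#endif
	default:
		return NULL;
	}
}

/* Toy "verifier": reject the whole program if any helper is unknown. */
static int load_prog(const enum helper_id *calls, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!lookup_helper_proto(calls[i]))
			return -EINVAL;
	return 0;
}
```

With the option left undefined, a program that calls the override
helper is refused by load_prog() up front, while programs that only use
other helpers still load.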
On 12/15/2017 09:34 PM, Alexei Starovoitov wrote:
[...]
> Also how big is the v9-v10 change ?
> May be do it as separate patch, since previous set already sitting
> in bpf-next and there are patches on top?
+1
On Fri, 15 Dec 2017 14:12:52 -0500
Josef Bacik <[email protected]> wrote:
> From: Josef Bacik <[email protected]>
>
> Using BPF we can override kprob'ed functions and return arbitrary
> values. Obviously this can be a bit unsafe, so make this feature opt-in
> for functions. Simply tag a function with KPROBE_ERROR_INJECT_SYMBOL in
> order to give BPF access to that function for error injection purposes.
>
NAK. I'm very confused. Why is this feature implemented in
kernel/kprobes.c? It does not seem to fall within the usual "usage" of
kprobes. I recommend implementing it somewhere else, like
kernel/error_injection.c or kernel/module.c.
More precisely, here are the reasons why:
- This just provides an API to check whether an address falls within an
address-range list inside a kmodule (not related to kprobes).
- kprobes itself does not use the API to check a modified address.
(Yes, that would cause a big overhead...)
- This can mislead users into thinking they must NOT change the
instruction pointer from kprobes except for functions on that list.
- If a user intends to insert a piece of code (like livepatch) in a
function, they do NOT think of it as "error injection".
- Or they may find this API, think "what?? I cannot change the
instruction pointer with kprobes? But I can," and report it as a bug
on lkml...
So, I don't want to see this in kprobes.c. It would be better to make
another layer to do this.
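To make the point concrete, here is a minimal user-space sketch of such
a separate layer. It is only an illustration under assumptions: struct
ei_range, ei_add_range(), within_error_injection_list() and the fixed
EI_MAX_RANGES table are hypothetical names, and the real code would
keep its list and locking as in the patch. Nothing in it depends on
kprobes; it only answers "is this address inside an allowed function?".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One allowed function, covering [start_addr, end_addr). */
struct ei_range {
	unsigned long start_addr;
	unsigned long end_addr;
};

#define EI_MAX_RANGES 16	/* arbitrary limit for this sketch */

static struct ei_range ei_ranges[EI_MAX_RANGES];
static size_t ei_nr_ranges;

/* Register a function by entry address and size; returns 0 on success. */
static int ei_add_range(unsigned long start, unsigned long size)
{
	if (ei_nr_ranges == EI_MAX_RANGES || size == 0)
		return -1;
	ei_ranges[ei_nr_ranges].start_addr = start;
	ei_ranges[ei_nr_ranges].end_addr = start + size;
	ei_nr_ranges++;
	return 0;
}

/* Same half-open range test as the patch, without any kprobes state. */
static bool within_error_injection_list(unsigned long addr)
{
	size_t i;

	for (i = 0; i < ei_nr_ranges; i++) {
		if (addr >= ei_ranges[i].start_addr &&
		    addr < ei_ranges[i].end_addr)
			return true;
	}
	return false;
}
```

A caller such as the bpf attach path would then ask this layer directly,
and kprobes would not need to know about the list at all.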
Thank you,
> Signed-off-by: Josef Bacik <[email protected]>
> Acked-by: Ingo Molnar <[email protected]>
> ---
> include/asm-generic/vmlinux.lds.h | 10 +++
> include/linux/bpf.h | 11 +++
> include/linux/kprobes.h | 1 +
> include/linux/module.h | 5 ++
> kernel/kprobes.c | 163 ++++++++++++++++++++++++++++++++++++++
> kernel/module.c | 6 +-
> 6 files changed, 195 insertions(+), 1 deletion(-)
>
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index ee8b707d9fa9..a2e8582d094a 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -136,6 +136,15 @@
> #define KPROBE_BLACKLIST()
> #endif
>
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> +#define ERROR_INJECT_LIST() . = ALIGN(8); \
> + VMLINUX_SYMBOL(__start_kprobe_error_inject_list) = .; \
> + KEEP(*(_kprobe_error_inject_list)) \
> + VMLINUX_SYMBOL(__stop_kprobe_error_inject_list) = .;
> +#else
> +#define ERROR_INJECT_LIST()
> +#endif
> +
> #ifdef CONFIG_EVENT_TRACING
> #define FTRACE_EVENTS() . = ALIGN(8); \
> VMLINUX_SYMBOL(__start_ftrace_events) = .; \
> @@ -564,6 +573,7 @@
> FTRACE_EVENTS() \
> TRACE_SYSCALLS() \
> KPROBE_BLACKLIST() \
> + ERROR_INJECT_LIST() \
> MEM_DISCARD(init.rodata) \
> CLK_OF_TABLES() \
> RESERVEDMEM_OF_TABLES() \
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index e55e4255a210..7f4d2a953173 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -576,4 +576,15 @@ extern const struct bpf_func_proto bpf_sock_map_update_proto;
> void bpf_user_rnd_init_once(void);
> u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
>
> +#if defined(__KERNEL__) && !defined(__ASSEMBLY__)
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> +#define BPF_ALLOW_ERROR_INJECTION(fname) \
> +static unsigned long __used \
> + __attribute__((__section__("_kprobe_error_inject_list"))) \
> + _eil_addr_##fname = (unsigned long)fname;
> +#else
> +#define BPF_ALLOW_ERROR_INJECTION(fname)
> +#endif
> +#endif
> +
> #endif /* _LINUX_BPF_H */
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 9440a2fc8893..963fd364f3d6 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -271,6 +271,7 @@ extern bool arch_kprobe_on_func_entry(unsigned long offset);
> extern bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);
>
> extern bool within_kprobe_blacklist(unsigned long addr);
> +extern bool within_kprobe_error_injection_list(unsigned long addr);
>
> struct kprobe_insn_cache {
> struct mutex mutex;
> diff --git a/include/linux/module.h b/include/linux/module.h
> index c69b49abe877..548fa09fa806 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -475,6 +475,11 @@ struct module {
> ctor_fn_t *ctors;
> unsigned int num_ctors;
> #endif
> +
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> + unsigned int num_kprobe_ei_funcs;
> + unsigned long *kprobe_ei_funcs;
> +#endif
> } ____cacheline_aligned __randomize_layout;
> #ifndef MODULE_ARCH_INIT
> #define MODULE_ARCH_INIT {}
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index da2ccf142358..b4aab48ad258 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -83,6 +83,16 @@ static raw_spinlock_t *kretprobe_table_lock_ptr(unsigned long hash)
> return &(kretprobe_table_locks[hash].lock);
> }
>
> +/* List of symbols that can be overriden for error injection. */
> +static LIST_HEAD(kprobe_error_injection_list);
> +static DEFINE_MUTEX(kprobe_ei_mutex);
> +struct kprobe_ei_entry {
> + struct list_head list;
> + unsigned long start_addr;
> + unsigned long end_addr;
> + void *priv;
> +};
> +
> /* Blacklist -- list of struct kprobe_blacklist_entry */
> static LIST_HEAD(kprobe_blacklist);
>
> @@ -1394,6 +1404,17 @@ bool within_kprobe_blacklist(unsigned long addr)
> return false;
> }
>
> +bool within_kprobe_error_injection_list(unsigned long addr)
> +{
> + struct kprobe_ei_entry *ent;
> +
> + list_for_each_entry(ent, &kprobe_error_injection_list, list) {
> + if (addr >= ent->start_addr && addr < ent->end_addr)
> + return true;
> + }
> + return false;
> +}
> +
> /*
> * If we have a symbol_name argument, look it up and add the offset field
> * to it. This way, we can specify a relative address to a symbol.
> @@ -2168,6 +2189,86 @@ static int __init populate_kprobe_blacklist(unsigned long *start,
> return 0;
> }
>
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> +/* Markers of the _kprobe_error_inject_list section */
> +extern unsigned long __start_kprobe_error_inject_list[];
> +extern unsigned long __stop_kprobe_error_inject_list[];
> +
> +/*
> + * Lookup and populate the kprobe_error_injection_list.
> + *
> + * For safety reasons we only allow certain functions to be overriden with
> + * bpf_error_injection, so we need to populate the list of the symbols that have
> + * been marked as safe for overriding.
> + */
> +static void populate_kprobe_error_injection_list(unsigned long *start,
> + unsigned long *end,
> + void *priv)
> +{
> + unsigned long *iter;
> + struct kprobe_ei_entry *ent;
> + unsigned long entry, offset = 0, size = 0;
> +
> + mutex_lock(&kprobe_ei_mutex);
> + for (iter = start; iter < end; iter++) {
> + entry = arch_deref_entry_point((void *)*iter);
> +
> + if (!kernel_text_address(entry) ||
> + !kallsyms_lookup_size_offset(entry, &size, &offset)) {
> + pr_err("Failed to find error inject entry at %p\n",
> + (void *)entry);
> + continue;
> + }
> +
> + ent = kmalloc(sizeof(*ent), GFP_KERNEL);
> + if (!ent)
> + break;
> + ent->start_addr = entry;
> + ent->end_addr = entry + size;
> + ent->priv = priv;
> + INIT_LIST_HEAD(&ent->list);
> + list_add_tail(&ent->list, &kprobe_error_injection_list);
> + }
> + mutex_unlock(&kprobe_ei_mutex);
> +}
> +
> +static void __init populate_kernel_kprobe_ei_list(void)
> +{
> + populate_kprobe_error_injection_list(__start_kprobe_error_inject_list,
> + __stop_kprobe_error_inject_list,
> + NULL);
> +}
> +
> +static void module_load_kprobe_ei_list(struct module *mod)
> +{
> + if (!mod->num_kprobe_ei_funcs)
> + return;
> + populate_kprobe_error_injection_list(mod->kprobe_ei_funcs,
> + mod->kprobe_ei_funcs +
> + mod->num_kprobe_ei_funcs, mod);
> +}
> +
> +static void module_unload_kprobe_ei_list(struct module *mod)
> +{
> + struct kprobe_ei_entry *ent, *n;
> + if (!mod->num_kprobe_ei_funcs)
> + return;
> +
> + mutex_lock(&kprobe_ei_mutex);
> + list_for_each_entry_safe(ent, n, &kprobe_error_injection_list, list) {
> + if (ent->priv == mod) {
> + list_del_init(&ent->list);
> + kfree(ent);
> + }
> + }
> + mutex_unlock(&kprobe_ei_mutex);
> +}
> +#else
> +static inline void __init populate_kernel_kprobe_ei_list(void) {}
> +static inline void module_load_kprobe_ei_list(struct module *m) {}
> +static inline void module_unload_kprobe_ei_list(struct module *m) {}
> +#endif
> +
> /* Module notifier call back, checking kprobes on the module */
> static int kprobes_module_callback(struct notifier_block *nb,
> unsigned long val, void *data)
> @@ -2178,6 +2279,11 @@ static int kprobes_module_callback(struct notifier_block *nb,
> unsigned int i;
> int checkcore = (val == MODULE_STATE_GOING);
>
> + if (val == MODULE_STATE_COMING)
> + module_load_kprobe_ei_list(mod);
> + else if (val == MODULE_STATE_GOING)
> + module_unload_kprobe_ei_list(mod);
> +
> if (val != MODULE_STATE_GOING && val != MODULE_STATE_LIVE)
> return NOTIFY_DONE;
>
> @@ -2240,6 +2346,8 @@ static int __init init_kprobes(void)
> pr_err("Please take care of using kprobes.\n");
> }
>
> + populate_kernel_kprobe_ei_list();
> +
> if (kretprobe_blacklist_size) {
> /* lookup the function address from its name */
> for (i = 0; kretprobe_blacklist[i].name != NULL; i++) {
> @@ -2407,6 +2515,56 @@ static const struct file_operations debugfs_kprobe_blacklist_ops = {
> .release = seq_release,
> };
>
> +/*
> + * kprobes/error_injection_list -- shows which functions can be overriden for
> + * error injection.
> + * */
> +static void *kprobe_ei_seq_start(struct seq_file *m, loff_t *pos)
> +{
> + mutex_lock(&kprobe_ei_mutex);
> + return seq_list_start(&kprobe_error_injection_list, *pos);
> +}
> +
> +static void kprobe_ei_seq_stop(struct seq_file *m, void *v)
> +{
> + mutex_unlock(&kprobe_ei_mutex);
> +}
> +
> +static void *kprobe_ei_seq_next(struct seq_file *m, void *v, loff_t *pos)
> +{
> + return seq_list_next(v, &kprobe_error_injection_list, pos);
> +}
> +
> +static int kprobe_ei_seq_show(struct seq_file *m, void *v)
> +{
> + char buffer[KSYM_SYMBOL_LEN];
> + struct kprobe_ei_entry *ent =
> + list_entry(v, struct kprobe_ei_entry, list);
> +
> + sprint_symbol(buffer, ent->start_addr);
> + seq_printf(m, "%s\n", buffer);
> + return 0;
> +}
> +
> +static const struct seq_operations kprobe_ei_seq_ops = {
> + .start = kprobe_ei_seq_start,
> + .next = kprobe_ei_seq_next,
> + .stop = kprobe_ei_seq_stop,
> + .show = kprobe_ei_seq_show,
> +};
> +
> +static int kprobe_ei_open(struct inode *inode, struct file *filp)
> +{
> + return seq_open(filp, &kprobe_ei_seq_ops);
> +}
> +
> +static const struct file_operations debugfs_kprobe_ei_ops = {
> + .open = kprobe_ei_open,
> + .read = seq_read,
> + .llseek = seq_lseek,
> + .release = seq_release,
> +};
> +
> static void arm_all_kprobes(void)
> {
> struct hlist_head *head;
> @@ -2548,6 +2706,11 @@ static int __init debugfs_kprobe_init(void)
> if (!file)
> goto error;
>
> + file = debugfs_create_file("error_injection_list", 0444, dir, NULL,
> + &debugfs_kprobe_ei_ops);
> + if (!file)
> + goto error;
> +
> return 0;
>
> error:
> diff --git a/kernel/module.c b/kernel/module.c
> index dea01ac9cb74..bd695bfdc5c4 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3118,7 +3118,11 @@ static int find_module_sections(struct module *mod, struct load_info *info)
> sizeof(*mod->ftrace_callsites),
> &mod->num_ftrace_callsites);
> #endif
> -
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> + mod->kprobe_ei_funcs = section_objs(info, "_kprobe_error_inject_list",
> + sizeof(*mod->kprobe_ei_funcs),
> + &mod->num_kprobe_ei_funcs);
> +#endif
> mod->extable = section_objs(info, "__ex_table",
> sizeof(*mod->extable), &mod->num_exentries);
>
> --
> 2.7.5
>
--
Masami Hiramatsu <[email protected]>
On Fri, 15 Dec 2017 14:12:54 -0500
Josef Bacik <[email protected]> wrote:
> From: Josef Bacik <[email protected]>
>
> Error injection is sloppy and very ad-hoc. BPF could fill this niche
> perfectly with it's kprobe functionality. We could make sure errors are
> only triggered in specific call chains that we care about with very
> specific situations. Accomplish this with the bpf_override_funciton
> helper. This will modify the probe'd callers return value to the
> specified value and set the PC to an override function that simply
> returns, bypassing the originally probed function. This gives us a nice
> clean way to implement systematic error injection for all of our code
> paths.
OK, got it. I think the error_injectable function list should be defined
in kernel/trace/bpf_trace.c, because only bpf calls it and needs to care
about the "safeness".
[...]
> diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
> index 8dc0161cec8f..1ea748d682fd 100644
> --- a/arch/x86/kernel/kprobes/ftrace.c
> +++ b/arch/x86/kernel/kprobes/ftrace.c
> @@ -97,3 +97,17 @@ int arch_prepare_kprobe_ftrace(struct kprobe *p)
> p->ainsn.boostable = false;
> return 0;
> }
> +
> +asmlinkage void override_func(void);
> +asm(
> + ".type override_func, @function\n"
> + "override_func:\n"
> + " ret\n"
> + ".size override_func, .-override_func\n"
> +);
> +
> +void arch_ftrace_kprobe_override_function(struct pt_regs *regs)
> +{
> + regs->ip = (unsigned long)&override_func;
> +}
> +NOKPROBE_SYMBOL(arch_ftrace_kprobe_override_function);
Calling this "override_function" is meaningless. This is a function
which just returns. So I think the combination of just_return_func() and
arch_bpf_override_func_just_return() would be better.
Moreover, arch/x86/kernel/kprobes/ftrace.c is an architecture-dependent
implementation of kprobes, not bpf.
Hmm, would arch/x86/net/bpf_jit_comp.c be a better place?
(Why don't we have arch/x86/kernel/bpf.c?)
[..]
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 492700c5fb4d..91f4b57dab82 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -42,6 +42,7 @@ struct trace_kprobe {
> (offsetof(struct trace_kprobe, tp.args) + \
> (sizeof(struct probe_arg) * (n)))
>
> +DEFINE_PER_CPU(int, bpf_kprobe_override);
>
> static nokprobe_inline bool trace_kprobe_is_return(struct trace_kprobe *tk)
> {
> @@ -87,6 +88,27 @@ static nokprobe_inline unsigned long trace_kprobe_nhit(struct trace_kprobe *tk)
> return nhit;
> }
>
> +int trace_kprobe_ftrace(struct trace_event_call *call)
> +{
> + struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
> + return kprobe_ftrace(&tk->rp.kp);
> +}
> +
> +int trace_kprobe_error_injectable(struct trace_event_call *call)
> +{
> + struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
> + unsigned long addr;
> +
> + if (tk->symbol) {
> + addr = (unsigned long)
> + kallsyms_lookup_name(trace_kprobe_symbol(tk));
> + addr += tk->rp.kp.offset;
If the tk is already registered, you don't need to look up the address;
you can use kp.addr. Anyway, kprobe_ftrace() also requires that the
kprobe is already registered.
> + } else {
> + addr = (unsigned long)tk->rp.kp.addr;
> + }
> + return within_kprobe_error_injection_list(addr);
> +}
> +
> static int register_kprobe_event(struct trace_kprobe *tk);
> static int unregister_kprobe_event(struct trace_kprobe *tk);
>
> @@ -1170,7 +1192,7 @@ static int kretprobe_event_define_fields(struct trace_event_call *event_call)
> #ifdef CONFIG_PERF_EVENTS
>
> /* Kprobe profile handler */
> -static void
> +static int
> kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
> {
> struct trace_event_call *call = &tk->tp.call;
> @@ -1179,12 +1201,29 @@ kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
> int size, __size, dsize;
> int rctx;
>
> - if (bpf_prog_array_valid(call) && !trace_call_bpf(call, regs))
> - return;
> + if (bpf_prog_array_valid(call)) {
> + int ret;
> +
> + ret = trace_call_bpf(call, regs);
> +
> + /*
> + * We need to check and see if we modified the pc of the
> + * pt_regs, and if so clear the kprobe and return 1 so that we
> + * don't do the instruction skipping. Also reset our state so
> + * we are clean the next pass through.
> + */
> + if (__this_cpu_read(bpf_kprobe_override)) {
> + __this_cpu_write(bpf_kprobe_override, 0);
> + reset_current_kprobe();
OK, I will fix this issue (reset kprobe and preempt-enable) by removing
jprobe soon.
(I'm currently waiting for the removal of the {tcp,sctp,dccp}_probe
code, which are the only users of jprobe in the kernel.)
Thank you,
> + return 1;
> + }
> + if (!ret)
> + return 0;
> + }
>
> head = this_cpu_ptr(call->perf_events);
> if (hlist_empty(head))
> - return;
> + return 0;
>
> dsize = __get_data_size(&tk->tp, regs);
> __size = sizeof(*entry) + tk->tp.size + dsize;
> @@ -1193,13 +1232,14 @@ kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
>
> entry = perf_trace_buf_alloc(size, NULL, &rctx);
> if (!entry)
> - return;
> + return 0;
>
> entry->ip = (unsigned long)tk->rp.kp.addr;
> memset(&entry[1], 0, dsize);
> store_trace_args(sizeof(*entry), &tk->tp, regs, (u8 *)&entry[1], dsize);
> perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
> head, NULL);
> + return 0;
> }
> NOKPROBE_SYMBOL(kprobe_perf_func);
>
> @@ -1275,16 +1315,24 @@ static int kprobe_register(struct trace_event_call *event,
> static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs)
> {
> struct trace_kprobe *tk = container_of(kp, struct trace_kprobe, rp.kp);
> + int ret = 0;
>
> raw_cpu_inc(*tk->nhit);
>
> if (tk->tp.flags & TP_FLAG_TRACE)
> kprobe_trace_func(tk, regs);
> #ifdef CONFIG_PERF_EVENTS
> - if (tk->tp.flags & TP_FLAG_PROFILE)
> - kprobe_perf_func(tk, regs);
> + if (tk->tp.flags & TP_FLAG_PROFILE) {
> + ret = kprobe_perf_func(tk, regs);
> + /*
> + * The ftrace kprobe handler leaves it up to us to re-enable
> + * preemption here before returning if we've modified the ip.
> + */
> + if (ret)
> + preempt_enable_no_resched();
> + }
> #endif
> - return 0; /* We don't tweek kernel, so just return 0 */
> + return ret;
> }
> NOKPROBE_SYMBOL(kprobe_dispatcher);
>
> diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> index fb66e3eaa192..5e54d748c84c 100644
> --- a/kernel/trace/trace_probe.h
> +++ b/kernel/trace/trace_probe.h
> @@ -252,6 +252,8 @@ struct symbol_cache;
> unsigned long update_symbol_cache(struct symbol_cache *sc);
> void free_symbol_cache(struct symbol_cache *sc);
> struct symbol_cache *alloc_symbol_cache(const char *sym, long offset);
> +int trace_kprobe_ftrace(struct trace_event_call *call);
> +int trace_kprobe_error_injectable(struct trace_event_call *call);
> #else
> /* uprobes do not support symbol fetch methods */
> #define fetch_symbol_u8 NULL
> @@ -277,6 +279,16 @@ alloc_symbol_cache(const char *sym, long offset)
> {
> return NULL;
> }
> +
> +static inline int trace_kprobe_ftrace(struct trace_event_call *call)
> +{
> + return 0;
> +}
> +
> +static inline int trace_kprobe_error_injectable(struct trace_event_call *call)
> +{
> + return 0;
> +}
> #endif /* CONFIG_KPROBE_EVENTS */
>
> struct probe_arg {
> --
> 2.7.5
>
--
Masami Hiramatsu <[email protected]>
On 12/18/2017 10:51 AM, Masami Hiramatsu wrote:
> On Fri, 15 Dec 2017 14:12:54 -0500
> Josef Bacik <[email protected]> wrote:
>> From: Josef Bacik <[email protected]>
>>
>> Error injection is sloppy and very ad-hoc. BPF could fill this niche
>> perfectly with it's kprobe functionality. We could make sure errors are
>> only triggered in specific call chains that we care about with very
>> specific situations. Accomplish this with the bpf_override_funciton
>> helper. This will modify the probe'd callers return value to the
>> specified value and set the PC to an override function that simply
>> returns, bypassing the originally probed function. This gives us a nice
>> clean way to implement systematic error injection for all of our code
>> paths.
>
> OK, got it. I think the error_injectable function list should be defined
> in kernel/trace/bpf_trace.c because only bpf calls it and needs to care
> the "safeness".
>
> [...]
>> diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
>> index 8dc0161cec8f..1ea748d682fd 100644
>> --- a/arch/x86/kernel/kprobes/ftrace.c
>> +++ b/arch/x86/kernel/kprobes/ftrace.c
>> @@ -97,3 +97,17 @@ int arch_prepare_kprobe_ftrace(struct kprobe *p)
>> p->ainsn.boostable = false;
>> return 0;
>> }
>> +
>> +asmlinkage void override_func(void);
>> +asm(
>> + ".type override_func, @function\n"
>> + "override_func:\n"
>> + " ret\n"
>> + ".size override_func, .-override_func\n"
>> +);
>> +
>> +void arch_ftrace_kprobe_override_function(struct pt_regs *regs)
>> +{
>> + regs->ip = (unsigned long)&override_func;
>> +}
>> +NOKPROBE_SYMBOL(arch_ftrace_kprobe_override_function);
>
> Calling this as "override_function" is meaningless. This is a function
> which just return. So I think combination of just_return_func() and
> arch_bpf_override_func_just_return() will be better.
>
> Moreover, this arch/x86/kernel/kprobes/ftrace.c is an archtecture
> dependent implementation of kprobes, not bpf.
Josef, please work out any necessary cleanups that would still need
to be addressed based on Masami's feedback and send them as follow-up
patches, thanks.
> Hmm, arch/x86/net/bpf_jit_comp.c will be better place?
(No, it's JIT only and I'd really prefer to keep it that way; mixing
this in would result in a huge mess.)
On Fri, 15 Dec 2017 14:12:52 -0500
Josef Bacik <[email protected]> wrote:
> From: Josef Bacik <[email protected]>
>
> Using BPF we can override kprob'ed functions and return arbitrary
> values. Obviously this can be a bit unsafe, so make this feature opt-in
> for functions. Simply tag a function with KPROBE_ERROR_INJECT_SYMBOL in
> order to give BPF access to that function for error injection purposes.
>
> Signed-off-by: Josef Bacik <[email protected]>
> Acked-by: Ingo Molnar <[email protected]>
> ---
> include/asm-generic/vmlinux.lds.h | 10 +++
> include/linux/bpf.h | 11 +++
> include/linux/kprobes.h | 1 +
> include/linux/module.h | 5 ++
> kernel/kprobes.c | 163 ++++++++++++++++++++++++++++++++++++++
> kernel/module.c | 6 +-
> 6 files changed, 195 insertions(+), 1 deletion(-)
>
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index ee8b707d9fa9..a2e8582d094a 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -136,6 +136,15 @@
> #define KPROBE_BLACKLIST()
> #endif
>
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> +#define ERROR_INJECT_LIST() . = ALIGN(8); \
> + VMLINUX_SYMBOL(__start_kprobe_error_inject_list) = .; \
> + KEEP(*(_kprobe_error_inject_list)) \
> + VMLINUX_SYMBOL(__stop_kprobe_error_inject_list) = .;
> +#else
> +#define ERROR_INJECT_LIST()
> +#endif
> +
> #ifdef CONFIG_EVENT_TRACING
> #define FTRACE_EVENTS() . = ALIGN(8); \
> VMLINUX_SYMBOL(__start_ftrace_events) = .; \
> @@ -564,6 +573,7 @@
> FTRACE_EVENTS() \
> TRACE_SYSCALLS() \
> KPROBE_BLACKLIST() \
> + ERROR_INJECT_LIST() \
> MEM_DISCARD(init.rodata) \
> CLK_OF_TABLES() \
> RESERVEDMEM_OF_TABLES() \
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index e55e4255a210..7f4d2a953173 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -576,4 +576,15 @@ extern const struct bpf_func_proto bpf_sock_map_update_proto;
> void bpf_user_rnd_init_once(void);
> u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
>
> +#if defined(__KERNEL__) && !defined(__ASSEMBLY__)
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
BTW, CONFIG_BPF_KPROBE_OVERRIDE is also a confusing name.
Since this feature overrides a function to just return with
some return value (as far as I understand; or do you also
plan to modify the execution path inside a function?),
I think CONFIG_BPF_FUNCTION_OVERRIDE or
CONFIG_BPF_EXECUTION_OVERRIDE would be better.
Indeed, BPF is based on kprobes, but it seems you are limiting it
to ftrace (function-call trace) (I'm not sure why),
so using "kprobes" for this feature seems strange to me.
The idea in this patch itself (marking injectable functions on
a list) is OK with me.
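For readers unfamiliar with the marking trick, a user-space analogue
looks roughly like this. It is a sketch under assumptions: it needs GCC
or Clang on an ELF target, where the linker provides __start_/__stop_
symbols for a section whose name is a valid C identifier. The names
ALLOW_ERROR_INJECTION(), ei_list, ei_tagged(), open_thing() and
close_thing() are hypothetical; in the patch the kernel linker script
defines the section markers explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*ei_fn_t)(void);

/*
 * Hypothetical analogue of the opt-in macro: store the function's
 * address in the "ei_list" section so it can be enumerated later.
 */
#define ALLOW_ERROR_INJECTION(fname)				\
	static const ei_fn_t eil_addr_##fname			\
	__attribute__((used, section("ei_list"))) = fname

/* Section bounds, provided by the linker on ELF targets. */
extern const ei_fn_t __start_ei_list[];
extern const ei_fn_t __stop_ei_list[];

/* Tagged: opted in to error injection. */
int open_thing(void) { return 0; }
ALLOW_ERROR_INJECTION(open_thing);

/* Not tagged: must never show up on the list. */
int close_thing(void) { return 1; }

/* Walk the section and report whether fn was tagged. */
static bool ei_tagged(ei_fn_t fn)
{
	const ei_fn_t *iter;

	for (iter = __start_ei_list; iter < __stop_ei_list; iter++) {
		if (*iter == fn)
			return true;
	}
	return false;
}
```

Only functions that were explicitly tagged end up between the two
markers, which is what makes the feature opt-in.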
Thank you,
> +#define BPF_ALLOW_ERROR_INJECTION(fname) \
> +static unsigned long __used \
> + __attribute__((__section__("_kprobe_error_inject_list"))) \
> + _eil_addr_##fname = (unsigned long)fname;
> +#else
> +#define BPF_ALLOW_ERROR_INJECTION(fname)
> +#endif
> +#endif
> +
> #endif /* _LINUX_BPF_H */
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 9440a2fc8893..963fd364f3d6 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -271,6 +271,7 @@ extern bool arch_kprobe_on_func_entry(unsigned long offset);
> extern bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);
>
> extern bool within_kprobe_blacklist(unsigned long addr);
> +extern bool within_kprobe_error_injection_list(unsigned long addr);
>
> struct kprobe_insn_cache {
> struct mutex mutex;
> diff --git a/include/linux/module.h b/include/linux/module.h
> index c69b49abe877..548fa09fa806 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -475,6 +475,11 @@ struct module {
> ctor_fn_t *ctors;
> unsigned int num_ctors;
> #endif
> +
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> + unsigned int num_kprobe_ei_funcs;
> + unsigned long *kprobe_ei_funcs;
> +#endif
> } ____cacheline_aligned __randomize_layout;
> #ifndef MODULE_ARCH_INIT
> #define MODULE_ARCH_INIT {}
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index da2ccf142358..b4aab48ad258 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -83,6 +83,16 @@ static raw_spinlock_t *kretprobe_table_lock_ptr(unsigned long hash)
> return &(kretprobe_table_locks[hash].lock);
> }
>
> +/* List of symbols that can be overriden for error injection. */
> +static LIST_HEAD(kprobe_error_injection_list);
> +static DEFINE_MUTEX(kprobe_ei_mutex);
> +struct kprobe_ei_entry {
> + struct list_head list;
> + unsigned long start_addr;
> + unsigned long end_addr;
> + void *priv;
> +};
> +
> /* Blacklist -- list of struct kprobe_blacklist_entry */
> static LIST_HEAD(kprobe_blacklist);
>
> @@ -1394,6 +1404,17 @@ bool within_kprobe_blacklist(unsigned long addr)
> return false;
> }
>
> +bool within_kprobe_error_injection_list(unsigned long addr)
> +{
> + struct kprobe_ei_entry *ent;
> +
> + list_for_each_entry(ent, &kprobe_error_injection_list, list) {
> + if (addr >= ent->start_addr && addr < ent->end_addr)
> + return true;
> + }
> + return false;
> +}
> +
> /*
> * If we have a symbol_name argument, look it up and add the offset field
> * to it. This way, we can specify a relative address to a symbol.
> @@ -2168,6 +2189,86 @@ static int __init populate_kprobe_blacklist(unsigned long *start,
> return 0;
> }
>
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> +/* Markers of the _kprobe_error_inject_list section */
> +extern unsigned long __start_kprobe_error_inject_list[];
> +extern unsigned long __stop_kprobe_error_inject_list[];
> +
> +/*
> + * Lookup and populate the kprobe_error_injection_list.
> + *
> + * For safety reasons we only allow certain functions to be overriden with
> + * bpf_error_injection, so we need to populate the list of the symbols that have
> + * been marked as safe for overriding.
> + */
> +static void populate_kprobe_error_injection_list(unsigned long *start,
> + unsigned long *end,
> + void *priv)
> +{
> + unsigned long *iter;
> + struct kprobe_ei_entry *ent;
> + unsigned long entry, offset = 0, size = 0;
> +
> + mutex_lock(&kprobe_ei_mutex);
> + for (iter = start; iter < end; iter++) {
> + entry = arch_deref_entry_point((void *)*iter);
> +
> + if (!kernel_text_address(entry) ||
> + !kallsyms_lookup_size_offset(entry, &size, &offset)) {
> + pr_err("Failed to find error inject entry at %p\n",
> + (void *)entry);
> + continue;
> + }
> +
> + ent = kmalloc(sizeof(*ent), GFP_KERNEL);
> + if (!ent)
> + break;
> + ent->start_addr = entry;
> + ent->end_addr = entry + size;
> + ent->priv = priv;
> + INIT_LIST_HEAD(&ent->list);
> + list_add_tail(&ent->list, &kprobe_error_injection_list);
> + }
> + mutex_unlock(&kprobe_ei_mutex);
> +}
> +
> +static void __init populate_kernel_kprobe_ei_list(void)
> +{
> + populate_kprobe_error_injection_list(__start_kprobe_error_inject_list,
> + __stop_kprobe_error_inject_list,
> + NULL);
> +}
> +
> +static void module_load_kprobe_ei_list(struct module *mod)
> +{
> + if (!mod->num_kprobe_ei_funcs)
> + return;
> + populate_kprobe_error_injection_list(mod->kprobe_ei_funcs,
> + mod->kprobe_ei_funcs +
> + mod->num_kprobe_ei_funcs, mod);
> +}
> +
> +static void module_unload_kprobe_ei_list(struct module *mod)
> +{
> + struct kprobe_ei_entry *ent, *n;
> + if (!mod->num_kprobe_ei_funcs)
> + return;
> +
> + mutex_lock(&kprobe_ei_mutex);
> + list_for_each_entry_safe(ent, n, &kprobe_error_injection_list, list) {
> + if (ent->priv == mod) {
> + list_del_init(&ent->list);
> + kfree(ent);
> + }
> + }
> + mutex_unlock(&kprobe_ei_mutex);
> +}
> +#else
> +static inline void __init populate_kernel_kprobe_ei_list(void) {}
> +static inline void module_load_kprobe_ei_list(struct module *m) {}
> +static inline void module_unload_kprobe_ei_list(struct module *m) {}
> +#endif
> +
> /* Module notifier call back, checking kprobes on the module */
> static int kprobes_module_callback(struct notifier_block *nb,
> unsigned long val, void *data)
> @@ -2178,6 +2279,11 @@ static int kprobes_module_callback(struct notifier_block *nb,
> unsigned int i;
> int checkcore = (val == MODULE_STATE_GOING);
>
> + if (val == MODULE_STATE_COMING)
> + module_load_kprobe_ei_list(mod);
> + else if (val == MODULE_STATE_GOING)
> + module_unload_kprobe_ei_list(mod);
> +
> if (val != MODULE_STATE_GOING && val != MODULE_STATE_LIVE)
> return NOTIFY_DONE;
>
> @@ -2240,6 +2346,8 @@ static int __init init_kprobes(void)
> pr_err("Please take care of using kprobes.\n");
> }
>
> + populate_kernel_kprobe_ei_list();
> +
> if (kretprobe_blacklist_size) {
> /* lookup the function address from its name */
> for (i = 0; kretprobe_blacklist[i].name != NULL; i++) {
> @@ -2407,6 +2515,56 @@ static const struct file_operations debugfs_kprobe_blacklist_ops = {
> .release = seq_release,
> };
>
> +/*
> + * kprobes/error_injection_list -- shows which functions can be overriden for
> + * error injection.
> + * */
> +static void *kprobe_ei_seq_start(struct seq_file *m, loff_t *pos)
> +{
> + mutex_lock(&kprobe_ei_mutex);
> + return seq_list_start(&kprobe_error_injection_list, *pos);
> +}
> +
> +static void kprobe_ei_seq_stop(struct seq_file *m, void *v)
> +{
> + mutex_unlock(&kprobe_ei_mutex);
> +}
> +
> +static void *kprobe_ei_seq_next(struct seq_file *m, void *v, loff_t *pos)
> +{
> + return seq_list_next(v, &kprobe_error_injection_list, pos);
> +}
> +
> +static int kprobe_ei_seq_show(struct seq_file *m, void *v)
> +{
> + char buffer[KSYM_SYMBOL_LEN];
> + struct kprobe_ei_entry *ent =
> + list_entry(v, struct kprobe_ei_entry, list);
> +
> + sprint_symbol(buffer, ent->start_addr);
> + seq_printf(m, "%s\n", buffer);
> + return 0;
> +}
> +
> +static const struct seq_operations kprobe_ei_seq_ops = {
> + .start = kprobe_ei_seq_start,
> + .next = kprobe_ei_seq_next,
> + .stop = kprobe_ei_seq_stop,
> + .show = kprobe_ei_seq_show,
> +};
> +
> +static int kprobe_ei_open(struct inode *inode, struct file *filp)
> +{
> + return seq_open(filp, &kprobe_ei_seq_ops);
> +}
> +
> +static const struct file_operations debugfs_kprobe_ei_ops = {
> + .open = kprobe_ei_open,
> + .read = seq_read,
> + .llseek = seq_lseek,
> + .release = seq_release,
> +};
> +
> static void arm_all_kprobes(void)
> {
> struct hlist_head *head;
> @@ -2548,6 +2706,11 @@ static int __init debugfs_kprobe_init(void)
> if (!file)
> goto error;
>
> + file = debugfs_create_file("error_injection_list", 0444, dir, NULL,
> + &debugfs_kprobe_ei_ops);
> + if (!file)
> + goto error;
> +
> return 0;
>
> error:
> diff --git a/kernel/module.c b/kernel/module.c
> index dea01ac9cb74..bd695bfdc5c4 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3118,7 +3118,11 @@ static int find_module_sections(struct module *mod, struct load_info *info)
> sizeof(*mod->ftrace_callsites),
> &mod->num_ftrace_callsites);
> #endif
> -
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> + mod->kprobe_ei_funcs = section_objs(info, "_kprobe_error_inject_list",
> + sizeof(*mod->kprobe_ei_funcs),
> + &mod->num_kprobe_ei_funcs);
> +#endif
> mod->extable = section_objs(info, "__ex_table",
> sizeof(*mod->extable), &mod->num_exentries);
>
> --
> 2.7.5
>
--
Masami Hiramatsu <[email protected]>
On Mon, 18 Dec 2017 16:09:30 +0100
Daniel Borkmann <[email protected]> wrote:
> On 12/18/2017 10:51 AM, Masami Hiramatsu wrote:
> > On Fri, 15 Dec 2017 14:12:54 -0500
> > Josef Bacik <[email protected]> wrote:
> >> From: Josef Bacik <[email protected]>
> >>
> >> Error injection is sloppy and very ad-hoc. BPF could fill this niche
> >> perfectly with its kprobe functionality. We could make sure errors are
> >> only triggered in specific call chains that we care about with very
> >> specific situations. Accomplish this with the bpf_override_funciton
> >> helper. This will modify the probe'd callers return value to the
> >> specified value and set the PC to an override function that simply
> >> returns, bypassing the originally probed function. This gives us a nice
> >> clean way to implement systematic error injection for all of our code
> >> paths.
> >
> > OK, got it. I think the error_injectable function list should be defined
> > in kernel/trace/bpf_trace.c because only bpf calls it and needs to care
> > the "safeness".
> >
> > [...]
> >> diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
> >> index 8dc0161cec8f..1ea748d682fd 100644
> >> --- a/arch/x86/kernel/kprobes/ftrace.c
> >> +++ b/arch/x86/kernel/kprobes/ftrace.c
> >> @@ -97,3 +97,17 @@ int arch_prepare_kprobe_ftrace(struct kprobe *p)
> >> p->ainsn.boostable = false;
> >> return 0;
> >> }
> >> +
> >> +asmlinkage void override_func(void);
> >> +asm(
> >> + ".type override_func, @function\n"
> >> + "override_func:\n"
> >> + " ret\n"
> >> + ".size override_func, .-override_func\n"
> >> +);
> >> +
> >> +void arch_ftrace_kprobe_override_function(struct pt_regs *regs)
> >> +{
> >> + regs->ip = (unsigned long)&override_func;
> >> +}
> >> +NOKPROBE_SYMBOL(arch_ftrace_kprobe_override_function);
> >
> > Calling this as "override_function" is meaningless. This is a function
> > which just return. So I think combination of just_return_func() and
> > arch_bpf_override_func_just_return() will be better.
> >
> > Moreover, this arch/x86/kernel/kprobes/ftrace.c is an architecture
> > dependent implementation of kprobes, not bpf.
>
> Josef, please work out any necessary cleanups that would still need
> to be addressed based on Masami's feedback and send them as follow-up
> patches, thanks.
>
> > Hmm, arch/x86/net/bpf_jit_comp.c will be better place?
>
> (No, it's JIT only and I'd really prefer to keep it that way, mixing
> this would result in a huge mess.)
OK, the same applies to kprobes. kernel/kprobes.c and arch/x86/kernel/kprobes/*
are for instrumentation code, and kernel/trace/trace_kprobe.c is ftrace's
kprobe user interface, just one implementation of kprobe usage. So please
do not mix them up. From my point of view, that would also result in a huge mess.
Thank you,
--
Masami Hiramatsu <[email protected]>
On 12/18/17 10:29 PM, Masami Hiramatsu wrote:
>>
>> +#if defined(__KERNEL__) && !defined(__ASSEMBLY__)
>> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
>
> BTW, CONFIG_BPF_KPROBE_OVERRIDE is also confusable name.
> Since this feature override a function to just return with
> some return value (as far as I understand, or would you
> also plan to modify execution path inside a function?),
> I think it should be better CONFIG_BPF_FUNCTION_OVERRIDE or
> CONFIG_BPF_EXECUTION_OVERRIDE.
I don't think such a renaming makes sense.
The feature overrides a kprobe by changing how the kprobe returns.
It doesn't override BPF_FUNCTION or BPF_EXECUTION.
The kernel enters and exits the bpf program as normal.
> Indeed, BPF is based on kprobes, but it seems you are limiting it
> with ftrace (function-call trace) (I'm not sure the reason why),
> so using "kprobes" for this feature seems strange for me.
Do you have an idea how a kprobe override could work when the kprobe is
placed in the middle of a function?
Please send your suggestions as patches based on top of bpf-next.
Thanks
On Tue, 19 Dec 2017 18:14:17 -0800
Alexei Starovoitov <[email protected]> wrote:
> On 12/18/17 10:29 PM, Masami Hiramatsu wrote:
> >>
> >> +#if defined(__KERNEL__) && !defined(__ASSEMBLY__)
> >> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> >
> > BTW, CONFIG_BPF_KPROBE_OVERRIDE is also confusable name.
> > Since this feature override a function to just return with
> > some return value (as far as I understand, or would you
> > also plan to modify execution path inside a function?),
> > I think it should be better CONFIG_BPF_FUNCTION_OVERRIDE or
> > CONFIG_BPF_EXECUTION_OVERRIDE.
>
> I don't think such renaming makes sense.
> The feature is overriding kprobe by changing how kprobe returns.
> It doesn't override BPF_FUNCTION or BPF_EXECUTION.
No, I meant this is a BPF feature which overrides a FUNCTION, so
BPF is a kind of namespace. (Is it limited to function entry
because it cannot tweak the stack frame at this moment?)
> The kernel enters and exits bpf program as normal.
Yeah, but that bpf program modifies the instruction pointer, am I correct?
>
> > Indeed, BPF is based on kprobes, but it seems you are limiting it
> > with ftrace (function-call trace) (I'm not sure the reason why),
> > so using "kprobes" for this feature seems strange for me.
>
> do you have an idea how kprobe override can happen when kprobe
> placed in the middle of the function?
For example, if you know a basic block in the function, maybe
you can skip that block or something like that. But nowadays
that is somewhat hard because the optimizer mixes the blocks up.
>
> Please make your suggestion as patches based on top of bpf-next.
bpf-next seems to have already picked up this series. Do you mean I should
revert it and write a new patch?
Thank you,
>
> Thanks
>
--
Masami Hiramatsu <[email protected]>
On 12/20/2017 08:13 AM, Masami Hiramatsu wrote:
> On Tue, 19 Dec 2017 18:14:17 -0800
> Alexei Starovoitov <[email protected]> wrote:
[...]
>> Please make your suggestion as patches based on top of bpf-next.
>
> bpf-next seems already pick this series. Would you mean I revert it and
> write new patch?
No, please submit as follow-ups instead, thanks Masami!
On Fri, 15 Dec 2017 14:12:52 -0500
Josef Bacik <[email protected]> wrote:
> From: Josef Bacik <[email protected]>
>
> Using BPF we can override kprob'ed functions and return arbitrary
> values. Obviously this can be a bit unsafe, so make this feature opt-in
> for functions. Simply tag a function with KPROBE_ERROR_INJECT_SYMBOL in
> order to give BPF access to that function for error injection purposes.
>
> Signed-off-by: Josef Bacik <[email protected]>
> Acked-by: Ingo Molnar <[email protected]>
> ---
> include/asm-generic/vmlinux.lds.h | 10 +++
> include/linux/bpf.h | 11 +++
> include/linux/kprobes.h | 1 +
> include/linux/module.h | 5 ++
> kernel/kprobes.c | 163 ++++++++++++++++++++++++++++++++++++++
> kernel/module.c | 6 +-
> 6 files changed, 195 insertions(+), 1 deletion(-)
>
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index ee8b707d9fa9..a2e8582d094a 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -136,6 +136,15 @@
> #define KPROBE_BLACKLIST()
> #endif
>
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> +#define ERROR_INJECT_LIST() . = ALIGN(8); \
> + VMLINUX_SYMBOL(__start_kprobe_error_inject_list) = .; \
> + KEEP(*(_kprobe_error_inject_list)) \
> + VMLINUX_SYMBOL(__stop_kprobe_error_inject_list) = .;
> +#else
> +#define ERROR_INJECT_LIST()
> +#endif
> +
> #ifdef CONFIG_EVENT_TRACING
> #define FTRACE_EVENTS() . = ALIGN(8); \
> VMLINUX_SYMBOL(__start_ftrace_events) = .; \
> @@ -564,6 +573,7 @@
> FTRACE_EVENTS() \
> TRACE_SYSCALLS() \
> KPROBE_BLACKLIST() \
> + ERROR_INJECT_LIST() \
> MEM_DISCARD(init.rodata) \
> CLK_OF_TABLES() \
> RESERVEDMEM_OF_TABLES() \
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index e55e4255a210..7f4d2a953173 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -576,4 +576,15 @@ extern const struct bpf_func_proto bpf_sock_map_update_proto;
> void bpf_user_rnd_init_once(void);
> u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
>
> +#if defined(__KERNEL__) && !defined(__ASSEMBLY__)
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> +#define BPF_ALLOW_ERROR_INJECTION(fname) \
> +static unsigned long __used \
> + __attribute__((__section__("_kprobe_error_inject_list"))) \
> + _eil_addr_##fname = (unsigned long)fname;
> +#else
> +#define BPF_ALLOW_ERROR_INJECTION(fname)
> +#endif
> +#endif
This part shows that this feature belongs to bpf. If it were a part of
kprobes, it should be defined in include/asm-generic/kprobes.h, as
NOKPROBE_SYMBOL is.
Why is this defined in BPF while the list lives under kprobes?
> +
> #endif /* _LINUX_BPF_H */
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 9440a2fc8893..963fd364f3d6 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -271,6 +271,7 @@ extern bool arch_kprobe_on_func_entry(unsigned long offset);
> extern bool kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset);
>
> extern bool within_kprobe_blacklist(unsigned long addr);
> +extern bool within_kprobe_error_injection_list(unsigned long addr);
>
> struct kprobe_insn_cache {
> struct mutex mutex;
> diff --git a/include/linux/module.h b/include/linux/module.h
> index c69b49abe877..548fa09fa806 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -475,6 +475,11 @@ struct module {
> ctor_fn_t *ctors;
> unsigned int num_ctors;
> #endif
> +
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> + unsigned int num_kprobe_ei_funcs;
> + unsigned long *kprobe_ei_funcs;
> +#endif
> } ____cacheline_aligned __randomize_layout;
> #ifndef MODULE_ARCH_INIT
> #define MODULE_ARCH_INIT {}
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index da2ccf142358..b4aab48ad258 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -83,6 +83,16 @@ static raw_spinlock_t *kretprobe_table_lock_ptr(unsigned long hash)
> return &(kretprobe_table_locks[hash].lock);
> }
>
> +/* List of symbols that can be overriden for error injection. */
> +static LIST_HEAD(kprobe_error_injection_list);
> +static DEFINE_MUTEX(kprobe_ei_mutex);
> +struct kprobe_ei_entry {
> + struct list_head list;
> + unsigned long start_addr;
> + unsigned long end_addr;
> + void *priv;
> +};
Again, no kprobe user refers to this except bpf, which is actually a
trace_kprobe user.
I mean
"bpf uses trace_kprobe, trace_kprobe uses kprobe."
So there is no direct relationship with kprobe.
For example, kprobe user modules can OVERRIDE any function.
And there is currently no generic error injection code in the kernel
except for bpf.
Of course, I can accept this code if you accept that I write
generic error injection code on ftrace without BPF.
Ingo, is that what you intended?
Thank you,
> +
> /* Blacklist -- list of struct kprobe_blacklist_entry */
> static LIST_HEAD(kprobe_blacklist);
>
> @@ -1394,6 +1404,17 @@ bool within_kprobe_blacklist(unsigned long addr)
> return false;
> }
>
> +bool within_kprobe_error_injection_list(unsigned long addr)
> +{
> + struct kprobe_ei_entry *ent;
> +
> + list_for_each_entry(ent, &kprobe_error_injection_list, list) {
> + if (addr >= ent->start_addr && addr < ent->end_addr)
> + return true;
> + }
> + return false;
> +}
> +
> /*
> * If we have a symbol_name argument, look it up and add the offset field
> * to it. This way, we can specify a relative address to a symbol.
> @@ -2168,6 +2189,86 @@ static int __init populate_kprobe_blacklist(unsigned long *start,
> return 0;
> }
>
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> +/* Markers of the _kprobe_error_inject_list section */
> +extern unsigned long __start_kprobe_error_inject_list[];
> +extern unsigned long __stop_kprobe_error_inject_list[];
> +
> +/*
> + * Lookup and populate the kprobe_error_injection_list.
> + *
> + * For safety reasons we only allow certain functions to be overriden with
> + * bpf_error_injection, so we need to populate the list of the symbols that have
> + * been marked as safe for overriding.
> + */
> +static void populate_kprobe_error_injection_list(unsigned long *start,
> + unsigned long *end,
> + void *priv)
> +{
> + unsigned long *iter;
> + struct kprobe_ei_entry *ent;
> + unsigned long entry, offset = 0, size = 0;
> +
> + mutex_lock(&kprobe_ei_mutex);
> + for (iter = start; iter < end; iter++) {
> + entry = arch_deref_entry_point((void *)*iter);
> +
> + if (!kernel_text_address(entry) ||
> + !kallsyms_lookup_size_offset(entry, &size, &offset)) {
> + pr_err("Failed to find error inject entry at %p\n",
> + (void *)entry);
> + continue;
> + }
> +
> + ent = kmalloc(sizeof(*ent), GFP_KERNEL);
> + if (!ent)
> + break;
> + ent->start_addr = entry;
> + ent->end_addr = entry + size;
> + ent->priv = priv;
> + INIT_LIST_HEAD(&ent->list);
> + list_add_tail(&ent->list, &kprobe_error_injection_list);
> + }
> + mutex_unlock(&kprobe_ei_mutex);
> +}
> +
> +static void __init populate_kernel_kprobe_ei_list(void)
> +{
> + populate_kprobe_error_injection_list(__start_kprobe_error_inject_list,
> + __stop_kprobe_error_inject_list,
> + NULL);
> +}
> +
> +static void module_load_kprobe_ei_list(struct module *mod)
> +{
> + if (!mod->num_kprobe_ei_funcs)
> + return;
> + populate_kprobe_error_injection_list(mod->kprobe_ei_funcs,
> + mod->kprobe_ei_funcs +
> + mod->num_kprobe_ei_funcs, mod);
> +}
> +
> +static void module_unload_kprobe_ei_list(struct module *mod)
> +{
> + struct kprobe_ei_entry *ent, *n;
> + if (!mod->num_kprobe_ei_funcs)
> + return;
> +
> + mutex_lock(&kprobe_ei_mutex);
> + list_for_each_entry_safe(ent, n, &kprobe_error_injection_list, list) {
> + if (ent->priv == mod) {
> + list_del_init(&ent->list);
> + kfree(ent);
> + }
> + }
> + mutex_unlock(&kprobe_ei_mutex);
> +}
> +#else
> +static inline void __init populate_kernel_kprobe_ei_list(void) {}
> +static inline void module_load_kprobe_ei_list(struct module *m) {}
> +static inline void module_unload_kprobe_ei_list(struct module *m) {}
> +#endif
> +
> /* Module notifier call back, checking kprobes on the module */
> static int kprobes_module_callback(struct notifier_block *nb,
> unsigned long val, void *data)
> @@ -2178,6 +2279,11 @@ static int kprobes_module_callback(struct notifier_block *nb,
> unsigned int i;
> int checkcore = (val == MODULE_STATE_GOING);
>
> + if (val == MODULE_STATE_COMING)
> + module_load_kprobe_ei_list(mod);
> + else if (val == MODULE_STATE_GOING)
> + module_unload_kprobe_ei_list(mod);
> +
> if (val != MODULE_STATE_GOING && val != MODULE_STATE_LIVE)
> return NOTIFY_DONE;
>
> @@ -2240,6 +2346,8 @@ static int __init init_kprobes(void)
> pr_err("Please take care of using kprobes.\n");
> }
>
> + populate_kernel_kprobe_ei_list();
> +
> if (kretprobe_blacklist_size) {
> /* lookup the function address from its name */
> for (i = 0; kretprobe_blacklist[i].name != NULL; i++) {
> @@ -2407,6 +2515,56 @@ static const struct file_operations debugfs_kprobe_blacklist_ops = {
> .release = seq_release,
> };
>
> +/*
> + * kprobes/error_injection_list -- shows which functions can be overriden for
> + * error injection.
> + * */
> +static void *kprobe_ei_seq_start(struct seq_file *m, loff_t *pos)
> +{
> + mutex_lock(&kprobe_ei_mutex);
> + return seq_list_start(&kprobe_error_injection_list, *pos);
> +}
> +
> +static void kprobe_ei_seq_stop(struct seq_file *m, void *v)
> +{
> + mutex_unlock(&kprobe_ei_mutex);
> +}
> +
> +static void *kprobe_ei_seq_next(struct seq_file *m, void *v, loff_t *pos)
> +{
> + return seq_list_next(v, &kprobe_error_injection_list, pos);
> +}
> +
> +static int kprobe_ei_seq_show(struct seq_file *m, void *v)
> +{
> + char buffer[KSYM_SYMBOL_LEN];
> + struct kprobe_ei_entry *ent =
> + list_entry(v, struct kprobe_ei_entry, list);
> +
> + sprint_symbol(buffer, ent->start_addr);
> + seq_printf(m, "%s\n", buffer);
> + return 0;
> +}
> +
> +static const struct seq_operations kprobe_ei_seq_ops = {
> + .start = kprobe_ei_seq_start,
> + .next = kprobe_ei_seq_next,
> + .stop = kprobe_ei_seq_stop,
> + .show = kprobe_ei_seq_show,
> +};
> +
> +static int kprobe_ei_open(struct inode *inode, struct file *filp)
> +{
> + return seq_open(filp, &kprobe_ei_seq_ops);
> +}
> +
> +static const struct file_operations debugfs_kprobe_ei_ops = {
> + .open = kprobe_ei_open,
> + .read = seq_read,
> + .llseek = seq_lseek,
> + .release = seq_release,
> +};
> +
> static void arm_all_kprobes(void)
> {
> struct hlist_head *head;
> @@ -2548,6 +2706,11 @@ static int __init debugfs_kprobe_init(void)
> if (!file)
> goto error;
>
> + file = debugfs_create_file("error_injection_list", 0444, dir, NULL,
> + &debugfs_kprobe_ei_ops);
> + if (!file)
> + goto error;
> +
> return 0;
>
> error:
> diff --git a/kernel/module.c b/kernel/module.c
> index dea01ac9cb74..bd695bfdc5c4 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3118,7 +3118,11 @@ static int find_module_sections(struct module *mod, struct load_info *info)
> sizeof(*mod->ftrace_callsites),
> &mod->num_ftrace_callsites);
> #endif
> -
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> + mod->kprobe_ei_funcs = section_objs(info, "_kprobe_error_inject_list",
> + sizeof(*mod->kprobe_ei_funcs),
> + &mod->num_kprobe_ei_funcs);
> +#endif
> mod->extable = section_objs(info, "__ex_table",
> sizeof(*mod->extable), &mod->num_exentries);
>
> --
> 2.7.5
>
--
Masami Hiramatsu <[email protected]>
On 12/19/17 11:13 PM, Masami Hiramatsu wrote:
> On Tue, 19 Dec 2017 18:14:17 -0800
> Alexei Starovoitov <[email protected]> wrote:
>
>> On 12/18/17 10:29 PM, Masami Hiramatsu wrote:
>>>>
>>>> +#if defined(__KERNEL__) && !defined(__ASSEMBLY__)
>>>> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
>>>
>>> BTW, CONFIG_BPF_KPROBE_OVERRIDE is also confusable name.
>>> Since this feature override a function to just return with
>>> some return value (as far as I understand, or would you
>>> also plan to modify execution path inside a function?),
>>> I think it should be better CONFIG_BPF_FUNCTION_OVERRIDE or
>>> CONFIG_BPF_EXECUTION_OVERRIDE.
>>
>> I don't think such renaming makes sense.
>> The feature is overriding kprobe by changing how kprobe returns.
>> It doesn't override BPF_FUNCTION or BPF_EXECUTION.
>
> No, I meant this is BPF's feature which override FUNCTION, so
> BPF is a kind of namespace. (Is that only for a function entry
> because it can not tweak stackframe at this moment?)
>
>> The kernel enters and exits bpf program as normal.
>
> Yeah, but that bpf program modifies instruction pointer, am I correct?
No. The bpf side is asking the kprobe side to modify it.
bpf cannot modify the IP or any other register directly.
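To make that division of labour concrete, here is a rough user-space model,
not the kernel code. struct demo_regs and all demo_* names are hypothetical
stand-ins (for pt_regs, override_func, the arch hook and the bpf helper). The
point it tries to show is that the "bpf side" only sets a return value and
calls into the "kprobe side", which is the only code that rewrites the resume
address.

```c
/* Hypothetical, simplified stand-in for struct pt_regs. */
struct demo_regs {
	unsigned long ip;	/* address where execution resumes */
	long ax;		/* return-value register */
};

/* Plays the role of override_func: a function that just returns. */
void demo_just_return(void)
{
}

/* "kprobe side": the only code that rewrites the resume address. */
void demo_arch_override_function(struct demo_regs *regs)
{
	regs->ip = (unsigned long)&demo_just_return;
}

/*
 * "bpf side": the helper sets the value the caller will see and then
 * asks the kprobe side to redirect execution. It never writes ip itself.
 */
int demo_override_return(struct demo_regs *regs, long rc)
{
	regs->ax = rc;
	demo_arch_override_function(regs);
	return 0;
}
```

A caller would fill in a struct demo_regs, call demo_override_return(&regs,
-12), and then find regs.ax set to -12 and regs.ip pointing at
demo_just_return, so the probed function body is skipped.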
>>
>>> Indeed, BPF is based on kprobes, but it seems you are limiting it
>>> with ftrace (function-call trace) (I'm not sure the reason why),
>>> so using "kprobes" for this feature seems strange for me.
>>
>> do you have an idea how kprobe override can happen when kprobe
>> placed in the middle of the function?
>
> For example, if you know a basic block in the function, maybe
> you can skip a block or something like that. But nowadays
> it is somewhat hard because optimizer mixed it up.
I'm still missing how that can work...
On 12/20/17 3:00 AM, Masami Hiramatsu wrote:
> On Fri, 15 Dec 2017 14:12:52 -0500
> Josef Bacik <[email protected]> wrote:
>
>> From: Josef Bacik <[email protected]>
>>
>> Using BPF we can override kprob'ed functions and return arbitrary
>> values. Obviously this can be a bit unsafe, so make this feature opt-in
>> for functions. Simply tag a function with KPROBE_ERROR_INJECT_SYMBOL in
>> order to give BPF access to that function for error injection purposes.
>>
>> Signed-off-by: Josef Bacik <[email protected]>
>> Acked-by: Ingo Molnar <[email protected]>
>> ---
>> include/asm-generic/vmlinux.lds.h | 10 +++
>> include/linux/bpf.h | 11 +++
>> include/linux/kprobes.h | 1 +
>> include/linux/module.h | 5 ++
>> kernel/kprobes.c | 163 ++++++++++++++++++++++++++++++++++++++
>> kernel/module.c | 6 +-
>> 6 files changed, 195 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
>> index ee8b707d9fa9..a2e8582d094a 100644
>> --- a/include/asm-generic/vmlinux.lds.h
>> +++ b/include/asm-generic/vmlinux.lds.h
>> @@ -136,6 +136,15 @@
>> #define KPROBE_BLACKLIST()
>> #endif
>>
>> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
>> +#define ERROR_INJECT_LIST() . = ALIGN(8); \
>> + VMLINUX_SYMBOL(__start_kprobe_error_inject_list) = .; \
>> + KEEP(*(_kprobe_error_inject_list)) \
>> + VMLINUX_SYMBOL(__stop_kprobe_error_inject_list) = .;
>> +#else
>> +#define ERROR_INJECT_LIST()
>> +#endif
>> +
>> #ifdef CONFIG_EVENT_TRACING
>> #define FTRACE_EVENTS() . = ALIGN(8); \
>> VMLINUX_SYMBOL(__start_ftrace_events) = .; \
>> @@ -564,6 +573,7 @@
>> FTRACE_EVENTS() \
>> TRACE_SYSCALLS() \
>> KPROBE_BLACKLIST() \
>> + ERROR_INJECT_LIST() \
>> MEM_DISCARD(init.rodata) \
>> CLK_OF_TABLES() \
>> RESERVEDMEM_OF_TABLES() \
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index e55e4255a210..7f4d2a953173 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -576,4 +576,15 @@ extern const struct bpf_func_proto bpf_sock_map_update_proto;
>> void bpf_user_rnd_init_once(void);
>> u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
>>
>> +#if defined(__KERNEL__) && !defined(__ASSEMBLY__)
>> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
>> +#define BPF_ALLOW_ERROR_INJECTION(fname) \
>> +static unsigned long __used \
>> + __attribute__((__section__("_kprobe_error_inject_list"))) \
>> + _eil_addr_##fname = (unsigned long)fname;
>> +#else
>> +#define BPF_ALLOW_ERROR_INJECTION(fname)
>> +#endif
>> +#endif
>
> This part shows this feature belongs to bpf, if it is a part of kprobes,
> it should be defined in include/asm-generic/kprobes.h as NOKPROBE_SYMBOL
> does.
>
> Why this is defined in BPF, but list is under kprobes?
Because Ingo specifically requested that the macro that marks the function
be in bpf.h, so any .c file that starts adding such marks will
include that file instead of pulling in stuff from kprobes.
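For readers unfamiliar with this kind of marking, here is a minimal user-space
sketch of the same section-based opt-in. It assumes GCC or Clang on an ELF
target, where the linker provides __start_<name> and __stop_<name> symbols for
a section whose name is a valid C identifier. DEMO_ALLOW_ERROR_INJECTION, the
demo_ei_list section and the demo_* functions are hypothetical names used only
for illustration; this is not the kernel macro.

```c
#include <stddef.h>

/*
 * Hypothetical user-space stand-in for BPF_ALLOW_ERROR_INJECTION():
 * record the address of an opted-in function in a dedicated section.
 */
#define DEMO_ALLOW_ERROR_INJECTION(fname)				\
static unsigned long __attribute__((used, section("demo_ei_list")))	\
	demo_eil_addr_##fname = (unsigned long)fname;

/* Provided by the linker because "demo_ei_list" is a valid C identifier. */
extern unsigned long __start_demo_ei_list[];
extern unsigned long __stop_demo_ei_list[];

/* Distinct bodies so the compiler cannot merge these functions. */
int demo_open(void)     { return 0; }
int demo_close(void)    { return 1; }
int demo_untagged(void) { return 2; }

/* Only these two functions opt in. */
DEMO_ALLOW_ERROR_INJECTION(demo_open)
DEMO_ALLOW_ERROR_INJECTION(demo_close)

/* Number of functions that opted in. */
size_t demo_ei_count(void)
{
	return (size_t)(__stop_demo_ei_list - __start_demo_ei_list);
}

/* Return 1 if addr was recorded by the macro, 0 otherwise. */
int demo_ei_allowed(unsigned long addr)
{
	unsigned long *iter;

	for (iter = __start_demo_ei_list; iter < __stop_demo_ei_list; iter++)
		if (*iter == addr)
			return 1;
	return 0;
}
```

Any .c file that wants to opt a function in only needs the macro, which is the
property the bpf.h placement is meant to preserve.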
>
> So there is no direct relationship with kprobe.
> For example, kprobe user modules can OVERRIDE any functions.
> And there is no generic error injection code in the kernel
> except for the bpf currently.
_currently_ is the key word.
> Of course, I can accept this code if you accept that I make a
> generic error injection code on ftrace without BPF.
What stops other pieces of the kernel from using the same technique?
The bpf verifier, coupled with opt-in
per-function marks via BPF_ALLOW_ERROR_INJECTION,
gives a _safe_ way to do error injection.
I can imagine how you could hack the kprobe text-based interface to
use the same technique, but I suggest waiting to see how we
build on it in bpf land before replicating things in
pure kprobe land.
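As a sketch of what the opt-in check amounts to, the lookup below mirrors the
shape of within_kprobe_error_injection_list() from the patch in plain user-space
C. struct demo_ei_entry, demo_within_ei_list() and the addresses in demo_list
are made up for illustration; the kernel version walks a linked list rather than
an array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical analogue of struct kprobe_ei_entry: one opted-in function. */
struct demo_ei_entry {
	unsigned long start_addr;	/* first byte of the function */
	unsigned long end_addr;		/* one past the last byte */
};

/* True if addr falls inside any opted-in function's [start, end) range. */
bool demo_within_ei_list(const struct demo_ei_entry *list, size_t n,
			 unsigned long addr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (addr >= list[i].start_addr && addr < list[i].end_addr)
			return true;
	return false;
}

/* Example table with made-up addresses, for illustration only. */
static const struct demo_ei_entry demo_list[] = {
	{ 0x1000, 0x1040 },
	{ 0x2000, 0x2100 },
};
```

An override request for an address outside every range would simply be
refused, which is what restricts injection to functions that opted in.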