Hi,
Here are the 2nd version for generic kretprobe and kretprobe on x86 for
replacing the kretprobe trampoline with rethook. The previous version
is here[1]
[1] https://lore.kernel.org/all/164818251899.2252200.7306353689206167903.stgit@devnote2/T/#u
In this version I added completing pt_regs by saving regs->ss register
for rethook (from Peter) and optprobe.
Background:
This rethook came from Jiri's request of multiple kprobe for bpf[1].
He tried to solve an issue that starting bpf with multiple kprobe will
take a long time because bpf-kprobe will wait for RCU grace period for
sync rcu events.
Jiri wanted to attach a single bpf handler to multiple kprobes and
he tried to introduce multiple-probe interface to kprobe. So I asked
him to use ftrace and kretprobe-like hook if it is only for the
function entry and exit, instead of adding ad-hoc interface
to kprobes.
For this purpose, I introduced the fprobe (kprobe like interface for
ftrace) with the rethook (this is a generic return hook feature for
fprobe exit handler)[2].
[1] https://lore.kernel.org/all/[email protected]/T/#u
[2] https://lore.kernel.org/all/164191321766.806991.7930388561276940676.stgit@devnote2/T/#u
The rethook is basically same as the kretprobe trampoline. I just made
it decoupled from kprobes. Eventually, the all arch dependent kretprobe
trampolines will be replaced with the rethook trampoline instead of
cloning and set HAVE_RETHOOK=y.
When I port the rethook for all arch which supports kretprobe, the
legacy kretprobe specific code (which is for CONFIG_KRETPROBE_ON_RETHOOK=n)
will be removed eventually.
Worktree notice:
BTW, this patch can be applied to next-20220324, not the bpf-next tree
directly, because this depends on ANNOTATE_NOENDBR macro. However, since
the fprobe is merged in the bpf-next, I marked this for bpf-next.
So until merging the both of fprobes and ENDBR series, to compile this
you need below 2 lines in arch/x86/kernel/rethook.c.
#ifndef ANNOTATE_NOENDBR
#define ANNOTATE_NOENDBR
But after those are merged, these lines will be unneeded. How should I
handle this issue? (Just remove ANNOTATE_NOENDBR line in bpf-next?)
Thank you,
---
Masami Hiramatsu (3):
kprobes: Use rethook for kretprobe if possible
rethook: kprobes: x86: Replace kretprobe with rethook on x86
x86,kprobes: Fix optprobe trampoline to generate complete pt_regs
Peter Zijlstra (1):
Subject: x86,rethook: Fix arch_rethook_trampoline() to generate a complete pt_regs
arch/Kconfig | 7 ++
arch/x86/Kconfig | 1
arch/x86/include/asm/unwind.h | 23 +++----
arch/x86/kernel/Makefile | 1
arch/x86/kernel/kprobes/common.h | 1
arch/x86/kernel/kprobes/core.c | 107 ---------------------------------
arch/x86/kernel/kprobes/opt.c | 25 +++++---
arch/x86/kernel/rethook.c | 123 ++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/unwind_orc.c | 10 ++-
include/linux/kprobes.h | 51 +++++++++++++++-
kernel/kprobes.c | 124 ++++++++++++++++++++++++++++++++------
kernel/trace/trace_kprobe.c | 4 +
12 files changed, 319 insertions(+), 158 deletions(-)
create mode 100644 arch/x86/kernel/rethook.c
--
Masami Hiramatsu (Linaro) <[email protected]>
On Fri, Mar 25, 2022 at 7:43 AM Peter Zijlstra <[email protected]> wrote:
>
> On Fri, Mar 25, 2022 at 11:22:53PM +0900, Masami Hiramatsu wrote:
>
> > Masami Hiramatsu (3):
> > kprobes: Use rethook for kretprobe if possible
> > rethook: kprobes: x86: Replace kretprobe with rethook on x86
> > x86,kprobes: Fix optprobe trampoline to generate complete pt_regs
> >
> > Peter Zijlstra (1):
> > Subject: x86,rethook: Fix arch_rethook_trampoline() to generate a complete pt_regs
>
> You fat-fingered the subject there ^
>
> Other than that:
>
> Acked-by: Peter Zijlstra (Intel) <[email protected]>
>
> Hopefully the ftrace return trampoline can also be switched over..
Thanks Peter. What's an ETA on landing endbr set?
Did I miss a pull req?
I see an odd error in linux-next with bpf selftests
which may or may not be related. Planning to debug it
when everything settles in Linus's tree.
Masami, could you do another respin?
Also do you mind squashing patches 2,3,4 ?
It's odd to have the same lines of code patched up 3 times.
Just do it right once.
Currently the optprobe trampoline template code ganerate an
almost complete pt_regs on-stack, everything except regs->ss.
The 'regs->ss' points to the top of stack, which is not a
valid segment decriptor.
As same as the rethook does, complete the job by also pushing ss.
Suggested-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
---
arch/x86/kernel/kprobes/opt.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index b4a54a52aa59..e6b8c5362b94 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -106,7 +106,8 @@ asm (
".global optprobe_template_entry\n"
"optprobe_template_entry:\n"
#ifdef CONFIG_X86_64
- /* We don't bother saving the ss register */
+ " pushq $" __stringify(__KERNEL_DS) "\n"
+ /* Save the 'sp - 8', this will be fixed later. */
" pushq %rsp\n"
" pushfq\n"
".global optprobe_template_clac\n"
@@ -121,14 +122,17 @@ asm (
".global optprobe_template_call\n"
"optprobe_template_call:\n"
ASM_NOP5
- /* Move flags to rsp */
+ /* Copy 'regs->flags' into 'regs->ss'. */
" movq 18*8(%rsp), %rdx\n"
- " movq %rdx, 19*8(%rsp)\n"
+ " movq %rdx, 20*8(%rsp)\n"
RESTORE_REGS_STRING
- /* Skip flags entry */
- " addq $8, %rsp\n"
+ /* Skip 'regs->flags' and 'regs->sp'. */
+ " addq $16, %rsp\n"
+ /* And pop flags register from 'regs->ss'. */
" popfq\n"
#else /* CONFIG_X86_32 */
+ " pushl %ss\n"
+ /* Save the 'sp - 4', this will be fixed later. */
" pushl %esp\n"
" pushfl\n"
".global optprobe_template_clac\n"
@@ -142,12 +146,13 @@ asm (
".global optprobe_template_call\n"
"optprobe_template_call:\n"
ASM_NOP5
- /* Move flags into esp */
+ /* Copy 'regs->flags' into 'regs->ss'. */
" movl 14*4(%esp), %edx\n"
- " movl %edx, 15*4(%esp)\n"
+ " movl %edx, 16*4(%esp)\n"
RESTORE_REGS_STRING
- /* Skip flags entry */
- " addl $4, %esp\n"
+ /* Skip 'regs->flags' and 'regs->sp'. */
+ " addl $8, %esp\n"
+ /* And pop flags register from 'regs->ss'. */
" popfl\n"
#endif
".global optprobe_template_end\n"
@@ -179,6 +184,8 @@ optimized_callback(struct optimized_kprobe *op, struct pt_regs *regs)
kprobes_inc_nmissed_count(&op->kp);
} else {
struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+ /* Adjust stack pointer */
+ regs->sp += sizeof(long);
/* Save skipped registers */
regs->cs = __KERNEL_CS;
#ifdef CONFIG_X86_32
Replaces the kretprobe code with rethook on x86. With this patch,
kretprobe on x86 uses the rethook instead of kretprobe specific
trampoline code.
Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Jiri Olsa <[email protected]>
---
Changes in v2:
- Fix remaining unwind_recover_kretprobe() with unwind_recover_rethook()
(Thanks Peter!)
- Replace remaining "kretprobe" with "rethook" in comment.
---
arch/x86/Kconfig | 1
arch/x86/include/asm/unwind.h | 23 +++----
arch/x86/kernel/Makefile | 1
arch/x86/kernel/kprobes/common.h | 1
arch/x86/kernel/kprobes/core.c | 107 ----------------------------------
arch/x86/kernel/rethook.c | 121 ++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/unwind_orc.c | 10 ++-
7 files changed, 140 insertions(+), 124 deletions(-)
create mode 100644 arch/x86/kernel/rethook.c
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index db3bedcbf6be..e85b52a6f612 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -224,6 +224,7 @@ config X86
select HAVE_KPROBES_ON_FTRACE
select HAVE_FUNCTION_ERROR_INJECTION
select HAVE_KRETPROBES
+ select HAVE_RETHOOK
select HAVE_KVM
select HAVE_LIVEPATCH if X86_64
select HAVE_MIXED_BREAKPOINTS_REGS
diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index 2a1f8734416d..7cede4dc21f0 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -4,7 +4,7 @@
#include <linux/sched.h>
#include <linux/ftrace.h>
-#include <linux/kprobes.h>
+#include <linux/rethook.h>
#include <asm/ptrace.h>
#include <asm/stacktrace.h>
@@ -16,7 +16,7 @@ struct unwind_state {
unsigned long stack_mask;
struct task_struct *task;
int graph_idx;
-#ifdef CONFIG_KRETPROBES
+#if defined(CONFIG_RETHOOK)
struct llist_node *kr_cur;
#endif
bool error;
@@ -104,19 +104,18 @@ void unwind_module_init(struct module *mod, void *orc_ip, size_t orc_ip_size,
#endif
static inline
-unsigned long unwind_recover_kretprobe(struct unwind_state *state,
- unsigned long addr, unsigned long *addr_p)
+unsigned long unwind_recover_rethook(struct unwind_state *state,
+ unsigned long addr, unsigned long *addr_p)
{
-#ifdef CONFIG_KRETPROBES
- return is_kretprobe_trampoline(addr) ?
- kretprobe_find_ret_addr(state->task, addr_p, &state->kr_cur) :
- addr;
-#else
- return addr;
+#ifdef CONFIG_RETHOOK
+ if (is_rethook_trampoline(addr))
+ return rethook_find_ret_addr(state->task, (unsigned long)addr_p,
+ &state->kr_cur);
#endif
+ return addr;
}
-/* Recover the return address modified by kretprobe and ftrace_graph. */
+/* Recover the return address modified by rethook and ftrace_graph. */
static inline
unsigned long unwind_recover_ret_addr(struct unwind_state *state,
unsigned long addr, unsigned long *addr_p)
@@ -125,7 +124,7 @@ unsigned long unwind_recover_ret_addr(struct unwind_state *state,
ret = ftrace_graph_ret_addr(state->task, &state->graph_idx,
addr, addr_p);
- return unwind_recover_kretprobe(state, ret, addr_p);
+ return unwind_recover_rethook(state, ret, addr_p);
}
/*
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 6462e3dd98f4..c41ef42adbe8 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -103,6 +103,7 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
obj-$(CONFIG_FTRACE_SYSCALLS) += ftrace.o
obj-$(CONFIG_X86_TSC) += trace_clock.o
obj-$(CONFIG_TRACING) += trace.o
+obj-$(CONFIG_RETHOOK) += rethook.o
obj-$(CONFIG_CRASH_CORE) += crash_core_$(BITS).o
obj-$(CONFIG_KEXEC_CORE) += machine_kexec_$(BITS).o
obj-$(CONFIG_KEXEC_CORE) += relocate_kernel_$(BITS).o crash.o
diff --git a/arch/x86/kernel/kprobes/common.h b/arch/x86/kernel/kprobes/common.h
index 7d3a2e2daf01..c993521d4933 100644
--- a/arch/x86/kernel/kprobes/common.h
+++ b/arch/x86/kernel/kprobes/common.h
@@ -6,6 +6,7 @@
#include <asm/asm.h>
#include <asm/frame.h>
+#include <asm/insn.h>
#ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 8ef933c03afa..7c4ab8870da4 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -811,18 +811,6 @@ set_current_kprobe(struct kprobe *p, struct pt_regs *regs,
= (regs->flags & X86_EFLAGS_IF);
}
-void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
-{
- unsigned long *sara = stack_addr(regs);
-
- ri->ret_addr = (kprobe_opcode_t *) *sara;
- ri->fp = sara;
-
- /* Replace the return addr with trampoline addr */
- *sara = (unsigned long) &__kretprobe_trampoline;
-}
-NOKPROBE_SYMBOL(arch_prepare_kretprobe);
-
static void kprobe_post_process(struct kprobe *cur, struct pt_regs *regs,
struct kprobe_ctlblk *kcb)
{
@@ -1023,101 +1011,6 @@ int kprobe_int3_handler(struct pt_regs *regs)
}
NOKPROBE_SYMBOL(kprobe_int3_handler);
-/*
- * When a retprobed function returns, this code saves registers and
- * calls trampoline_handler() runs, which calls the kretprobe's handler.
- */
-asm(
- ".text\n"
- ".global __kretprobe_trampoline\n"
- ".type __kretprobe_trampoline, @function\n"
- "__kretprobe_trampoline:\n"
-#ifdef CONFIG_X86_64
- ANNOTATE_NOENDBR
- /* Push a fake return address to tell the unwinder it's a kretprobe. */
- " pushq $__kretprobe_trampoline\n"
- UNWIND_HINT_FUNC
- /* Save the 'sp - 8', this will be fixed later. */
- " pushq %rsp\n"
- " pushfq\n"
- SAVE_REGS_STRING
- " movq %rsp, %rdi\n"
- " call trampoline_handler\n"
- RESTORE_REGS_STRING
- /* In trampoline_handler(), 'regs->flags' is copied to 'regs->sp'. */
- " addq $8, %rsp\n"
- " popfq\n"
-#else
- /* Push a fake return address to tell the unwinder it's a kretprobe. */
- " pushl $__kretprobe_trampoline\n"
- UNWIND_HINT_FUNC
- /* Save the 'sp - 4', this will be fixed later. */
- " pushl %esp\n"
- " pushfl\n"
- SAVE_REGS_STRING
- " movl %esp, %eax\n"
- " call trampoline_handler\n"
- RESTORE_REGS_STRING
- /* In trampoline_handler(), 'regs->flags' is copied to 'regs->sp'. */
- " addl $4, %esp\n"
- " popfl\n"
-#endif
- ASM_RET
- ".size __kretprobe_trampoline, .-__kretprobe_trampoline\n"
-);
-NOKPROBE_SYMBOL(__kretprobe_trampoline);
-/*
- * __kretprobe_trampoline() skips updating frame pointer. The frame pointer
- * saved in trampoline_handler() points to the real caller function's
- * frame pointer. Thus the __kretprobe_trampoline() doesn't have a
- * standard stack frame with CONFIG_FRAME_POINTER=y.
- * Let's mark it non-standard function. Anyway, FP unwinder can correctly
- * unwind without the hint.
- */
-STACK_FRAME_NON_STANDARD_FP(__kretprobe_trampoline);
-
-/* This is called from kretprobe_trampoline_handler(). */
-void arch_kretprobe_fixup_return(struct pt_regs *regs,
- kprobe_opcode_t *correct_ret_addr)
-{
- unsigned long *frame_pointer = ®s->sp + 1;
-
- /* Replace fake return address with real one. */
- *frame_pointer = (unsigned long)correct_ret_addr;
-}
-
-/*
- * Called from __kretprobe_trampoline
- */
-__used __visible void trampoline_handler(struct pt_regs *regs)
-{
- unsigned long *frame_pointer;
-
- /* fixup registers */
- regs->cs = __KERNEL_CS;
-#ifdef CONFIG_X86_32
- regs->gs = 0;
-#endif
- regs->ip = (unsigned long)&__kretprobe_trampoline;
- regs->orig_ax = ~0UL;
- regs->sp += sizeof(long);
- frame_pointer = ®s->sp + 1;
-
- /*
- * The return address at 'frame_pointer' is recovered by the
- * arch_kretprobe_fixup_return() which called from the
- * kretprobe_trampoline_handler().
- */
- kretprobe_trampoline_handler(regs, frame_pointer);
-
- /*
- * Copy FLAGS to 'pt_regs::sp' so that __kretprobe_trapmoline()
- * can do RET right after POPF.
- */
- regs->sp = regs->flags;
-}
-NOKPROBE_SYMBOL(trampoline_handler);
-
int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
{
struct kprobe *cur = kprobe_running();
diff --git a/arch/x86/kernel/rethook.c b/arch/x86/kernel/rethook.c
new file mode 100644
index 000000000000..56275540eeea
--- /dev/null
+++ b/arch/x86/kernel/rethook.c
@@ -0,0 +1,121 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * x86 implementation of rethook. Mostly copied from arch/x86/kernel/kprobes/core.c.
+ */
+#include <linux/bug.h>
+#include <linux/rethook.h>
+#include <linux/kprobes.h>
+#include <linux/objtool.h>
+
+#include "kprobes/common.h"
+
+__visible void arch_rethook_trampoline_callback(struct pt_regs *regs);
+
+/*
+ * When a target function returns, this code saves registers and calls
+ * arch_rethook_trampoline_callback(), which calls the rethook handler.
+ */
+asm(
+ ".text\n"
+ ".global arch_rethook_trampoline\n"
+ ".type arch_rethook_trampoline, @function\n"
+ "arch_rethook_trampoline:\n"
+#ifdef CONFIG_X86_64
+ ANNOTATE_NOENDBR /* This is only jumped from ret instruction */
+ /* Push a fake return address to tell the unwinder it's a rethook. */
+ " pushq $arch_rethook_trampoline\n"
+ UNWIND_HINT_FUNC
+ /* Save the 'sp - 8', this will be fixed later. */
+ " pushq %rsp\n"
+ " pushfq\n"
+ SAVE_REGS_STRING
+ " movq %rsp, %rdi\n"
+ " call arch_rethook_trampoline_callback\n"
+ RESTORE_REGS_STRING
+ /* In the callback function, 'regs->flags' is copied to 'regs->sp'. */
+ " addq $8, %rsp\n"
+ " popfq\n"
+#else
+ /* Push a fake return address to tell the unwinder it's a rethook. */
+ " pushl $arch_rethook_trampoline\n"
+ UNWIND_HINT_FUNC
+ /* Save the 'sp - 4', this will be fixed later. */
+ " pushl %esp\n"
+ " pushfl\n"
+ SAVE_REGS_STRING
+ " movl %esp, %eax\n"
+ " call arch_rethook_trampoline_callback\n"
+ RESTORE_REGS_STRING
+ /* In the callback function, 'regs->flags' is copied to 'regs->sp'. */
+ " addl $4, %esp\n"
+ " popfl\n"
+#endif
+ ASM_RET
+ ".size arch_rethook_trampoline, .-arch_rethook_trampoline\n"
+);
+NOKPROBE_SYMBOL(arch_rethook_trampoline);
+
+/*
+ * Called from arch_rethook_trampoline
+ */
+__used __visible void arch_rethook_trampoline_callback(struct pt_regs *regs)
+{
+ unsigned long *frame_pointer;
+
+ /* fixup registers */
+ regs->cs = __KERNEL_CS;
+#ifdef CONFIG_X86_32
+ regs->gs = 0;
+#endif
+ regs->ip = (unsigned long)&arch_rethook_trampoline;
+ regs->orig_ax = ~0UL;
+ regs->sp += sizeof(long);
+ frame_pointer = ®s->sp + 1;
+
+ /*
+ * The return address at 'frame_pointer' is recovered by the
+ * arch_rethook_fixup_return() which called from this
+ * rethook_trampoline_handler().
+ */
+ rethook_trampoline_handler(regs, (unsigned long)frame_pointer);
+
+ /*
+ * Copy FLAGS to 'pt_regs::sp' so that arch_rethook_trapmoline()
+ * can do RET right after POPF.
+ */
+ regs->sp = regs->flags;
+}
+NOKPROBE_SYMBOL(arch_rethook_trampoline_callback);
+
+/*
+ * arch_rethook_trampoline() skips updating frame pointer. The frame pointer
+ * saved in arch_rethook_trampoline_callback() points to the real caller
+ * function's frame pointer. Thus the arch_rethook_trampoline() doesn't have
+ * a standard stack frame with CONFIG_FRAME_POINTER=y.
+ * Let's mark it non-standard function. Anyway, FP unwinder can correctly
+ * unwind without the hint.
+ */
+STACK_FRAME_NON_STANDARD_FP(arch_rethook_trampoline);
+
+/* This is called from rethook_trampoline_handler(). */
+void arch_rethook_fixup_return(struct pt_regs *regs,
+ unsigned long correct_ret_addr)
+{
+ unsigned long *frame_pointer = ®s->sp + 1;
+
+ /* Replace fake return address with real one. */
+ *frame_pointer = correct_ret_addr;
+}
+NOKPROBE_SYMBOL(arch_rethook_fixup_return);
+
+void arch_rethook_prepare(struct rethook_node *rh, struct pt_regs *regs, bool mcount)
+{
+ unsigned long *stack = (unsigned long *)regs->sp;
+
+ rh->ret_addr = stack[0];
+ rh->frame = regs->sp;
+
+ /* Replace the return addr with trampoline addr */
+ stack[0] = (unsigned long) arch_rethook_trampoline;
+}
+NOKPROBE_SYMBOL(arch_rethook_prepare);
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 2de3c8c5eba9..794fdef2501a 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -550,15 +550,15 @@ bool unwind_next_frame(struct unwind_state *state)
}
/*
* There is a small chance to interrupt at the entry of
- * __kretprobe_trampoline() where the ORC info doesn't exist.
- * That point is right after the RET to __kretprobe_trampoline()
+ * arch_rethook_trampoline() where the ORC info doesn't exist.
+ * That point is right after the RET to arch_rethook_trampoline()
* which was modified return address.
- * At that point, the @addr_p of the unwind_recover_kretprobe()
+ * At that point, the @addr_p of the unwind_recover_rethook()
* (this has to point the address of the stack entry storing
* the modified return address) must be "SP - (a stack entry)"
* because SP is incremented by the RET.
*/
- state->ip = unwind_recover_kretprobe(state, state->ip,
+ state->ip = unwind_recover_rethook(state, state->ip,
(unsigned long *)(state->sp - sizeof(long)));
state->regs = (struct pt_regs *)sp;
state->prev_regs = NULL;
@@ -573,7 +573,7 @@ bool unwind_next_frame(struct unwind_state *state)
goto err;
}
/* See UNWIND_HINT_TYPE_REGS case comment. */
- state->ip = unwind_recover_kretprobe(state, state->ip,
+ state->ip = unwind_recover_rethook(state, state->ip,
(unsigned long *)(state->sp - sizeof(long)));
if (state->full_regs)
Use rethook for kretprobe function return hooking if the arch sets
CONFIG_HAVE_RETHOOK=y. In this case, CONFIG_KRETPROBE_ON_RETHOOK is
set to 'y' automatically, and the kretprobe internal data fields
switches to use rethook. If not, it continues to use kretprobe
specific function return hooks.
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
---
arch/Kconfig | 7 ++
include/linux/kprobes.h | 51 +++++++++++++++++-
kernel/kprobes.c | 124 ++++++++++++++++++++++++++++++++++++-------
kernel/trace/trace_kprobe.c | 4 +
4 files changed, 161 insertions(+), 25 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index e12a4268c01d..9570854c4683 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -164,7 +164,12 @@ config ARCH_USE_BUILTIN_BSWAP
config KRETPROBES
def_bool y
- depends on KPROBES && HAVE_KRETPROBES
+ depends on KPROBES && (HAVE_KRETPROBES || HAVE_RETHOOK)
+
+config KRETPROBE_ON_RETHOOK
+ def_bool y
+ depends on HAVE_RETHOOK
+ select RETHOOK
config USER_RETURN_NOTIFIER
bool
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 312ff997c743..157168769fc2 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -28,6 +28,7 @@
#include <linux/ftrace.h>
#include <linux/refcount.h>
#include <linux/freelist.h>
+#include <linux/rethook.h>
#include <asm/kprobes.h>
#ifdef CONFIG_KPROBES
@@ -149,13 +150,20 @@ struct kretprobe {
int maxactive;
int nmissed;
size_t data_size;
+#ifdef CONFIG_KRETPROBE_ON_RETHOOK
+ struct rethook *rh;
+#else
struct freelist_head freelist;
struct kretprobe_holder *rph;
+#endif
};
#define KRETPROBE_MAX_DATA_SIZE 4096
struct kretprobe_instance {
+#ifdef CONFIG_KRETPROBE_ON_RETHOOK
+ struct rethook_node node;
+#else
union {
struct freelist_node freelist;
struct rcu_head rcu;
@@ -164,6 +172,7 @@ struct kretprobe_instance {
struct kretprobe_holder *rph;
kprobe_opcode_t *ret_addr;
void *fp;
+#endif
char data[];
};
@@ -186,10 +195,24 @@ extern void kprobe_busy_begin(void);
extern void kprobe_busy_end(void);
#ifdef CONFIG_KRETPROBES
-extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
- struct pt_regs *regs);
+/* Check whether @p is used for implementing a trampoline. */
extern int arch_trampoline_kprobe(struct kprobe *p);
+#ifdef CONFIG_KRETPROBE_ON_RETHOOK
+static nokprobe_inline struct kretprobe *get_kretprobe(struct kretprobe_instance *ri)
+{
+ RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
+ "Kretprobe is accessed from instance under preemptive context");
+
+ return (struct kretprobe *)READ_ONCE(ri->node.rethook->data);
+}
+static nokprobe_inline unsigned long get_kretprobe_retaddr(struct kretprobe_instance *ri)
+{
+ return ri->node.ret_addr;
+}
+#else
+extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
+ struct pt_regs *regs);
void arch_kretprobe_fixup_return(struct pt_regs *regs,
kprobe_opcode_t *correct_ret_addr);
@@ -232,6 +255,12 @@ static nokprobe_inline struct kretprobe *get_kretprobe(struct kretprobe_instance
return READ_ONCE(ri->rph->rp);
}
+static nokprobe_inline unsigned long get_kretprobe_retaddr(struct kretprobe_instance *ri)
+{
+ return (unsigned long)ri->ret_addr;
+}
+#endif /* CONFIG_KRETPROBE_ON_RETHOOK */
+
#else /* !CONFIG_KRETPROBES */
static inline void arch_prepare_kretprobe(struct kretprobe *rp,
struct pt_regs *regs)
@@ -395,7 +424,11 @@ void unregister_kretprobe(struct kretprobe *rp);
int register_kretprobes(struct kretprobe **rps, int num);
void unregister_kretprobes(struct kretprobe **rps, int num);
+#ifdef CONFIG_KRETPROBE_ON_RETHOOK
+#define kprobe_flush_task(tk) do {} while (0)
+#else
void kprobe_flush_task(struct task_struct *tk);
+#endif
void kprobe_free_init_mem(void);
@@ -509,6 +542,19 @@ static inline bool is_kprobe_optinsn_slot(unsigned long addr)
#endif /* !CONFIG_OPTPROBES */
#ifdef CONFIG_KRETPROBES
+#ifdef CONFIG_KRETPROBE_ON_RETHOOK
+static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
+{
+ return is_rethook_trampoline(addr);
+}
+
+static nokprobe_inline
+unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
+ struct llist_node **cur)
+{
+ return rethook_find_ret_addr(tsk, (unsigned long)fp, cur);
+}
+#else
static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
{
return (void *)addr == kretprobe_trampoline_addr();
@@ -516,6 +562,7 @@ static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
struct llist_node **cur);
+#endif
#else
static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
{
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 185badc780b7..dbe57df2e199 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1237,6 +1237,27 @@ void kprobes_inc_nmissed_count(struct kprobe *p)
}
NOKPROBE_SYMBOL(kprobes_inc_nmissed_count);
+static struct kprobe kprobe_busy = {
+ .addr = (void *) get_kprobe,
+};
+
+void kprobe_busy_begin(void)
+{
+ struct kprobe_ctlblk *kcb;
+
+ preempt_disable();
+ __this_cpu_write(current_kprobe, &kprobe_busy);
+ kcb = get_kprobe_ctlblk();
+ kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+}
+
+void kprobe_busy_end(void)
+{
+ __this_cpu_write(current_kprobe, NULL);
+ preempt_enable();
+}
+
+#if !defined(CONFIG_KRETPROBE_ON_RETHOOK)
static void free_rp_inst_rcu(struct rcu_head *head)
{
struct kretprobe_instance *ri = container_of(head, struct kretprobe_instance, rcu);
@@ -1258,26 +1279,6 @@ static void recycle_rp_inst(struct kretprobe_instance *ri)
}
NOKPROBE_SYMBOL(recycle_rp_inst);
-static struct kprobe kprobe_busy = {
- .addr = (void *) get_kprobe,
-};
-
-void kprobe_busy_begin(void)
-{
- struct kprobe_ctlblk *kcb;
-
- preempt_disable();
- __this_cpu_write(current_kprobe, &kprobe_busy);
- kcb = get_kprobe_ctlblk();
- kcb->kprobe_status = KPROBE_HIT_ACTIVE;
-}
-
-void kprobe_busy_end(void)
-{
- __this_cpu_write(current_kprobe, NULL);
- preempt_enable();
-}
-
/*
* This function is called from delayed_put_task_struct() when a task is
* dead and cleaned up to recycle any kretprobe instances associated with
@@ -1327,6 +1328,7 @@ static inline void free_rp_inst(struct kretprobe *rp)
rp->rph = NULL;
}
}
+#endif /* !CONFIG_KRETPROBE_ON_RETHOOK */
/* Add the new probe to 'ap->list'. */
static int add_new_kprobe(struct kprobe *ap, struct kprobe *p)
@@ -1925,6 +1927,7 @@ static struct notifier_block kprobe_exceptions_nb = {
#ifdef CONFIG_KRETPROBES
+#if !defined(CONFIG_KRETPROBE_ON_RETHOOK)
/* This assumes the 'tsk' is the current task or the is not running. */
static kprobe_opcode_t *__kretprobe_find_ret_addr(struct task_struct *tsk,
struct llist_node **cur)
@@ -2087,6 +2090,57 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
return 0;
}
NOKPROBE_SYMBOL(pre_handler_kretprobe);
+#else /* CONFIG_KRETPROBE_ON_RETHOOK */
+/*
+ * This kprobe pre_handler is registered with every kretprobe. When probe
+ * hits it will set up the return probe.
+ */
+static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
+{
+ struct kretprobe *rp = container_of(p, struct kretprobe, kp);
+ struct kretprobe_instance *ri;
+ struct rethook_node *rhn;
+
+ rhn = rethook_try_get(rp->rh);
+ if (!rhn) {
+ rp->nmissed++;
+ return 0;
+ }
+
+ ri = container_of(rhn, struct kretprobe_instance, node);
+
+ if (rp->entry_handler && rp->entry_handler(ri, regs))
+ rethook_recycle(rhn);
+ else
+ rethook_hook(rhn, regs, kprobe_ftrace(p));
+
+ return 0;
+}
+NOKPROBE_SYMBOL(pre_handler_kretprobe);
+
+static void kretprobe_rethook_handler(struct rethook_node *rh, void *data,
+ struct pt_regs *regs)
+{
+ struct kretprobe *rp = (struct kretprobe *)data;
+ struct kretprobe_instance *ri;
+ struct kprobe_ctlblk *kcb;
+
+ /* The data must NOT be null. This means rethook data structure is broken. */
+ if (WARN_ON_ONCE(!data))
+ return;
+
+ __this_cpu_write(current_kprobe, &rp->kp);
+ kcb = get_kprobe_ctlblk();
+ kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+
+ ri = container_of(rh, struct kretprobe_instance, node);
+ rp->handler(ri, regs);
+
+ __this_cpu_write(current_kprobe, NULL);
+}
+NOKPROBE_SYMBOL(kretprobe_rethook_handler);
+
+#endif /* !CONFIG_KRETPROBE_ON_RETHOOK */
/**
* kprobe_on_func_entry() -- check whether given address is function entry
@@ -2155,6 +2209,29 @@ int register_kretprobe(struct kretprobe *rp)
rp->maxactive = num_possible_cpus();
#endif
}
+#ifdef CONFIG_KRETPROBE_ON_RETHOOK
+ rp->rh = rethook_alloc((void *)rp, kretprobe_rethook_handler);
+ if (!rp->rh)
+ return -ENOMEM;
+
+ for (i = 0; i < rp->maxactive; i++) {
+ inst = kzalloc(sizeof(struct kretprobe_instance) +
+ rp->data_size, GFP_KERNEL);
+ if (inst == NULL) {
+ rethook_free(rp->rh);
+ rp->rh = NULL;
+ return -ENOMEM;
+ }
+ rethook_add_node(rp->rh, &inst->node);
+ }
+ rp->nmissed = 0;
+ /* Establish function entry probe point */
+ ret = register_kprobe(&rp->kp);
+ if (ret != 0) {
+ rethook_free(rp->rh);
+ rp->rh = NULL;
+ }
+#else /* !CONFIG_KRETPROBE_ON_RETHOOK */
rp->freelist.head = NULL;
rp->rph = kzalloc(sizeof(struct kretprobe_holder), GFP_KERNEL);
if (!rp->rph)
@@ -2179,6 +2256,7 @@ int register_kretprobe(struct kretprobe *rp)
ret = register_kprobe(&rp->kp);
if (ret != 0)
free_rp_inst(rp);
+#endif
return ret;
}
EXPORT_SYMBOL_GPL(register_kretprobe);
@@ -2217,7 +2295,11 @@ void unregister_kretprobes(struct kretprobe **rps, int num)
for (i = 0; i < num; i++) {
if (__unregister_kprobe_top(&rps[i]->kp) < 0)
rps[i]->kp.addr = NULL;
+#ifdef CONFIG_KRETPROBE_ON_RETHOOK
+ rethook_free(rps[i]->rh);
+#else
rps[i]->rph->rp = NULL;
+#endif
}
mutex_unlock(&kprobe_mutex);
@@ -2225,7 +2307,9 @@ void unregister_kretprobes(struct kretprobe **rps, int num)
for (i = 0; i < num; i++) {
if (rps[i]->kp.addr) {
__unregister_kprobe_bottom(&rps[i]->kp);
+#ifndef CONFIG_KRETPROBE_ON_RETHOOK
free_rp_inst(rps[i]);
+#endif
}
}
}
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index b62fd785b599..47cebef78532 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1433,7 +1433,7 @@ __kretprobe_trace_func(struct trace_kprobe *tk, struct kretprobe_instance *ri,
fbuffer.regs = regs;
entry = fbuffer.entry = ring_buffer_event_data(fbuffer.event);
entry->func = (unsigned long)tk->rp.kp.addr;
- entry->ret_ip = (unsigned long)ri->ret_addr;
+ entry->ret_ip = get_kretprobe_retaddr(ri);
store_trace_args(&entry[1], &tk->tp, regs, sizeof(*entry), dsize);
trace_event_buffer_commit(&fbuffer);
@@ -1628,7 +1628,7 @@ kretprobe_perf_func(struct trace_kprobe *tk, struct kretprobe_instance *ri,
return;
entry->func = (unsigned long)tk->rp.kp.addr;
- entry->ret_ip = (unsigned long)ri->ret_addr;
+ entry->ret_ip = get_kretprobe_retaddr(ri);
store_trace_args(&entry[1], &tk->tp, regs, sizeof(*entry), dsize);
perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
head, NULL);
On Fri, Mar 25, 2022 at 11:22:53PM +0900, Masami Hiramatsu wrote:
> Masami Hiramatsu (3):
> kprobes: Use rethook for kretprobe if possible
> rethook: kprobes: x86: Replace kretprobe with rethook on x86
> x86,kprobes: Fix optprobe trampoline to generate complete pt_regs
>
> Peter Zijlstra (1):
> Subject: x86,rethook: Fix arch_rethook_trampoline() to generate a complete pt_regs
You fat-fingered the subject there ^
Other than that:
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Hopefully the ftrace return trampoline can also be switched over..
From: Peter Zijlstra <[email protected]>
Currently arch_rethook_trampoline() generates an almost complete
pt_regs on-stack, everything except regs->ss that is, that currently
points to the fake return address, which is not a valid segment
descriptor.
Since interpretation of regs->[sb]p should be done in the context of
regs->ss, and we have code actually doing that (see
arch/x86/lib/insn-eval.c for instance), complete the job by also
pushing ss.
This ensures that anybody who does do look at regs->ss doesn't
mysteriously malfunction, avoiding much future pain.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
---
arch/x86/kernel/rethook.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kernel/rethook.c b/arch/x86/kernel/rethook.c
index 56275540eeea..b080c020d528 100644
--- a/arch/x86/kernel/rethook.c
+++ b/arch/x86/kernel/rethook.c
@@ -25,29 +25,31 @@ asm(
/* Push a fake return address to tell the unwinder it's a rethook. */
" pushq $arch_rethook_trampoline\n"
UNWIND_HINT_FUNC
- /* Save the 'sp - 8', this will be fixed later. */
+ " pushq $" __stringify(__KERNEL_DS) "\n"
+ /* Save the 'sp - 16', this will be fixed later. */
" pushq %rsp\n"
" pushfq\n"
SAVE_REGS_STRING
" movq %rsp, %rdi\n"
" call arch_rethook_trampoline_callback\n"
RESTORE_REGS_STRING
- /* In the callback function, 'regs->flags' is copied to 'regs->sp'. */
- " addq $8, %rsp\n"
+ /* In the callback function, 'regs->flags' is copied to 'regs->ss'. */
+ " addq $16, %rsp\n"
" popfq\n"
#else
/* Push a fake return address to tell the unwinder it's a rethook. */
" pushl $arch_rethook_trampoline\n"
UNWIND_HINT_FUNC
- /* Save the 'sp - 4', this will be fixed later. */
+ " pushl %ss\n"
+ /* Save the 'sp - 8', this will be fixed later. */
" pushl %esp\n"
" pushfl\n"
SAVE_REGS_STRING
" movl %esp, %eax\n"
" call arch_rethook_trampoline_callback\n"
RESTORE_REGS_STRING
- /* In the callback function, 'regs->flags' is copied to 'regs->sp'. */
- " addl $4, %esp\n"
+ /* In the callback function, 'regs->flags' is copied to 'regs->ss'. */
+ " addl $8, %esp\n"
" popfl\n"
#endif
ASM_RET
@@ -69,8 +71,8 @@ __used __visible void arch_rethook_trampoline_callback(struct pt_regs *regs)
#endif
regs->ip = (unsigned long)&arch_rethook_trampoline;
regs->orig_ax = ~0UL;
- regs->sp += sizeof(long);
- frame_pointer = ®s->sp + 1;
+ regs->sp += 2*sizeof(long);
+ frame_pointer = (long *)(regs + 1);
/*
* The return address at 'frame_pointer' is recovered by the
@@ -80,10 +82,10 @@ __used __visible void arch_rethook_trampoline_callback(struct pt_regs *regs)
rethook_trampoline_handler(regs, (unsigned long)frame_pointer);
/*
- * Copy FLAGS to 'pt_regs::sp' so that arch_rethook_trapmoline()
+ * Copy FLAGS to 'pt_regs::ss' so that arch_rethook_trapmoline()
* can do RET right after POPF.
*/
- regs->sp = regs->flags;
+ *(unsigned long *)®s->ss = regs->flags;
}
NOKPROBE_SYMBOL(arch_rethook_trampoline_callback);
@@ -101,7 +103,7 @@ STACK_FRAME_NON_STANDARD_FP(arch_rethook_trampoline);
void arch_rethook_fixup_return(struct pt_regs *regs,
unsigned long correct_ret_addr)
{
- unsigned long *frame_pointer = ®s->sp + 1;
+ unsigned long *frame_pointer = (void *)(regs + 1);
/* Replace fake return address with real one. */
*frame_pointer = correct_ret_addr;
On Fri, 25 Mar 2022 09:49:47 -0700
Alexei Starovoitov <[email protected]> wrote:
> On Fri, Mar 25, 2022 at 7:43 AM Peter Zijlstra <[email protected]> wrote:
> >
> > On Fri, Mar 25, 2022 at 11:22:53PM +0900, Masami Hiramatsu wrote:
> >
> > > Masami Hiramatsu (3):
> > > kprobes: Use rethook for kretprobe if possible
> > > rethook: kprobes: x86: Replace kretprobe with rethook on x86
> > > x86,kprobes: Fix optprobe trampoline to generate complete pt_regs
> > >
> > > Peter Zijlstra (1):
> > > Subject: x86,rethook: Fix arch_rethook_trampoline() to generate a complete pt_regs
> >
> > You fat-fingered the subject there ^
> >
> > Other than that:
> >
> > Acked-by: Peter Zijlstra (Intel) <[email protected]>
> >
> > Hopefully the ftrace return trampoline can also be switched over..
>
> Thanks Peter. What's an ETA on landing endbr set?
> Did I miss a pull req?
> I see an odd error in linux-next with bpf selftests
> which may or may not be related. Planning to debug it
> when everything settles in Linus's tree.
That is what I pointed in cover mail.
> BTW, this patch can be applied to next-20220324, not the bpf-next tree
> directly, because this depends on ANNOTATE_NOENDBR macro. However, since
> the fprobe is merged in the bpf-next, I marked this for bpf-next.
> So until merging the both of fprobes and ENDBR series, to compile this
> you need below 2 lines in arch/x86/kernel/rethook.c.
>
> #ifndef ANNOTATE_NOENDBR
> #define ANNOTATE_NOENDBR
>
> Masami, could you do another respin?
OK, I will add above temporary mitigation.
>
> Also do you mind squashing patches 2,3,4 ?
> It's odd to have the same lines of code patched up 3 times.
> Just do it right once.
Hmm, I think those are different commit for different features.
I would like to keep those 3 patches separated (for the case if
we find any issue to introduce regs->ss later)
Thank you,
--
Masami Hiramatsu <[email protected]>
On Fri, 25 Mar 2022 15:43:14 +0100
Peter Zijlstra <[email protected]> wrote:
> On Fri, Mar 25, 2022 at 11:22:53PM +0900, Masami Hiramatsu wrote:
>
> > Masami Hiramatsu (3):
> > kprobes: Use rethook for kretprobe if possible
> > rethook: kprobes: x86: Replace kretprobe with rethook on x86
> > x86,kprobes: Fix optprobe trampoline to generate complete pt_regs
> >
> > Peter Zijlstra (1):
> > Subject: x86,rethook: Fix arch_rethook_trampoline() to generate a complete pt_regs
>
> You fat-fingered the subject there ^
>
Oops, I missed to import the patch...
> Other than that:
>
> Acked-by: Peter Zijlstra (Intel) <[email protected]>
Thanks!
>
> Hopefully the ftrace return trampoline can also be switched over..
The rethook clarifies the interfaces for the return trampoline, so
I think this can step the integration forward.
--
Masami Hiramatsu <[email protected]>