hi,
as part of the effort on speeding up the uprobes [0] coming with
return uprobe optimization by using syscall instead of the trap
on the uretprobe trampoline.
The speed up depends on instruction type that uprobe is installed
and depends on specific HW type, please check patch 1 for details.
Patches 1-8 are based on bpf-next/master, but patch 2 and 3 are
apply-able on linux-trace.git tree probes/for-next branch.
Patch 9 is based on man-pages master.
v6 changes:
- separate shadow stack fix for current uretprobe in patch 1
- skip shadow stack test when uprobe is not compiled int [Masami]
- fix retprobe with the shadow stack, using iret return when
shadow stack is detected
- I kept the acks on patch 3, because the shadow stack change is
minimal and the original code is almost untouched
- added shadow stack bpf selftest
- rebased man page
Also available at:
https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
uretprobe_syscall
thanks,
jirka
Notes to check list items in Documentation/process/adding-syscalls.rst:
- System Call Alternatives
New syscall seems like the best way in here, because we need
just to quickly enter kernel with no extra arguments processing,
which we'd need to do if we decided to use another syscall.
- Designing the API: Planning for Extension
The uretprobe syscall is very specific and most likely won't be
extended in the future.
At the moment it does not take any arguments and even if it does
in future, it's allowed to be called only from trampoline prepared
by kernel, so there'll be no broken user.
- Designing the API: Other Considerations
N/A because uretprobe syscall does not return reference to kernel
object.
- Proposing the API
Wiring up of the uretprobe system call is in separate change,
selftests and man page changes are part of the patchset.
- Generic System Call Implementation
There's no CONFIG option for the new functionality because it
keeps the same behaviour from the user POV.
- x86 System Call Implementation
It's 64-bit syscall only.
- Compatibility System Calls (Generic)
N/A uretprobe syscall has no arguments and is not supported
for compat processes.
- Compatibility System Calls (x86)
N/A uretprobe syscall is not supported for compat processes.
- System Calls Returning Elsewhere
N/A.
- Other Details
N/A.
- Testing
Adding new bpf selftests and ran ltp on top of this change.
- Man Page
Attached.
- Do not call System Calls in the Kernel
N/A.
[0] https://lore.kernel.org/bpf/ZeCXHKJ--iYYbmLj@krava/
---
Jiri Olsa (8):
x86/shstk: Make return uprobe work with shadow stack
uprobe: Wire up uretprobe system call
uprobe: Add uretprobe syscall to speed up return probe
selftests/x86: Add return uprobe shadow stack test
selftests/bpf: Add uretprobe syscall test for regs integrity
selftests/bpf: Add uretprobe syscall test for regs changes
selftests/bpf: Add uretprobe syscall call from user space test
selftests/bpf: Add uretprobe shadow stack test
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/x86/include/asm/shstk.h | 4 +
arch/x86/kernel/shstk.c | 16 ++++
arch/x86/kernel/uprobes.c | 124 ++++++++++++++++++++++++++++-
include/linux/syscalls.h | 2 +
include/linux/uprobes.h | 3 +
include/uapi/asm-generic/unistd.h | 5 +-
kernel/events/uprobes.c | 24 ++++--
kernel/sys_ni.c | 2 +
tools/include/linux/compiler.h | 4 +
tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c | 123 ++++++++++++++++++++++++++++-
tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c | 385 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
tools/testing/selftests/bpf/progs/uprobe_syscall.c | 15 ++++
tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c | 17 ++++
tools/testing/selftests/x86/test_shadow_stack.c | 145 ++++++++++++++++++++++++++++++++++
15 files changed, 860 insertions(+), 10 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall.c
create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
Jiri Olsa (1):
man2: Add uretprobe syscall page
man/man2/uretprobe.2 | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)
create mode 100644 man/man2/uretprobe.2
Currently the application with enabled shadow stack will crash
if it sets up return uprobe. The reason is the uretprobe kernel
code changes the user space task's stack, but does not update
shadow stack accordingly.
Adding new functions to update values on shadow stack and using
them in uprobe code to keep shadow stack in sync with uretprobe
changes to user stack.
Fixes: 8b1c23543436 ("x86/shstk: Add return uprobe support")
Signed-off-by: Jiri Olsa <[email protected]>
---
arch/x86/include/asm/shstk.h | 2 ++
arch/x86/kernel/shstk.c | 11 +++++++++++
arch/x86/kernel/uprobes.c | 7 ++++++-
3 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index 42fee8959df7..896909f306e3 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -21,6 +21,7 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clon
void shstk_free(struct task_struct *p);
int setup_signal_shadow_stack(struct ksignal *ksig);
int restore_signal_shadow_stack(void);
+int shstk_update_last_frame(unsigned long val);
#else
static inline long shstk_prctl(struct task_struct *task, int option,
unsigned long arg2) { return -EINVAL; }
@@ -31,6 +32,7 @@ static inline unsigned long shstk_alloc_thread_stack(struct task_struct *p,
static inline void shstk_free(struct task_struct *p) {}
static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; }
static inline int restore_signal_shadow_stack(void) { return 0; }
+static inline int shstk_update_last_frame(unsigned long val) { return 0; }
#endif /* CONFIG_X86_USER_SHADOW_STACK */
#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 6f1e9883f074..9797d4cdb78a 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -577,3 +577,14 @@ long shstk_prctl(struct task_struct *task, int option, unsigned long arg2)
return wrss_control(true);
return -EINVAL;
}
+
+int shstk_update_last_frame(unsigned long val)
+{
+ unsigned long ssp;
+
+ if (!features_enabled(ARCH_SHSTK_SHSTK))
+ return 0;
+
+ ssp = get_user_shstk_addr();
+ return write_user_shstk_64((u64 __user *)ssp, (u64)val);
+}
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 6c07f6daaa22..6402fb3089d2 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -1076,8 +1076,13 @@ arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs
return orig_ret_vaddr;
nleft = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, rasize);
- if (likely(!nleft))
+ if (likely(!nleft)) {
+ if (shstk_update_last_frame(trampoline_vaddr)) {
+ force_sig(SIGSEGV);
+ return -1;
+ }
return orig_ret_vaddr;
+ }
if (nleft != rasize) {
pr_err("return address clobbered: pid=%d, %%sp=%#lx, %%ip=%#lx\n",
--
2.45.0
Wiring up uretprobe system call, which comes in following changes.
We need to do the wiring before, because the uretprobe implementation
needs the syscall number.
Note at the moment uretprobe syscall is supported only for native
64-bit process.
Reviewed-by: Oleg Nesterov <[email protected]>
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
---
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
include/linux/syscalls.h | 2 ++
include/uapi/asm-generic/unistd.h | 5 ++++-
kernel/sys_ni.c | 2 ++
4 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index cc78226ffc35..47dfea0a827c 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -383,6 +383,7 @@
459 common lsm_get_self_attr sys_lsm_get_self_attr
460 common lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules
+462 64 uretprobe sys_uretprobe
#
# Due to a historical design error, certain syscalls are numbered differently
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e619ac10cd23..5318e0e76799 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -972,6 +972,8 @@ asmlinkage long sys_lsm_list_modules(u64 *ids, u32 *size, u32 flags);
/* x86 */
asmlinkage long sys_ioperm(unsigned long from, unsigned long num, int on);
+asmlinkage long sys_uretprobe(void);
+
/* pciconfig: alpha, arm, arm64, ia64, sparc */
asmlinkage long sys_pciconfig_read(unsigned long bus, unsigned long dfn,
unsigned long off, unsigned long len,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 75f00965ab15..8a747cd1d735 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -842,8 +842,11 @@ __SYSCALL(__NR_lsm_set_self_attr, sys_lsm_set_self_attr)
#define __NR_lsm_list_modules 461
__SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules)
+#define __NR_uretprobe 462
+__SYSCALL(__NR_uretprobe, sys_uretprobe)
+
#undef __NR_syscalls
-#define __NR_syscalls 462
+#define __NR_syscalls 463
/*
* 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index faad00cce269..be6195e0d078 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -391,3 +391,5 @@ COND_SYSCALL(setuid16);
/* restartable sequence */
COND_SYSCALL(rseq);
+
+COND_SYSCALL(uretprobe);
--
2.45.0
Adding uretprobe syscall instead of trap to speed up return probe.
At the moment the uretprobe setup/path is:
- install entry uprobe
- when the uprobe is hit, it overwrites probed function's return address
on stack with address of the trampoline that contains breakpoint
instruction
- the breakpoint trap code handles the uretprobe consumers execution and
jumps back to original return address
This patch replaces the above trampoline's breakpoint instruction with new
ureprobe syscall call. This syscall does exactly the same job as the trap
with some more extra work:
- syscall trampoline must save original value for rax/r11/rcx registers
on stack - rax is set to syscall number and r11/rcx are changed and
used by syscall instruction
- the syscall code reads the original values of those registers and
restore those values in task's pt_regs area
- only caller from trampoline exposed in '[uprobes]' is allowed,
the process will receive SIGILL signal otherwise
Even with some extra work, using the uretprobes syscall shows speed
improvement (compared to using standard breakpoint):
On Intel (11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz)
current:
uretprobe-nop : 1.498 ± 0.000M/s
uretprobe-push : 1.448 ± 0.001M/s
uretprobe-ret : 0.816 ± 0.001M/s
with the fix:
uretprobe-nop : 1.969 ± 0.002M/s < 31% speed up
uretprobe-push : 1.910 ± 0.000M/s < 31% speed up
uretprobe-ret : 0.934 ± 0.000M/s < 14% speed up
On Amd (AMD Ryzen 7 5700U)
current:
uretprobe-nop : 0.778 ± 0.001M/s
uretprobe-push : 0.744 ± 0.001M/s
uretprobe-ret : 0.540 ± 0.001M/s
with the fix:
uretprobe-nop : 0.860 ± 0.001M/s < 10% speed up
uretprobe-push : 0.818 ± 0.001M/s < 10% speed up
uretprobe-ret : 0.578 ± 0.000M/s < 7% speed up
The performance test spawns a thread that runs loop which triggers
uprobe with attached bpf program that increments the counter that
gets printed in results above.
The uprobe (and uretprobe) kind is determined by which instruction
is being patched with breakpoint instruction. That's also important
for uretprobes, because uprobe is installed for each uretprobe.
The performance test is part of bpf selftests:
tools/testing/selftests/bpf/run_bench_uprobes.sh
Note at the moment uretprobe syscall is supported only for native
64-bit process, compat process still uses standard breakpoint.
Note that when shadow stack is enabled the uretprobe syscall returns
via iret, which is slower than return via sysret, but won't cause the
shadow stack violation.
Suggested-by: Andrii Nakryiko <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
---
arch/x86/include/asm/shstk.h | 2 +
arch/x86/kernel/shstk.c | 5 ++
arch/x86/kernel/uprobes.c | 117 +++++++++++++++++++++++++++++++++++
include/linux/uprobes.h | 3 +
kernel/events/uprobes.c | 24 ++++---
5 files changed, 144 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index 896909f306e3..4cb77e004615 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -22,6 +22,7 @@ void shstk_free(struct task_struct *p);
int setup_signal_shadow_stack(struct ksignal *ksig);
int restore_signal_shadow_stack(void);
int shstk_update_last_frame(unsigned long val);
+bool shstk_is_enabled(void);
#else
static inline long shstk_prctl(struct task_struct *task, int option,
unsigned long arg2) { return -EINVAL; }
@@ -33,6 +34,7 @@ static inline void shstk_free(struct task_struct *p) {}
static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; }
static inline int restore_signal_shadow_stack(void) { return 0; }
static inline int shstk_update_last_frame(unsigned long val) { return 0; }
+static inline bool shstk_is_enabled(void) { return false; }
#endif /* CONFIG_X86_USER_SHADOW_STACK */
#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 9797d4cdb78a..059685612362 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -588,3 +588,8 @@ int shstk_update_last_frame(unsigned long val)
ssp = get_user_shstk_addr();
return write_user_shstk_64((u64 __user *)ssp, (u64)val);
}
+
+bool shstk_is_enabled(void)
+{
+ return features_enabled(ARCH_SHSTK_SHSTK);
+}
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 6402fb3089d2..5a952c5ea66b 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -12,6 +12,7 @@
#include <linux/ptrace.h>
#include <linux/uprobes.h>
#include <linux/uaccess.h>
+#include <linux/syscalls.h>
#include <linux/kdebug.h>
#include <asm/processor.h>
@@ -308,6 +309,122 @@ static int uprobe_init_insn(struct arch_uprobe *auprobe, struct insn *insn, bool
}
#ifdef CONFIG_X86_64
+
+asm (
+ ".pushsection .rodata\n"
+ ".global uretprobe_trampoline_entry\n"
+ "uretprobe_trampoline_entry:\n"
+ "pushq %rax\n"
+ "pushq %rcx\n"
+ "pushq %r11\n"
+ "movq $" __stringify(__NR_uretprobe) ", %rax\n"
+ "syscall\n"
+ ".global uretprobe_syscall_check\n"
+ "uretprobe_syscall_check:\n"
+ "popq %r11\n"
+ "popq %rcx\n"
+
+ /* The uretprobe syscall replaces stored %rax value with final
+ * return address, so we don't restore %rax in here and just
+ * call ret.
+ */
+ "retq\n"
+ ".global uretprobe_trampoline_end\n"
+ "uretprobe_trampoline_end:\n"
+ ".popsection\n"
+);
+
+extern u8 uretprobe_trampoline_entry[];
+extern u8 uretprobe_trampoline_end[];
+extern u8 uretprobe_syscall_check[];
+
+void *arch_uprobe_trampoline(unsigned long *psize)
+{
+ static uprobe_opcode_t insn = UPROBE_SWBP_INSN;
+ struct pt_regs *regs = task_pt_regs(current);
+
+ /*
+ * At the moment the uretprobe syscall trampoline is supported
+ * only for native 64-bit process, the compat process still uses
+ * standard breakpoint.
+ */
+ if (user_64bit_mode(regs)) {
+ *psize = uretprobe_trampoline_end - uretprobe_trampoline_entry;
+ return uretprobe_trampoline_entry;
+ }
+
+ *psize = UPROBE_SWBP_INSN_SIZE;
+ return &insn;
+}
+
+static unsigned long trampoline_check_ip(void)
+{
+ unsigned long tramp = uprobe_get_trampoline_vaddr();
+
+ return tramp + (uretprobe_syscall_check - uretprobe_trampoline_entry);
+}
+
+SYSCALL_DEFINE0(uretprobe)
+{
+ struct pt_regs *regs = task_pt_regs(current);
+ unsigned long err, ip, sp, r11_cx_ax[3];
+
+ if (regs->ip != trampoline_check_ip())
+ goto sigill;
+
+ err = copy_from_user(r11_cx_ax, (void __user *)regs->sp, sizeof(r11_cx_ax));
+ if (err)
+ goto sigill;
+
+ /* expose the "right" values of r11/cx/ax/sp to uprobe_consumer/s */
+ regs->r11 = r11_cx_ax[0];
+ regs->cx = r11_cx_ax[1];
+ regs->ax = r11_cx_ax[2];
+ regs->sp += sizeof(r11_cx_ax);
+ regs->orig_ax = -1;
+
+ ip = regs->ip;
+ sp = regs->sp;
+
+ uprobe_handle_trampoline(regs);
+
+ /*
+ * Some of the uprobe consumers has changed sp, we can do nothing,
+ * just return via iret.
+ * .. or shadow stack is enabled, in which case we need to skip
+ * return through the user space stack address.
+ */
+ if (regs->sp != sp || shstk_is_enabled())
+ return regs->ax;
+ regs->sp -= sizeof(r11_cx_ax);
+
+ /* for the case uprobe_consumer has changed r11/cx */
+ r11_cx_ax[0] = regs->r11;
+ r11_cx_ax[1] = regs->cx;
+
+ /*
+ * ax register is passed through as return value, so we can use
+ * its space on stack for ip value and jump to it through the
+ * trampoline's ret instruction
+ */
+ r11_cx_ax[2] = regs->ip;
+ regs->ip = ip;
+
+ err = copy_to_user((void __user *)regs->sp, r11_cx_ax, sizeof(r11_cx_ax));
+ if (err)
+ goto sigill;
+
+ /* ensure sysret, see do_syscall_64() */
+ regs->r11 = regs->flags;
+ regs->cx = regs->ip;
+
+ return regs->ax;
+
+sigill:
+ force_sig(SIGILL);
+ return -1;
+}
+
/*
* If arch_uprobe->insn doesn't use rip-relative addressing, return
* immediately. Otherwise, rewrite the instruction so that it accesses
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index f46e0ca0169c..b503fafb7fb3 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -138,6 +138,9 @@ extern bool arch_uretprobe_is_alive(struct return_instance *ret, enum rp_check c
extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs);
extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
void *src, unsigned long len);
+extern void uprobe_handle_trampoline(struct pt_regs *regs);
+extern void *arch_uprobe_trampoline(unsigned long *psize);
+extern unsigned long uprobe_get_trampoline_vaddr(void);
#else /* !CONFIG_UPROBES */
struct uprobes_state {
};
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 2c83ba776fc7..2816e65729ac 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1474,11 +1474,20 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
return ret;
}
+void * __weak arch_uprobe_trampoline(unsigned long *psize)
+{
+ static uprobe_opcode_t insn = UPROBE_SWBP_INSN;
+
+ *psize = UPROBE_SWBP_INSN_SIZE;
+ return &insn;
+}
+
static struct xol_area *__create_xol_area(unsigned long vaddr)
{
struct mm_struct *mm = current->mm;
- uprobe_opcode_t insn = UPROBE_SWBP_INSN;
+ unsigned long insns_size;
struct xol_area *area;
+ void *insns;
area = kmalloc(sizeof(*area), GFP_KERNEL);
if (unlikely(!area))
@@ -1502,7 +1511,8 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
/* Reserve the 1st slot for get_trampoline_vaddr() */
set_bit(0, area->bitmap);
atomic_set(&area->slot_count, 1);
- arch_uprobe_copy_ixol(area->pages[0], 0, &insn, UPROBE_SWBP_INSN_SIZE);
+ insns = arch_uprobe_trampoline(&insns_size);
+ arch_uprobe_copy_ixol(area->pages[0], 0, insns, insns_size);
if (!xol_add_vma(mm, area))
return area;
@@ -1827,7 +1837,7 @@ void uprobe_copy_process(struct task_struct *t, unsigned long flags)
*
* Returns -1 in case the xol_area is not allocated.
*/
-static unsigned long get_trampoline_vaddr(void)
+unsigned long uprobe_get_trampoline_vaddr(void)
{
struct xol_area *area;
unsigned long trampoline_vaddr = -1;
@@ -1878,7 +1888,7 @@ static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
if (!ri)
return;
- trampoline_vaddr = get_trampoline_vaddr();
+ trampoline_vaddr = uprobe_get_trampoline_vaddr();
orig_ret_vaddr = arch_uretprobe_hijack_return_addr(trampoline_vaddr, regs);
if (orig_ret_vaddr == -1)
goto fail;
@@ -2123,7 +2133,7 @@ static struct return_instance *find_next_ret_chain(struct return_instance *ri)
return ri;
}
-static void handle_trampoline(struct pt_regs *regs)
+void uprobe_handle_trampoline(struct pt_regs *regs)
{
struct uprobe_task *utask;
struct return_instance *ri, *next;
@@ -2187,8 +2197,8 @@ static void handle_swbp(struct pt_regs *regs)
int is_swbp;
bp_vaddr = uprobe_get_swbp_addr(regs);
- if (bp_vaddr == get_trampoline_vaddr())
- return handle_trampoline(regs);
+ if (bp_vaddr == uprobe_get_trampoline_vaddr())
+ return uprobe_handle_trampoline(regs);
uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
if (!uprobe) {
--
2.45.0
Adding return uprobe test for shadow stack and making sure it's
working properly. Borrowed some of the code from bpf selftests.
Signed-off-by: Jiri Olsa <[email protected]>
---
.../testing/selftests/x86/test_shadow_stack.c | 145 ++++++++++++++++++
1 file changed, 145 insertions(+)
diff --git a/tools/testing/selftests/x86/test_shadow_stack.c b/tools/testing/selftests/x86/test_shadow_stack.c
index 757e6527f67e..e3501b7e2ecc 100644
--- a/tools/testing/selftests/x86/test_shadow_stack.c
+++ b/tools/testing/selftests/x86/test_shadow_stack.c
@@ -34,6 +34,7 @@
#include <sys/ptrace.h>
#include <sys/signal.h>
#include <linux/elf.h>
+#include <linux/perf_event.h>
/*
* Define the ABI defines if needed, so people can run the tests
@@ -681,6 +682,144 @@ int test_32bit(void)
return !segv_triggered;
}
+static int parse_uint_from_file(const char *file, const char *fmt)
+{
+ int err, ret;
+ FILE *f;
+
+ f = fopen(file, "re");
+ if (!f) {
+ err = -errno;
+ printf("failed to open '%s': %d\n", file, err);
+ return err;
+ }
+ err = fscanf(f, fmt, &ret);
+ if (err != 1) {
+ err = err == EOF ? -EIO : -errno;
+ printf("failed to parse '%s': %d\n", file, err);
+ fclose(f);
+ return err;
+ }
+ fclose(f);
+ return ret;
+}
+
+static int determine_uprobe_perf_type(void)
+{
+ const char *file = "/sys/bus/event_source/devices/uprobe/type";
+
+ return parse_uint_from_file(file, "%d\n");
+}
+
+static int determine_uprobe_retprobe_bit(void)
+{
+ const char *file = "/sys/bus/event_source/devices/uprobe/format/retprobe";
+
+ return parse_uint_from_file(file, "config:%d\n");
+}
+
+static ssize_t get_uprobe_offset(const void *addr)
+{
+ size_t start, end, base;
+ char buf[256];
+ bool found = false;
+ FILE *f;
+
+ f = fopen("/proc/self/maps", "r");
+ if (!f)
+ return -errno;
+
+ while (fscanf(f, "%zx-%zx %s %zx %*[^\n]\n", &start, &end, buf, &base) == 4) {
+ if (buf[2] == 'x' && (uintptr_t)addr >= start && (uintptr_t)addr < end) {
+ found = true;
+ break;
+ }
+ }
+
+ fclose(f);
+
+ if (!found)
+ return -ESRCH;
+
+ return (uintptr_t)addr - start + base;
+}
+
+static __attribute__((noinline)) void uretprobe_trigger(void)
+{
+ asm volatile ("");
+}
+
+/*
+ * This test setups return uprobe, which is sensitive to shadow stack
+ * (crashes without extra fix). After executing the uretprobe we fail
+ * the test if we receive SIGSEGV, no crash means we're good.
+ *
+ * Helper functions above borrowed from bpf selftests.
+ */
+static int test_uretprobe(void)
+{
+ const size_t attr_sz = sizeof(struct perf_event_attr);
+ const char *file = "/proc/self/exe";
+ int bit, fd = 0, type, err = 1;
+ struct perf_event_attr attr;
+ struct sigaction sa = {};
+ ssize_t offset;
+
+ type = determine_uprobe_perf_type();
+ if (type < 0) {
+ if (type == -ENOENT)
+ printf("[SKIP]\tUretprobe test, uprobes are not available\n");
+ return 0;
+ }
+
+ offset = get_uprobe_offset(uretprobe_trigger);
+ if (offset < 0)
+ return 1;
+
+ bit = determine_uprobe_retprobe_bit();
+ if (bit < 0)
+ return 1;
+
+ sa.sa_sigaction = segv_gp_handler;
+ sa.sa_flags = SA_SIGINFO;
+ if (sigaction(SIGSEGV, &sa, NULL))
+ return 1;
+
+ /* Setup return uprobe through perf event interface. */
+ memset(&attr, 0, attr_sz);
+ attr.size = attr_sz;
+ attr.type = type;
+ attr.config = 1 << bit;
+ attr.config1 = (__u64) (unsigned long) file;
+ attr.config2 = offset;
+
+ fd = syscall(__NR_perf_event_open, &attr, 0 /* pid */, -1 /* cpu */,
+ -1 /* group_fd */, PERF_FLAG_FD_CLOEXEC);
+ if (fd < 0)
+ goto out;
+
+ if (sigsetjmp(jmp_buffer, 1))
+ goto out;
+
+ ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK);
+
+ /*
+ * This either segfaults and goes through sigsetjmp above
+ * or succeeds and we're good.
+ */
+ uretprobe_trigger();
+
+ printf("[OK]\tUretprobe test\n");
+ err = 0;
+
+out:
+ ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK);
+ signal(SIGSEGV, SIG_DFL);
+ if (fd)
+ close(fd);
+ return err;
+}
+
void segv_handler_ptrace(int signum, siginfo_t *si, void *uc)
{
/* The SSP adjustment caused a segfault. */
@@ -867,6 +1006,12 @@ int main(int argc, char *argv[])
goto out;
}
+ if (test_uretprobe()) {
+ ret = 1;
+ printf("[FAIL]\turetprobe test\n");
+ goto out;
+ }
+
return ret;
out:
--
2.45.0
Add uretprobe syscall test that compares register values before
and after the uretprobe is hit. It also compares the register
values seen from attached bpf program.
Acked-by: Andrii Nakryiko <[email protected]>
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/include/linux/compiler.h | 4 +
.../selftests/bpf/prog_tests/uprobe_syscall.c | 163 ++++++++++++++++++
.../selftests/bpf/progs/uprobe_syscall.c | 15 ++
3 files changed, 182 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall.c
diff --git a/tools/include/linux/compiler.h b/tools/include/linux/compiler.h
index 8a63a9913495..6f7f22ac9da5 100644
--- a/tools/include/linux/compiler.h
+++ b/tools/include/linux/compiler.h
@@ -62,6 +62,10 @@
#define __nocf_check __attribute__((nocf_check))
#endif
+#ifndef __naked
+#define __naked __attribute__((__naked__))
+#endif
+
/* Are two types/vars the same type (ignoring qualifiers)? */
#ifndef __same_type
# define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
new file mode 100644
index 000000000000..311ac19d8992
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -0,0 +1,163 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+
+#ifdef __x86_64__
+
+#include <unistd.h>
+#include <asm/ptrace.h>
+#include <linux/compiler.h>
+#include "uprobe_syscall.skel.h"
+
+__naked unsigned long uretprobe_regs_trigger(void)
+{
+ asm volatile (
+ "movq $0xdeadbeef, %rax\n"
+ "ret\n"
+ );
+}
+
+__naked void uretprobe_regs(struct pt_regs *before, struct pt_regs *after)
+{
+ asm volatile (
+ "movq %r15, 0(%rdi)\n"
+ "movq %r14, 8(%rdi)\n"
+ "movq %r13, 16(%rdi)\n"
+ "movq %r12, 24(%rdi)\n"
+ "movq %rbp, 32(%rdi)\n"
+ "movq %rbx, 40(%rdi)\n"
+ "movq %r11, 48(%rdi)\n"
+ "movq %r10, 56(%rdi)\n"
+ "movq %r9, 64(%rdi)\n"
+ "movq %r8, 72(%rdi)\n"
+ "movq %rax, 80(%rdi)\n"
+ "movq %rcx, 88(%rdi)\n"
+ "movq %rdx, 96(%rdi)\n"
+ "movq %rsi, 104(%rdi)\n"
+ "movq %rdi, 112(%rdi)\n"
+ "movq $0, 120(%rdi)\n" /* orig_rax */
+ "movq $0, 128(%rdi)\n" /* rip */
+ "movq $0, 136(%rdi)\n" /* cs */
+ "pushf\n"
+ "pop %rax\n"
+ "movq %rax, 144(%rdi)\n" /* eflags */
+ "movq %rsp, 152(%rdi)\n" /* rsp */
+ "movq $0, 160(%rdi)\n" /* ss */
+
+ /* save 2nd argument */
+ "pushq %rsi\n"
+ "call uretprobe_regs_trigger\n"
+
+ /* save return value and load 2nd argument pointer to rax */
+ "pushq %rax\n"
+ "movq 8(%rsp), %rax\n"
+
+ "movq %r15, 0(%rax)\n"
+ "movq %r14, 8(%rax)\n"
+ "movq %r13, 16(%rax)\n"
+ "movq %r12, 24(%rax)\n"
+ "movq %rbp, 32(%rax)\n"
+ "movq %rbx, 40(%rax)\n"
+ "movq %r11, 48(%rax)\n"
+ "movq %r10, 56(%rax)\n"
+ "movq %r9, 64(%rax)\n"
+ "movq %r8, 72(%rax)\n"
+ "movq %rcx, 88(%rax)\n"
+ "movq %rdx, 96(%rax)\n"
+ "movq %rsi, 104(%rax)\n"
+ "movq %rdi, 112(%rax)\n"
+ "movq $0, 120(%rax)\n" /* orig_rax */
+ "movq $0, 128(%rax)\n" /* rip */
+ "movq $0, 136(%rax)\n" /* cs */
+
+ /* restore return value and 2nd argument */
+ "pop %rax\n"
+ "pop %rsi\n"
+
+ "movq %rax, 80(%rsi)\n"
+
+ "pushf\n"
+ "pop %rax\n"
+
+ "movq %rax, 144(%rsi)\n" /* eflags */
+ "movq %rsp, 152(%rsi)\n" /* rsp */
+ "movq $0, 160(%rsi)\n" /* ss */
+ "ret\n"
+);
+}
+
+static void test_uretprobe_regs_equal(void)
+{
+ struct uprobe_syscall *skel = NULL;
+ struct pt_regs before = {}, after = {};
+ unsigned long *pb = (unsigned long *) &before;
+ unsigned long *pa = (unsigned long *) &after;
+ unsigned long *pp;
+ unsigned int i, cnt;
+ int err;
+
+ skel = uprobe_syscall__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "uprobe_syscall__open_and_load"))
+ goto cleanup;
+
+ err = uprobe_syscall__attach(skel);
+ if (!ASSERT_OK(err, "uprobe_syscall__attach"))
+ goto cleanup;
+
+ uretprobe_regs(&before, &after);
+
+ pp = (unsigned long *) &skel->bss->regs;
+ cnt = sizeof(before)/sizeof(*pb);
+
+ for (i = 0; i < cnt; i++) {
+ unsigned int offset = i * sizeof(unsigned long);
+
+ /*
+ * Check register before and after uretprobe_regs_trigger call
+ * that triggers the uretprobe.
+ */
+ switch (offset) {
+ case offsetof(struct pt_regs, rax):
+ ASSERT_EQ(pa[i], 0xdeadbeef, "return value");
+ break;
+ default:
+ if (!ASSERT_EQ(pb[i], pa[i], "register before-after value check"))
+ fprintf(stdout, "failed register offset %u\n", offset);
+ }
+
+ /*
+ * Check register seen from bpf program and register after
+ * uretprobe_regs_trigger call
+ */
+ switch (offset) {
+ /*
+ * These values will be different (not set in uretprobe_regs),
+ * we don't care.
+ */
+ case offsetof(struct pt_regs, orig_rax):
+ case offsetof(struct pt_regs, rip):
+ case offsetof(struct pt_regs, cs):
+ case offsetof(struct pt_regs, rsp):
+ case offsetof(struct pt_regs, ss):
+ break;
+ default:
+ if (!ASSERT_EQ(pp[i], pa[i], "register prog-after value check"))
+ fprintf(stdout, "failed register offset %u\n", offset);
+ }
+ }
+
+cleanup:
+ uprobe_syscall__destroy(skel);
+}
+#else
+static void test_uretprobe_regs_equal(void)
+{
+ test__skip();
+}
+#endif
+
+void test_uprobe_syscall(void)
+{
+ if (test__start_subtest("uretprobe_regs_equal"))
+ test_uretprobe_regs_equal();
+}
diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall.c b/tools/testing/selftests/bpf/progs/uprobe_syscall.c
new file mode 100644
index 000000000000..8a4fa6c7ef59
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/uprobe_syscall.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <string.h>
+
+struct pt_regs regs;
+
+char _license[] SEC("license") = "GPL";
+
+SEC("uretprobe//proc/self/exe:uretprobe_regs_trigger")
+int uretprobe(struct pt_regs *ctx)
+{
+ __builtin_memcpy(®s, ctx, sizeof(regs));
+ return 0;
+}
--
2.45.0
Adding test that creates uprobe consumer on uretprobe which changes some
of the registers. Making sure the changed registers are propagated to the
user space when the ureptobe syscall trampoline is used on x86_64.
To be able to do this, adding support to bpf_testmod to create uprobe via
new attribute file:
/sys/kernel/bpf_testmod_uprobe
This file is expecting file offset and creates related uprobe on current
process exe file and removes existing uprobe if offset is 0. The can be
only single uprobe at any time.
The uprobe has specific consumer that changes registers used in ureprobe
syscall trampoline and which are later checked in the test.
Acked-by: Andrii Nakryiko <[email protected]>
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
---
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 123 +++++++++++++++++-
.../selftests/bpf/prog_tests/uprobe_syscall.c | 67 ++++++++++
2 files changed, 189 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index 2a18bd320e92..b0132a342bb5 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -18,6 +18,7 @@
#include <linux/in6.h>
#include <linux/un.h>
#include <net/sock.h>
+#include <linux/namei.h>
#include "bpf_testmod.h"
#include "bpf_testmod_kfunc.h"
@@ -358,6 +359,119 @@ static struct bin_attribute bin_attr_bpf_testmod_file __ro_after_init = {
.write = bpf_testmod_test_write,
};
+/* bpf_testmod_uprobe sysfs attribute is so far enabled for x86_64 only,
+ * please see test_uretprobe_regs_change test
+ */
+#ifdef __x86_64__
+
+static int
+uprobe_ret_handler(struct uprobe_consumer *self, unsigned long func,
+ struct pt_regs *regs)
+
+{
+ regs->ax = 0x12345678deadbeef;
+ regs->cx = 0x87654321feebdaed;
+ regs->r11 = (u64) -1;
+ return true;
+}
+
+struct testmod_uprobe {
+ struct path path;
+ loff_t offset;
+ struct uprobe_consumer consumer;
+};
+
+static DEFINE_MUTEX(testmod_uprobe_mutex);
+
+static struct testmod_uprobe uprobe = {
+ .consumer.ret_handler = uprobe_ret_handler,
+};
+
+static int testmod_register_uprobe(loff_t offset)
+{
+ int err = -EBUSY;
+
+ if (uprobe.offset)
+ return -EBUSY;
+
+ mutex_lock(&testmod_uprobe_mutex);
+
+ if (uprobe.offset)
+ goto out;
+
+ err = kern_path("/proc/self/exe", LOOKUP_FOLLOW, &uprobe.path);
+ if (err)
+ goto out;
+
+ err = uprobe_register_refctr(d_real_inode(uprobe.path.dentry),
+ offset, 0, &uprobe.consumer);
+ if (err)
+ path_put(&uprobe.path);
+ else
+ uprobe.offset = offset;
+
+out:
+ mutex_unlock(&testmod_uprobe_mutex);
+ return err;
+}
+
+static void testmod_unregister_uprobe(void)
+{
+ mutex_lock(&testmod_uprobe_mutex);
+
+ if (uprobe.offset) {
+ uprobe_unregister(d_real_inode(uprobe.path.dentry),
+ uprobe.offset, &uprobe.consumer);
+ uprobe.offset = 0;
+ }
+
+ mutex_unlock(&testmod_uprobe_mutex);
+}
+
+static ssize_t
+bpf_testmod_uprobe_write(struct file *file, struct kobject *kobj,
+ struct bin_attribute *bin_attr,
+ char *buf, loff_t off, size_t len)
+{
+ unsigned long offset = 0;
+ int err = 0;
+
+ if (kstrtoul(buf, 0, &offset))
+ return -EINVAL;
+
+ if (offset)
+ err = testmod_register_uprobe(offset);
+ else
+ testmod_unregister_uprobe();
+
+ return err ?: strlen(buf);
+}
+
+static struct bin_attribute bin_attr_bpf_testmod_uprobe_file __ro_after_init = {
+ .attr = { .name = "bpf_testmod_uprobe", .mode = 0666, },
+ .write = bpf_testmod_uprobe_write,
+};
+
+static int register_bpf_testmod_uprobe(void)
+{
+ return sysfs_create_bin_file(kernel_kobj, &bin_attr_bpf_testmod_uprobe_file);
+}
+
+static void unregister_bpf_testmod_uprobe(void)
+{
+ testmod_unregister_uprobe();
+ sysfs_remove_bin_file(kernel_kobj, &bin_attr_bpf_testmod_uprobe_file);
+}
+
+#else
+static int register_bpf_testmod_uprobe(void)
+{
+ return 0;
+}
+
+static void unregister_bpf_testmod_uprobe(void) { }
+#endif
+
BTF_KFUNCS_START(bpf_testmod_common_kfunc_ids)
BTF_ID_FLAGS(func, bpf_iter_testmod_seq_new, KF_ITER_NEW)
BTF_ID_FLAGS(func, bpf_iter_testmod_seq_next, KF_ITER_NEXT | KF_RET_NULL)
@@ -912,7 +1026,13 @@ static int bpf_testmod_init(void)
return -EINVAL;
sock = NULL;
mutex_init(&sock_lock);
- return sysfs_create_bin_file(kernel_kobj, &bin_attr_bpf_testmod_file);
+ ret = sysfs_create_bin_file(kernel_kobj, &bin_attr_bpf_testmod_file);
+ if (ret < 0)
+ return ret;
+ ret = register_bpf_testmod_uprobe();
+ if (ret < 0)
+ return ret;
+ return 0;
}
static void bpf_testmod_exit(void)
@@ -927,6 +1047,7 @@ static void bpf_testmod_exit(void)
bpf_kfunc_close_sock();
sysfs_remove_bin_file(kernel_kobj, &bin_attr_bpf_testmod_file);
+ unregister_bpf_testmod_uprobe();
}
module_init(bpf_testmod_init);
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index 311ac19d8992..1a50cd35205d 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -149,15 +149,82 @@ static void test_uretprobe_regs_equal(void)
cleanup:
uprobe_syscall__destroy(skel);
}
+
+#define BPF_TESTMOD_UPROBE_TEST_FILE "/sys/kernel/bpf_testmod_uprobe"
+
+static int write_bpf_testmod_uprobe(unsigned long offset)
+{
+ size_t n, ret;
+ char buf[30];
+ int fd;
+
+ n = sprintf(buf, "%lu", offset);
+
+ fd = open(BPF_TESTMOD_UPROBE_TEST_FILE, O_WRONLY);
+ if (fd < 0)
+ return -errno;
+
+ ret = write(fd, buf, n);
+ close(fd);
+ return ret != n ? (int) ret : 0;
+}
+
+static void test_uretprobe_regs_change(void)
+{
+ struct pt_regs before = {}, after = {};
+ unsigned long *pb = (unsigned long *) &before;
+ unsigned long *pa = (unsigned long *) &after;
+ unsigned long cnt = sizeof(before)/sizeof(*pb);
+ unsigned int i, err, offset;
+
+ offset = get_uprobe_offset(uretprobe_regs_trigger);
+
+ err = write_bpf_testmod_uprobe(offset);
+ if (!ASSERT_OK(err, "register_uprobe"))
+ return;
+
+ uretprobe_regs(&before, &after);
+
+ err = write_bpf_testmod_uprobe(0);
+ if (!ASSERT_OK(err, "unregister_uprobe"))
+ return;
+
+ for (i = 0; i < cnt; i++) {
+ unsigned int offset = i * sizeof(unsigned long);
+
+ switch (offset) {
+ case offsetof(struct pt_regs, rax):
+ ASSERT_EQ(pa[i], 0x12345678deadbeef, "rax");
+ break;
+ case offsetof(struct pt_regs, rcx):
+ ASSERT_EQ(pa[i], 0x87654321feebdaed, "rcx");
+ break;
+ case offsetof(struct pt_regs, r11):
+ ASSERT_EQ(pa[i], (__u64) -1, "r11");
+ break;
+ default:
+ if (!ASSERT_EQ(pa[i], pb[i], "register before-after value check"))
+ fprintf(stdout, "failed register offset %u\n", offset);
+ }
+ }
+}
+
#else
static void test_uretprobe_regs_equal(void)
{
test__skip();
}
+
+static void test_uretprobe_regs_change(void)
+{
+ test__skip();
+}
#endif
void test_uprobe_syscall(void)
{
if (test__start_subtest("uretprobe_regs_equal"))
test_uretprobe_regs_equal();
+ if (test__start_subtest("uretprobe_regs_change"))
+ test_uretprobe_regs_change();
}
--
2.45.0
Adding test to verify that when called from outside of the
trampoline provided by kernel, the uretprobe syscall will cause
calling process to receive SIGILL signal and the attached bpf
program is not executed.
Acked-by: Andrii Nakryiko <[email protected]>
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
---
.../selftests/bpf/prog_tests/uprobe_syscall.c | 95 +++++++++++++++++++
.../bpf/progs/uprobe_syscall_executed.c | 17 ++++
2 files changed, 112 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index 1a50cd35205d..3ef324c2db50 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -7,7 +7,10 @@
#include <unistd.h>
#include <asm/ptrace.h>
#include <linux/compiler.h>
+#include <linux/stringify.h>
+#include <sys/wait.h>
#include "uprobe_syscall.skel.h"
+#include "uprobe_syscall_executed.skel.h"
__naked unsigned long uretprobe_regs_trigger(void)
{
@@ -209,6 +212,91 @@ static void test_uretprobe_regs_change(void)
}
}
+#ifndef __NR_uretprobe
+#define __NR_uretprobe 462
+#endif
+
+__naked unsigned long uretprobe_syscall_call_1(void)
+{
+ /*
+ * Pretend we are uretprobe trampoline to trigger the return
+ * probe invocation in order to verify we get SIGILL.
+ */
+ asm volatile (
+ "pushq %rax\n"
+ "pushq %rcx\n"
+ "pushq %r11\n"
+ "movq $" __stringify(__NR_uretprobe) ", %rax\n"
+ "syscall\n"
+ "popq %r11\n"
+ "popq %rcx\n"
+ "retq\n"
+ );
+}
+
+__naked unsigned long uretprobe_syscall_call(void)
+{
+ asm volatile (
+ "call uretprobe_syscall_call_1\n"
+ "retq\n"
+ );
+}
+
+static void test_uretprobe_syscall_call(void)
+{
+ LIBBPF_OPTS(bpf_uprobe_multi_opts, opts,
+ .retprobe = true,
+ );
+ struct uprobe_syscall_executed *skel;
+ int pid, status, err, go[2], c;
+
+ if (ASSERT_OK(pipe(go), "pipe"))
+ return;
+
+ skel = uprobe_syscall_executed__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "uprobe_syscall_executed__open_and_load"))
+ goto cleanup;
+
+ pid = fork();
+ if (!ASSERT_GE(pid, 0, "fork"))
+ goto cleanup;
+
+ /* child */
+ if (pid == 0) {
+ close(go[1]);
+
+ /* wait for parent's kick */
+ err = read(go[0], &c, 1);
+ if (err != 1)
+ exit(-1);
+
+ uretprobe_syscall_call();
+ _exit(0);
+ }
+
+ skel->links.test = bpf_program__attach_uprobe_multi(skel->progs.test, pid,
+ "/proc/self/exe",
+ "uretprobe_syscall_call", &opts);
+ if (!ASSERT_OK_PTR(skel->links.test, "bpf_program__attach_uprobe_multi"))
+ goto cleanup;
+
+ /* kick the child */
+ write(go[1], &c, 1);
+ err = waitpid(pid, &status, 0);
+ ASSERT_EQ(err, pid, "waitpid");
+
+ /* verify the child got killed with SIGILL */
+ ASSERT_EQ(WIFSIGNALED(status), 1, "WIFSIGNALED");
+ ASSERT_EQ(WTERMSIG(status), SIGILL, "WTERMSIG");
+
+ /* verify the uretprobe program wasn't called */
+ ASSERT_EQ(skel->bss->executed, 0, "executed");
+
+cleanup:
+ uprobe_syscall_executed__destroy(skel);
+ close(go[1]);
+ close(go[0]);
+}
#else
static void test_uretprobe_regs_equal(void)
{
@@ -219,6 +307,11 @@ static void test_uretprobe_regs_change(void)
{
test__skip();
}
+
+static void test_uretprobe_syscall_call(void)
+{
+ test__skip();
+}
#endif
void test_uprobe_syscall(void)
@@ -227,4 +320,6 @@ void test_uprobe_syscall(void)
test_uretprobe_regs_equal();
if (test__start_subtest("uretprobe_regs_change"))
test_uretprobe_regs_change();
+ if (test__start_subtest("uretprobe_syscall_call"))
+ test_uretprobe_syscall_call();
}
diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
new file mode 100644
index 000000000000..0d7f1a7db2e2
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <string.h>
+
+struct pt_regs regs;
+
+char _license[] SEC("license") = "GPL";
+
+int executed = 0;
+
+SEC("uretprobe.multi")
+int test(struct pt_regs *regs)
+{
+ executed = 1;
+ return 0;
+}
--
2.45.0
Adding man page for new uretprobe syscall.
Signed-off-by: Jiri Olsa <[email protected]>
---
man2/uretprobe.2 | 50 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)
create mode 100644 man2/uretprobe.2
diff --git a/man2/uretprobe.2 b/man2/uretprobe.2
new file mode 100644
index 000000000000..690fe3b1a44f
--- /dev/null
+++ b/man2/uretprobe.2
@@ -0,0 +1,50 @@
+.\" Copyright (C) 2024, Jiri Olsa <[email protected]>
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+uretprobe \- execute pending return uprobes
+.SH SYNOPSIS
+.nf
+.B int uretprobe(void)
+.fi
+.SH DESCRIPTION
+The
+.BR uretprobe ()
+syscall is an alternative to breakpoint instructions for
+triggering return uprobe consumers.
+.P
+Calls to
+.BR uretprobe ()
+suscall are only made from the user-space trampoline provided by the kernel.
+Calls from any other place result in a
+.BR SIGILL .
+
+.SH RETURN VALUE
+The
+.BR uretprobe ()
+syscall return value is architecture-specific.
+
+.SH VERSIONS
+This syscall is not specified in POSIX,
+and details of its behavior vary across systems.
+.SH STANDARDS
+None.
+.SH HISTORY
+TBD
+.SH NOTES
+The
+.BR uretprobe ()
+syscall was initially introduced for the x86_64 architecture where it was shown
+to be faster than breakpoint traps. It might be extended to other architectures.
+.P
+The
+.BR uretprobe ()
+syscall exists only to allow the invocation of return uprobe consumers.
+It should
+.B never
+be called directly.
+Details of the arguments (if any) passed to
+.BR uretprobe ()
+and the return value are architecture-specific.
--
2.44.0
Adding uretprobe shadow stack test that runs all existing
uretprobe tests with shadow stack enabled if it's available.
Signed-off-by: Jiri Olsa <[email protected]>
---
.../selftests/bpf/prog_tests/uprobe_syscall.c | 60 +++++++++++++++++++
1 file changed, 60 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index 3ef324c2db50..fda456401284 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -9,6 +9,9 @@
#include <linux/compiler.h>
#include <linux/stringify.h>
#include <sys/wait.h>
+#include <sys/syscall.h>
+#include <sys/prctl.h>
+#include <asm/prctl.h>
#include "uprobe_syscall.skel.h"
#include "uprobe_syscall_executed.skel.h"
@@ -297,6 +300,56 @@ static void test_uretprobe_syscall_call(void)
close(go[1]);
close(go[0]);
}
+
+/*
+ * Borrowed from tools/testing/selftests/x86/test_shadow_stack.c.
+ *
+ * For use in inline enablement of shadow stack.
+ *
+ * The program can't return from the point where shadow stack gets enabled
+ * because there will be no address on the shadow stack. So it can't use
+ * syscall() for enablement, since it is a function.
+ *
+ * Based on code from nolibc.h. Keep a copy here because this can't pull
+ * in all of nolibc.h.
+ */
+#define ARCH_PRCTL(arg1, arg2) \
+({ \
+ long _ret; \
+ register long _num asm("eax") = __NR_arch_prctl; \
+ register long _arg1 asm("rdi") = (long)(arg1); \
+ register long _arg2 asm("rsi") = (long)(arg2); \
+ \
+ asm volatile ( \
+ "syscall\n" \
+ : "=a"(_ret) \
+ : "r"(_arg1), "r"(_arg2), \
+ "0"(_num) \
+ : "rcx", "r11", "memory", "cc" \
+ ); \
+ _ret; \
+})
+
+#ifndef ARCH_SHSTK_ENABLE
+#define ARCH_SHSTK_ENABLE 0x5001
+#define ARCH_SHSTK_DISABLE 0x5002
+#define ARCH_SHSTK_SHSTK (1ULL << 0)
+#endif
+
+static void test_uretprobe_shadow_stack(void)
+{
+ if (ARCH_PRCTL(ARCH_SHSTK_ENABLE, ARCH_SHSTK_SHSTK)) {
+ test__skip();
+ return;
+ }
+
+ /* Run all of the uretprobe tests. */
+ test_uretprobe_regs_equal();
+ test_uretprobe_regs_change();
+ test_uretprobe_syscall_call();
+
+ ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK);
+}
#else
static void test_uretprobe_regs_equal(void)
{
@@ -312,6 +365,11 @@ static void test_uretprobe_syscall_call(void)
{
test__skip();
}
+
+static void test_uretprobe_shadow_stack(void)
+{
+ test__skip();
+}
#endif
void test_uprobe_syscall(void)
@@ -322,4 +380,6 @@ void test_uprobe_syscall(void)
test_uretprobe_regs_change();
if (test__start_subtest("uretprobe_syscall_call"))
test_uretprobe_syscall_call();
+ if (test__start_subtest("uretprobe_shadow_stack"))
+ test_uretprobe_shadow_stack();
}
--
2.45.0
Hi Jiri,
On Tue, May 21, 2024 at 12:48:25PM GMT, Jiri Olsa wrote:
> Adding man page for new uretprobe syscall.
>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> man2/uretprobe.2 | 50 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 50 insertions(+)
> create mode 100644 man2/uretprobe.2
>
> diff --git a/man2/uretprobe.2 b/man2/uretprobe.2
> new file mode 100644
> index 000000000000..690fe3b1a44f
> --- /dev/null
> +++ b/man2/uretprobe.2
> @@ -0,0 +1,50 @@
> +.\" Copyright (C) 2024, Jiri Olsa <[email protected]>
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +uretprobe \- execute pending return uprobes
> +.SH SYNOPSIS
> +.nf
> +.B int uretprobe(void)
> +.fi
What header file provides this system call?
> +.SH DESCRIPTION
> +The
> +.BR uretprobe ()
> +syscall is an alternative to breakpoint instructions for
> +triggering return uprobe consumers.
> +.P
> +Calls to
> +.BR uretprobe ()
> +suscall are only made from the user-space trampoline provided by the kernel.
s/suscall/system call/
> +Calls from any other place result in a
> +.BR SIGILL .
Maybe add an ERRORS section?
> +
We don't use blank lines; it causes a groff(1) warning, and other
problems. Instead, use '.P'.
> +.SH RETURN VALUE
> +The
> +.BR uretprobe ()
> +syscall return value is architecture-specific.
> +
.P
> +.SH VERSIONS
> +This syscall is not specified in POSIX,
Redundant with "STANDARDS: None.".
> +and details of its behavior vary across systems.
Keep this.
> +.SH STANDARDS
> +None.
> +.SH HISTORY
> +TBD
> +.SH NOTES
> +The
> +.BR uretprobe ()
> +syscall was initially introduced for the x86_64 architecture where it was shown
> +to be faster than breakpoint traps. It might be extended to other architectures.
Please use semantic newlines.
$ MANWIDTH=72 man man-pages | sed -n '/Use semantic newlines/,/^$/p'
Use semantic newlines
In the source of a manual page, new sentences should be started on
new lines, long sentences should be split into lines at clause
breaks (commas, semicolons, colons, and so on), and long clauses
should be split at phrase boundaries. This convention, sometimes
known as "semantic newlines", makes it easier to see the effect of
patches, which often operate at the level of individual sentences,
clauses, or phrases.
> +.P
> +The
> +.BR uretprobe ()
> +syscall exists only to allow the invocation of return uprobe consumers.
s/syscall/system call/
> +It should
> +.B never
> +be called directly.
> +Details of the arguments (if any) passed to
> +.BR uretprobe ()
> +and the return value are architecture-specific.
> --
> 2.44.0
Have a lovely day!
Alex
--
<https://www.alejandro-colomar.es/>
On Tue, May 21, 2024 at 01:36:25PM +0200, Alejandro Colomar wrote:
> Hi Jiri,
>
> On Tue, May 21, 2024 at 12:48:25PM GMT, Jiri Olsa wrote:
> > Adding man page for new uretprobe syscall.
> >
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > man2/uretprobe.2 | 50 ++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 50 insertions(+)
> > create mode 100644 man2/uretprobe.2
> >
> > diff --git a/man2/uretprobe.2 b/man2/uretprobe.2
> > new file mode 100644
> > index 000000000000..690fe3b1a44f
> > --- /dev/null
> > +++ b/man2/uretprobe.2
> > @@ -0,0 +1,50 @@
> > +.\" Copyright (C) 2024, Jiri Olsa <[email protected]>
> > +.\"
> > +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> > +.\"
> > +.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
> > +.SH NAME
> > +uretprobe \- execute pending return uprobes
> > +.SH SYNOPSIS
> > +.nf
> > +.B int uretprobe(void)
> > +.fi
>
> What header file provides this system call?
there's no header, it's used/called only by user space trampoline
provided by kernel, it's not expected to be called by user
>
> > +.SH DESCRIPTION
> > +The
> > +.BR uretprobe ()
> > +syscall is an alternative to breakpoint instructions for
> > +triggering return uprobe consumers.
> > +.P
> > +Calls to
> > +.BR uretprobe ()
> > +suscall are only made from the user-space trampoline provided by the kernel.
>
> s/suscall/system call/
ugh leftover sry
>
> > +Calls from any other place result in a
> > +.BR SIGILL .
>
> Maybe add an ERRORS section?
>
> > +
>
> We don't use blank lines; it causes a groff(1) warning, and other
> problems. Instead, use '.P'.
>
> > +.SH RETURN VALUE
> > +The
> > +.BR uretprobe ()
> > +syscall return value is architecture-specific.
> > +
>
> .P
>
> > +.SH VERSIONS
> > +This syscall is not specified in POSIX,
>
> Redundant with "STANDARDS: None.".
>
> > +and details of its behavior vary across systems.
>
> Keep this.
ok
>
> > +.SH STANDARDS
> > +None.
> > +.SH HISTORY
> > +TBD
> > +.SH NOTES
> > +The
> > +.BR uretprobe ()
> > +syscall was initially introduced for the x86_64 architecture where it was shown
> > +to be faster than breakpoint traps. It might be extended to other architectures.
>
> Please use semantic newlines.
>
> $ MANWIDTH=72 man man-pages | sed -n '/Use semantic newlines/,/^$/p'
> Use semantic newlines
> In the source of a manual page, new sentences should be started on
> new lines, long sentences should be split into lines at clause
> breaks (commas, semicolons, colons, and so on), and long clauses
> should be split at phrase boundaries. This convention, sometimes
> known as "semantic newlines", makes it easier to see the effect of
> patches, which often operate at the level of individual sentences,
> clauses, or phrases.
ok
thanks,
jirka
>
> > +.P
> > +The
> > +.BR uretprobe ()
> > +syscall exists only to allow the invocation of return uprobe consumers.
>
> s/syscall/system call/
>
> > +It should
> > +.B never
> > +be called directly.
> > +Details of the arguments (if any) passed to
> > +.BR uretprobe ()
> > +and the return value are architecture-specific.
> > --
> > 2.44.0
>
> Have a lovely day!
> Alex
>
> --
> <https://www.alejandro-colomar.es/>
On 05/21, Jiri Olsa wrote:
>
> Currently the application with enabled shadow stack will crash
> if it sets up return uprobe. The reason is the uretprobe kernel
> code changes the user space task's stack, but does not update
> shadow stack accordingly.
>
> Adding new functions to update values on shadow stack and using
> them in uprobe code to keep shadow stack in sync with uretprobe
> changes to user stack.
I don't think my ack has any value in this area but looks good to me.
Reviewed-by: Oleg Nesterov <[email protected]>
> Fixes: 8b1c23543436 ("x86/shstk: Add return uprobe support")
Hmm... Was this commit ever applied?
Oleg.
On Tue, May 21, 2024 at 04:22:21PM +0200, Oleg Nesterov wrote:
> On 05/21, Jiri Olsa wrote:
> >
> > Currently the application with enabled shadow stack will crash
> > if it sets up return uprobe. The reason is the uretprobe kernel
> > code changes the user space task's stack, but does not update
> > shadow stack accordingly.
> >
> > Adding new functions to update values on shadow stack and using
> > them in uprobe code to keep shadow stack in sync with uretprobe
> > changes to user stack.
>
> I don't think my ack has any value in this area but looks good to me.
>
> Reviewed-by: Oleg Nesterov <[email protected]>
>
>
> > Fixes: 8b1c23543436 ("x86/shstk: Add return uprobe support")
>
> Hmm... Was this commit ever applied?
should have been:
488af8ea7131 x86/shstk: Wire in shadow stack interface
will send new version
thanks,
jirka
>
> Oleg.
>
On Tue, May 21, 2024 at 01:48:59PM +0200, Jiri Olsa wrote:
> On Tue, May 21, 2024 at 01:36:25PM +0200, Alejandro Colomar wrote:
> > Hi Jiri,
> >
> > On Tue, May 21, 2024 at 12:48:25PM GMT, Jiri Olsa wrote:
> > > Adding man page for new uretprobe syscall.
> > >
> > > Signed-off-by: Jiri Olsa <[email protected]>
> > > ---
> > > man2/uretprobe.2 | 50 ++++++++++++++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 50 insertions(+)
> > > create mode 100644 man2/uretprobe.2
> > >
> > > diff --git a/man2/uretprobe.2 b/man2/uretprobe.2
> > > new file mode 100644
> > > index 000000000000..690fe3b1a44f
> > > --- /dev/null
> > > +++ b/man2/uretprobe.2
> > > @@ -0,0 +1,50 @@
> > > +.\" Copyright (C) 2024, Jiri Olsa <[email protected]>
> > > +.\"
> > > +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> > > +.\"
> > > +.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
> > > +.SH NAME
> > > +uretprobe \- execute pending return uprobes
> > > +.SH SYNOPSIS
> > > +.nf
> > > +.B int uretprobe(void)
> > > +.fi
> >
> > What header file provides this system call?
>
> there's no header, it's used/called only by user space trampoline
> provided by kernel, it's not expected to be called by user
>
> >
> > > +.SH DESCRIPTION
> > > +The
> > > +.BR uretprobe ()
> > > +syscall is an alternative to breakpoint instructions for
> > > +triggering return uprobe consumers.
> > > +.P
> > > +Calls to
> > > +.BR uretprobe ()
> > > +suscall are only made from the user-space trampoline provided by the kernel.
> >
> > s/suscall/system call/
>
> ugh leftover sry
>
> >
> > > +Calls from any other place result in a
> > > +.BR SIGILL .
> >
> > Maybe add an ERRORS section?
> >
> > > +
> >
> > We don't use blank lines; it causes a groff(1) warning, and other
> > problems. Instead, use '.P'.
> >
> > > +.SH RETURN VALUE
> > > +The
> > > +.BR uretprobe ()
> > > +syscall return value is architecture-specific.
> > > +
> >
> > .P
> >
> > > +.SH VERSIONS
> > > +This syscall is not specified in POSIX,
> >
> > Redundant with "STANDARDS: None.".
> >
> > > +and details of its behavior vary across systems.
> >
> > Keep this.
>
> ok
>
> >
> > > +.SH STANDARDS
> > > +None.
> > > +.SH HISTORY
> > > +TBD
> > > +.SH NOTES
> > > +The
> > > +.BR uretprobe ()
> > > +syscall was initially introduced for the x86_64 architecture where it was shown
> > > +to be faster than breakpoint traps. It might be extended to other architectures.
> >
> > Please use semantic newlines.
> >
> > $ MANWIDTH=72 man man-pages | sed -n '/Use semantic newlines/,/^$/p'
> > Use semantic newlines
> > In the source of a manual page, new sentences should be started on
> > new lines, long sentences should be split into lines at clause
> > breaks (commas, semicolons, colons, and so on), and long clauses
> > should be split at phrase boundaries. This convention, sometimes
> > known as "semantic newlines", makes it easier to see the effect of
> > patches, which often operate at the level of individual sentences,
> > clauses, or phrases.
>
how about the change below?
thanks,
jirka
---
diff --git a/man/man2/uretprobe.2 b/man/man2/uretprobe.2
new file mode 100644
index 000000000000..959b7a47102b
--- /dev/null
+++ b/man/man2/uretprobe.2
@@ -0,0 +1,55 @@
+.\" Copyright (C) 2024, Jiri Olsa <[email protected]>
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+uretprobe \- execute pending return uprobes
+.SH SYNOPSIS
+.nf
+.B int uretprobe(void)
+.fi
+.SH DESCRIPTION
+The
+.BR uretprobe ()
+system call is an alternative to breakpoint instructions for triggering return
+uprobe consumers.
+.P
+Calls to
+.BR uretprobe ()
+system call are only made from the user-space trampoline provided by the kernel.
+Calls from any other place result in a
+.BR SIGILL .
+.SH RETURN VALUE
+The
+.BR uretprobe ()
+system call return value is architecture-specific.
+.SH ERRORS
+.BR SIGILL
+The
+.BR uretprobe ()
+system call was called by user.
+.SH VERSIONS
+Details of the
+.BR uretprobe ()
+system call behavior vary across systems.
+.SH STANDARDS
+None.
+.SH HISTORY
+TBD
+.SH NOTES
+The
+.BR uretprobe ()
+system call was initially introduced for the x86_64 architecture where it was shown
+to be faster than breakpoint traps.
+It might be extended to other architectures.
+.P
+The
+.BR uretprobe ()
+system call exists only to allow the invocation of return uprobe consumers.
+It should
+.B never
+be called directly.
+Details of the arguments (if any) passed to
+.BR uretprobe ()
+and the return value are architecture-specific.
On Tue, May 21, 2024 at 12:48:16PM +0200, Jiri Olsa wrote:
>hi,
>as part of the effort on speeding up the uprobes [0] coming with
>return uprobe optimization by using syscall instead of the trap
>on the uretprobe trampoline.
I understand this provides an optimization on x86. I believe primary reason
is syscall is straight-line microcode and short sequence while trap delivery
still does all the GDT / IDT and segmentation checks and it makes delivery
of the trap slow.
So doing syscall improves that. Although it seems x86 is going to get rid of
that as part of FRED [1, 2]. And linux kernel support for FRED is already upstream [2].
So I am imagining x86 hardware already exists with FRED support.
On other architectures, I believe trap delivery for breakpoint instruction
is same as syscall instruction.
Given that x86 trap delivery is pretty much going following the suit here and
intend to make trap delivery cost similar to syscall delivery.
Sorry for being buzzkill here but ...
Is it worth introducing this syscall which otherwise has no use on other arches
and x86 (and x86 kernel) has already taken steps to match trap delivery latency with
syscall latency would have similar cost?
Did you do any study of this on FRED enabled x86 CPUs?
[1] - https://www.intel.com/content/www/us/en/content-details/780121/flexible-return-and-event-delivery-fred-specification.html
[2] - https://docs.kernel.org/arch/x86/x86_64/fred.html
>
>The speed up depends on instruction type that uprobe is installed
>and depends on specific HW type, please check patch 1 for details.
>
Hi Jirka,
On Tue, May 21, 2024 at 10:24:30PM GMT, Jiri Olsa wrote:
> how about the change below?
Much better. I still have a few comments below. :-)
>
> thanks,
> jirka
>
>
> ---
> diff --git a/man/man2/uretprobe.2 b/man/man2/uretprobe.2
> new file mode 100644
> index 000000000000..959b7a47102b
> --- /dev/null
> +++ b/man/man2/uretprobe.2
> @@ -0,0 +1,55 @@
> +.\" Copyright (C) 2024, Jiri Olsa <[email protected]>
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +uretprobe \- execute pending return uprobes
> +.SH SYNOPSIS
> +.nf
> +.B int uretprobe(void)
> +.fi
> +.SH DESCRIPTION
> +The
> +.BR uretprobe ()
> +system call is an alternative to breakpoint instructions for triggering return
> +uprobe consumers.
> +.P
> +Calls to
> +.BR uretprobe ()
> +system call are only made from the user-space trampoline provided by the kernel.
> +Calls from any other place result in a
> +.BR SIGILL .
> +.SH RETURN VALUE
> +The
> +.BR uretprobe ()
> +system call return value is architecture-specific.
> +.SH ERRORS
> +.BR SIGILL
This should be a tagged paragraph, preceeded with '.TP'. See any manual
page with an ERRORS section for an example.
Also, BR is Bold alternating with Roman, but this is just bold, so it
should use '.B'.
.TP
.B SIGILL
> +The
> +.BR uretprobe ()
> +system call was called by user.
> +.SH VERSIONS
> +Details of the
> +.BR uretprobe ()
> +system call behavior vary across systems.
> +.SH STANDARDS
> +None.
> +.SH HISTORY
> +TBD
> +.SH NOTES
> +The
> +.BR uretprobe ()
> +system call was initially introduced for the x86_64 architecture where it was shown
We have a strong-ish limit at column 80. Please break after
'architecture', which is a clause boundary.
Have a lovely night!
Alex
> +to be faster than breakpoint traps.
> +It might be extended to other architectures.
> +.P
> +The
> +.BR uretprobe ()
> +system call exists only to allow the invocation of return uprobe consumers.
> +It should
> +.B never
> +be called directly.
> +Details of the arguments (if any) passed to
> +.BR uretprobe ()
> +and the return value are architecture-specific.
>
--
<https://www.alejandro-colomar.es/>
On Tue, May 21, 2024 at 1:49 PM Deepak Gupta <[email protected]> wrote:
>
> On Tue, May 21, 2024 at 12:48:16PM +0200, Jiri Olsa wrote:
> >hi,
> >as part of the effort on speeding up the uprobes [0] coming with
> >return uprobe optimization by using syscall instead of the trap
> >on the uretprobe trampoline.
>
> I understand this provides an optimization on x86. I believe primary reason
> is syscall is straight-line microcode and short sequence while trap delivery
> still does all the GDT / IDT and segmentation checks and it makes delivery
> of the trap slow.
>
> So doing syscall improves that. Although it seems x86 is going to get rid of
> that as part of FRED [1, 2]. And linux kernel support for FRED is already upstream [2].
> So I am imagining x86 hardware already exists with FRED support.
>
> On other architectures, I believe trap delivery for breakpoint instruction
> is same as syscall instruction.
>
> Given that x86 trap delivery is pretty much going following the suit here and
> intend to make trap delivery cost similar to syscall delivery.
>
> Sorry for being buzzkill here but ...
> Is it worth introducing this syscall which otherwise has no use on other arches
> and x86 (and x86 kernel) has already taken steps to match trap delivery latency with
> syscall latency would have similar cost?
>
> Did you do any study of this on FRED enabled x86 CPUs?
afaik CPUs with FRED do not exist on the market and it's
not clear when they will be available.
And when they finally will be on the shelves
the overhead of FRED vs int3 would still have to be measured.
int3 with FRED might still be higher than syscall with FRED.
>
> [1] - https://www.intel.com/content/www/us/en/content-details/780121/flexible-return-and-event-delivery-fred-specification.html
> [2] - https://docs.kernel.org/arch/x86/x86_64/fred.html
>
> >
> >The speed up depends on instruction type that uprobe is installed
> >and depends on specific HW type, please check patch 1 for details.
> >
On Tue, May 21, 2024 at 10:54:36PM +0200, Alejandro Colomar wrote:
> Hi Jirka,
>
> On Tue, May 21, 2024 at 10:24:30PM GMT, Jiri Olsa wrote:
> > how about the change below?
>
> Much better. I still have a few comments below. :-)
>
> >
> > thanks,
> > jirka
> >
> >
> > ---
> > diff --git a/man/man2/uretprobe.2 b/man/man2/uretprobe.2
> > new file mode 100644
> > index 000000000000..959b7a47102b
> > --- /dev/null
> > +++ b/man/man2/uretprobe.2
> > @@ -0,0 +1,55 @@
> > +.\" Copyright (C) 2024, Jiri Olsa <[email protected]>
> > +.\"
> > +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> > +.\"
> > +.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
> > +.SH NAME
> > +uretprobe \- execute pending return uprobes
> > +.SH SYNOPSIS
> > +.nf
> > +.B int uretprobe(void)
> > +.fi
> > +.SH DESCRIPTION
> > +The
> > +.BR uretprobe ()
> > +system call is an alternative to breakpoint instructions for triggering return
> > +uprobe consumers.
> > +.P
> > +Calls to
> > +.BR uretprobe ()
> > +system call are only made from the user-space trampoline provided by the kernel.
> > +Calls from any other place result in a
> > +.BR SIGILL .
> > +.SH RETURN VALUE
> > +The
> > +.BR uretprobe ()
> > +system call return value is architecture-specific.
> > +.SH ERRORS
> > +.BR SIGILL
>
> This should be a tagged paragraph, preceeded with '.TP'. See any manual
> page with an ERRORS section for an example.
>
> Also, BR is Bold alternating with Roman, but this is just bold, so it
> should use '.B'.
>
> .TP
> .B SIGILL
ok
>
> > +The
> > +.BR uretprobe ()
> > +system call was called by user.
> > +.SH VERSIONS
> > +Details of the
> > +.BR uretprobe ()
> > +system call behavior vary across systems.
> > +.SH STANDARDS
> > +None.
> > +.SH HISTORY
> > +TBD
> > +.SH NOTES
> > +The
> > +.BR uretprobe ()
> > +system call was initially introduced for the x86_64 architecture where it was shown
>
> We have a strong-ish limit at column 80. Please break after
> 'architecture', which is a clause boundary.
>
ok, thanks
jirka
---
diff --git a/man/man2/uretprobe.2 b/man/man2/uretprobe.2
new file mode 100644
index 000000000000..5b5f340b59b6
--- /dev/null
+++ b/man/man2/uretprobe.2
@@ -0,0 +1,56 @@
+.\" Copyright (C) 2024, Jiri Olsa <[email protected]>
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+uretprobe \- execute pending return uprobes
+.SH SYNOPSIS
+.nf
+.B int uretprobe(void)
+.fi
+.SH DESCRIPTION
+The
+.BR uretprobe ()
+system call is an alternative to breakpoint instructions for triggering return
+uprobe consumers.
+.P
+Calls to
+.BR uretprobe ()
+system call are only made from the user-space trampoline provided by the kernel.
+Calls from any other place result in a
+.BR SIGILL .
+.SH RETURN VALUE
+The
+.BR uretprobe ()
+system call return value is architecture-specific.
+.SH ERRORS
+.TP
+.B SIGILL
+The
+.BR uretprobe ()
+system call was called by user.
+.SH VERSIONS
+Details of the
+.BR uretprobe ()
+system call behavior vary across systems.
+.SH STANDARDS
+None.
+.SH HISTORY
+TBD
+.SH NOTES
+The
+.BR uretprobe ()
+system call was initially introduced for the x86_64 architecture
+where it was shown to be faster than breakpoint traps.
+It might be extended to other architectures.
+.P
+The
+.BR uretprobe ()
+system call exists only to allow the invocation of return uprobe consumers.
+It should
+.B never
+be called directly.
+Details of the arguments (if any) passed to
+.BR uretprobe ()
+and the return value are architecture-specific.
On Tue, May 21, 2024 at 01:57:33PM -0700, Alexei Starovoitov wrote:
> On Tue, May 21, 2024 at 1:49 PM Deepak Gupta <[email protected]> wrote:
> >
> > On Tue, May 21, 2024 at 12:48:16PM +0200, Jiri Olsa wrote:
> > >hi,
> > >as part of the effort on speeding up the uprobes [0] coming with
> > >return uprobe optimization by using syscall instead of the trap
> > >on the uretprobe trampoline.
> >
> > I understand this provides an optimization on x86. I believe primary reason
> > is syscall is straight-line microcode and short sequence while trap delivery
> > still does all the GDT / IDT and segmentation checks and it makes delivery
> > of the trap slow.
> >
> > So doing syscall improves that. Although it seems x86 is going to get rid of
> > that as part of FRED [1, 2]. And linux kernel support for FRED is already upstream [2].
> > So I am imagining x86 hardware already exists with FRED support.
> >
> > On other architectures, I believe trap delivery for breakpoint instruction
> > is same as syscall instruction.
> >
> > Given that x86 trap delivery is pretty much going following the suit here and
> > intend to make trap delivery cost similar to syscall delivery.
> >
> > Sorry for being buzzkill here but ...
> > Is it worth introducing this syscall which otherwise has no use on other arches
> > and x86 (and x86 kernel) has already taken steps to match trap delivery latency with
> > syscall latency would have similar cost?
> >
> > Did you do any study of this on FRED enabled x86 CPUs?
nope.. interesting, will check, thanks
>
> afaik CPUs with FRED do not exist on the market and it's
> not clear when they will be available.
> And when they finally will be on the shelves
> the overhead of FRED vs int3 would still have to be measured.
> int3 with FRED might still be higher than syscall with FRED.
+1, also it's not really a complicated change and the wiring of the
new syscall to uretprobe is really simple and we could go back to int3
with just one single patch if we see no longer any benefit to it,
but at the moment it provides speed up
jirka
>
> >
> > [1] - https://www.intel.com/content/www/us/en/content-details/780121/flexible-return-and-event-delivery-fred-specification.html
> > [2] - https://docs.kernel.org/arch/x86/x86_64/fred.html
> >
> > >
> > >The speed up depends on instruction type that uprobe is installed
> > >and depends on specific HW type, please check patch 1 for details.
> > >
Hi Jirka,
On Wed, May 22, 2024 at 09:54:58AM GMT, Jiri Olsa wrote:
> ok, thanks
>
> jirka
>
>
> ---
> diff --git a/man/man2/uretprobe.2 b/man/man2/uretprobe.2
> new file mode 100644
> index 000000000000..5b5f340b59b6
> --- /dev/null
> +++ b/man/man2/uretprobe.2
> @@ -0,0 +1,56 @@
> +.\" Copyright (C) 2024, Jiri Olsa <[email protected]>
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +uretprobe \- execute pending return uprobes
> +.SH SYNOPSIS
> +.nf
> +.B int uretprobe(void)
> +.fi
> +.SH DESCRIPTION
> +The
> +.BR uretprobe ()
> +system call is an alternative to breakpoint instructions for triggering return
> +uprobe consumers.
> +.P
> +Calls to
> +.BR uretprobe ()
> +system call are only made from the user-space trampoline provided by the kernel.
> +Calls from any other place result in a
> +.BR SIGILL .
> +.SH RETURN VALUE
> +The
> +.BR uretprobe ()
> +system call return value is architecture-specific.
> +.SH ERRORS
> +.TP
> +.B SIGILL
> +The
> +.BR uretprobe ()
> +system call was called by user.
Maybe 'a user-space program'?
Anyway, LGTM. Thanks!
Reviewed-by: Alejandro Colomar <[email protected]>
Have a lovely day!
Alex
> +.SH VERSIONS
> +Details of the
> +.BR uretprobe ()
> +system call behavior vary across systems.
> +.SH STANDARDS
> +None.
> +.SH HISTORY
> +TBD
> +.SH NOTES
> +The
> +.BR uretprobe ()
> +system call was initially introduced for the x86_64 architecture
> +where it was shown to be faster than breakpoint traps.
> +It might be extended to other architectures.
> +.P
> +The
> +.BR uretprobe ()
> +system call exists only to allow the invocation of return uprobe consumers.
> +It should
> +.B never
> +be called directly.
> +Details of the arguments (if any) passed to
> +.BR uretprobe ()
> +and the return value are architecture-specific.
--
<https://www.alejandro-colomar.es/>
On Wed, May 22, 2024 at 12:59:46PM +0200, Alejandro Colomar wrote:
> Hi Jirka,
>
> On Wed, May 22, 2024 at 09:54:58AM GMT, Jiri Olsa wrote:
> > ok, thanks
> >
> > jirka
> >
> >
> > ---
> > diff --git a/man/man2/uretprobe.2 b/man/man2/uretprobe.2
> > new file mode 100644
> > index 000000000000..5b5f340b59b6
> > --- /dev/null
> > +++ b/man/man2/uretprobe.2
> > @@ -0,0 +1,56 @@
> > +.\" Copyright (C) 2024, Jiri Olsa <[email protected]>
> > +.\"
> > +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> > +.\"
> > +.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
> > +.SH NAME
> > +uretprobe \- execute pending return uprobes
> > +.SH SYNOPSIS
> > +.nf
> > +.B int uretprobe(void)
> > +.fi
> > +.SH DESCRIPTION
> > +The
> > +.BR uretprobe ()
> > +system call is an alternative to breakpoint instructions for triggering return
> > +uprobe consumers.
> > +.P
> > +Calls to
> > +.BR uretprobe ()
> > +system call are only made from the user-space trampoline provided by the kernel.
> > +Calls from any other place result in a
> > +.BR SIGILL .
> > +.SH RETURN VALUE
> > +The
> > +.BR uretprobe ()
> > +system call return value is architecture-specific.
> > +.SH ERRORS
> > +.TP
> > +.B SIGILL
> > +The
> > +.BR uretprobe ()
> > +system call was called by user.
>
> Maybe 'a user-space program'?
> Anyway, LGTM. Thanks!
>
> Reviewed-by: Alejandro Colomar <[email protected]>
ok, will change, thanks a lot
jirka
>
> Have a lovely day!
> Alex
>
> > +.SH VERSIONS
> > +Details of the
> > +.BR uretprobe ()
> > +system call behavior vary across systems.
> > +.SH STANDARDS
> > +None.
> > +.SH HISTORY
> > +TBD
> > +.SH NOTES
> > +The
> > +.BR uretprobe ()
> > +system call was initially introduced for the x86_64 architecture
> > +where it was shown to be faster than breakpoint traps.
> > +It might be extended to other architectures.
> > +.P
> > +The
> > +.BR uretprobe ()
> > +system call exists only to allow the invocation of return uprobe consumers.
> > +It should
> > +.B never
> > +be called directly.
> > +Details of the arguments (if any) passed to
> > +.BR uretprobe ()
> > +and the return value are architecture-specific.
>
> --
> <https://www.alejandro-colomar.es/>
On Tue, 2024-05-21 at 12:48 +0200, Jiri Olsa wrote:
> Currently the application with enabled shadow stack will crash
> if it sets up return uprobe. The reason is the uretprobe kernel
> code changes the user space task's stack, but does not update
> shadow stack accordingly.
>
> Adding new functions to update values on shadow stack and using
> them in uprobe code to keep shadow stack in sync with uretprobe
> changes to user stack.
>
> Fixes: 8b1c23543436 ("x86/shstk: Add return uprobe support")
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
Acked-by: Rick Edgecombe <[email protected]>