2020-08-29 13:01:51

by Masami Hiramatsu

Subject: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless

Hi,

Here is the 5th version of the series to unify the kretprobe trampoline handlers
and to make kretprobe lockless. Thanks, Peter, for this work!

The previous version is here:

https://lkml.kernel.org/r/159861759775.992023.12553306821235086809.stgit@devnote2

This version merges the remove-task-scan patch into the remove-kretprobe-hash
patch, fixes the code according to the review comments, and fixes a few minor issues.

This version is also available on
git://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git lockless-kretprobe-v5

I ran a smoke test and ftracetest on x86-64 and did build tests for x86-64, i386,
arm, arm64, and mips.


Thank you,

---

Masami Hiramatsu (16):
kprobes: Add generic kretprobe trampoline handler
x86/kprobes: Use generic kretprobe trampoline handler
arm: kprobes: Use generic kretprobe trampoline handler
arm64: kprobes: Use generic kretprobe trampoline handler
arc: kprobes: Use generic kretprobe trampoline handler
csky: kprobes: Use generic kretprobe trampoline handler
ia64: kprobes: Use generic kretprobe trampoline handler
mips: kprobes: Use generic kretprobe trampoline handler
parisc: kprobes: Use generic kretprobe trampoline handler
powerpc: kprobes: Use generic kretprobe trampoline handler
s390: kprobes: Use generic kretprobe trampoline handler
sh: kprobes: Use generic kretprobe trampoline handler
sparc: kprobes: Use generic kretprobe trampoline handler
kprobes: Remove NMI context check
kprobes: Free kretprobe_instance with rcu callback
kprobes: Make local used functions static

Peter Zijlstra (5):
llist: Add nonatomic __llist_add() and __llist_del_all()
kprobes: Remove kretprobe hash
asm-generic/atomic: Add try_cmpxchg() fallbacks
freelist: Lock less freelist
kprobes: Replace rp->free_instance with freelist


arch/arc/kernel/kprobes.c | 54 ------
arch/arm/probes/kprobes/core.c | 78 ---------
arch/arm64/kernel/probes/kprobes.c | 78 ---------
arch/csky/kernel/probes/kprobes.c | 77 --------
arch/ia64/kernel/kprobes.c | 77 --------
arch/mips/kernel/kprobes.c | 54 ------
arch/parisc/kernel/kprobes.c | 76 --------
arch/powerpc/kernel/kprobes.c | 53 ------
arch/s390/kernel/kprobes.c | 79 ---------
arch/sh/kernel/kprobes.c | 58 ------
arch/sparc/kernel/kprobes.c | 51 ------
arch/x86/include/asm/atomic.h | 2
arch/x86/include/asm/atomic64_64.h | 2
arch/x86/include/asm/cmpxchg.h | 2
arch/x86/kernel/kprobes/core.c | 108 ------------
drivers/gpu/drm/i915/i915_request.c | 6 -
include/asm-generic/atomic-instrumented.h | 216 ++++++++++++++----------
include/linux/atomic-arch-fallback.h | 90 +++++++++-
include/linux/atomic-fallback.h | 90 +++++++++-
include/linux/freelist.h | 129 ++++++++++++++
include/linux/kprobes.h | 74 +++++---
include/linux/llist.h | 23 +++
include/linux/sched.h | 4
kernel/fork.c | 4
kernel/kprobes.c | 264 +++++++++++++----------------
kernel/trace/trace_kprobe.c | 3
scripts/atomic/gen-atomic-fallback.sh | 63 ++++++-
scripts/atomic/gen-atomic-instrumented.sh | 29 +++
28 files changed, 735 insertions(+), 1109 deletions(-)
create mode 100644 include/linux/freelist.h

--
Masami Hiramatsu (Linaro) <[email protected]>


2020-08-29 13:03:51

by Masami Hiramatsu

Subject: [PATCH v5 03/21] arm: kprobes: Use generic kretprobe trampoline handler

Use the generic kretprobe trampoline handler, and use regs->ARM_fp
for frame pointer verification.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
Changes in v2:
- Fix to cast frame_pointer type to void *.
---
arch/arm/probes/kprobes/core.c | 78 ++--------------------------------------
1 file changed, 3 insertions(+), 75 deletions(-)

diff --git a/arch/arm/probes/kprobes/core.c b/arch/arm/probes/kprobes/core.c
index feefa2055eba..a9653117ca0d 100644
--- a/arch/arm/probes/kprobes/core.c
+++ b/arch/arm/probes/kprobes/core.c
@@ -413,87 +413,15 @@ void __naked __kprobes kretprobe_trampoline(void)
/* Called from kretprobe_trampoline */
static __used __kprobes void *trampoline_handler(struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address = (unsigned long)&kretprobe_trampoline;
- kprobe_opcode_t *correct_ret_addr = NULL;
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because multiple functions in the call path have
- * a return probe installed on them, and/or more than one return
- * probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
- correct_ret_addr = ri->ret_addr;
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- if (ri->rp && ri->rp->handler) {
- __this_cpu_write(current_kprobe, &ri->rp->kp);
- get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
- ri->ret_addr = correct_ret_addr;
- ri->rp->handler(ri, regs);
- __this_cpu_write(current_kprobe, NULL);
- }
-
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
-
- return (void *)orig_ret_address;
+ return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline,
+ (void *)regs->ARM_fp);
}

void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *)regs->ARM_lr;
+ ri->fp = (void *)regs->ARM_fp;

/* Replace the return addr with trampoline addr. */
regs->ARM_lr = (unsigned long)&kretprobe_trampoline;

2020-08-29 13:04:49

by Masami Hiramatsu

Subject: [PATCH v5 01/21] kprobes: Add generic kretprobe trampoline handler

Add a generic kretprobe trampoline handler to unify all of the
cloned arch/* kretprobe trampoline handlers.

The generic kretprobe trampoline handler is based on the x86
implementation, because it is the most up-to-date one.
It has frame pointer checking, kprobe_busy_begin/end and
return address fixup for user handlers.
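
For reference, a minimal usage sketch (not part of the patch) of how an
architecture's trampoline handler is expected to call the generic helper after
this change; the frame pointer argument is architecture-specific (e.g. &regs->sp
on x86, regs->ARM_fp on arm, or NULL where no check is done):

/*
 * Illustrative only, not a real arch port: the arch handler fixes up
 * its registers and delegates instance lookup, user handler invocation
 * and return address selection to the generic code.
 */
static __used void *trampoline_handler(struct pt_regs *regs)
{
	/*
	 * kretprobe_trampoline_handler() wraps the call with the dummy
	 * kprobe (kprobe_busy); pass NULL as the frame pointer if the
	 * architecture cannot verify it.
	 */
	return (void *)kretprobe_trampoline_handler(regs,
				&kretprobe_trampoline, NULL);
}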

Signed-off-by: Masami Hiramatsu <[email protected]>
---
Changes in v5:
- Restore actual current_kprobe instead of kprobe_busy.
Changes in v4:
- Remove orig_ret_address
- Change the type of trampoline_address to void *
---
include/linux/kprobes.h | 32 +++++++++++++--
kernel/kprobes.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 126 insertions(+), 4 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 9be1bff4f586..c6a913e608b7 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -187,10 +187,38 @@ static inline int kprobes_built_in(void)
return 1;
}

+extern struct kprobe kprobe_busy;
+void kprobe_busy_begin(void);
+void kprobe_busy_end(void);
+
#ifdef CONFIG_KRETPROBES
extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs);
extern int arch_trampoline_kprobe(struct kprobe *p);
+
+/* If the trampoline handler is called from a kprobe, use this version */
+unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
+ void *trampoline_address,
+ void *frame_pointer);
+
+static nokprobe_inline
+unsigned long kretprobe_trampoline_handler(struct pt_regs *regs,
+ void *trampoline_address,
+ void *frame_pointer)
+{
+ unsigned long ret;
+ /*
+ * Set a dummy kprobe to avoid kretprobe recursion.
+ * Since a kretprobe never runs in a kprobe handler, no kprobe should
+ * be running at this point.
+ */
+ kprobe_busy_begin();
+ ret = __kretprobe_trampoline_handler(regs, trampoline_address, frame_pointer);
+ kprobe_busy_end();
+
+ return ret;
+}
+
#else /* CONFIG_KRETPROBES */
static inline void arch_prepare_kretprobe(struct kretprobe *rp,
struct pt_regs *regs)
@@ -354,10 +382,6 @@ static inline struct kprobe_ctlblk *get_kprobe_ctlblk(void)
return this_cpu_ptr(&kprobe_ctlblk);
}

-extern struct kprobe kprobe_busy;
-void kprobe_busy_begin(void);
-void kprobe_busy_end(void);
-
kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
int register_kprobe(struct kprobe *p);
void unregister_kprobe(struct kprobe *p);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 287b263c9cb9..a0afaa79024e 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1927,6 +1927,104 @@ unsigned long __weak arch_deref_entry_point(void *entry)
}

#ifdef CONFIG_KRETPROBES
+
+unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
+ void *trampoline_address,
+ void *frame_pointer)
+{
+ struct kretprobe_instance *ri = NULL, *last = NULL;
+ struct hlist_head *head, empty_rp;
+ struct hlist_node *tmp;
+ unsigned long flags;
+ kprobe_opcode_t *correct_ret_addr = NULL;
+ bool skipped = false;
+
+ INIT_HLIST_HEAD(&empty_rp);
+ kretprobe_hash_lock(current, &head, &flags);
+
+ /*
+ * It is possible to have multiple instances associated with a given
+ * task either because multiple functions in the call path have
+ * return probes installed on them, and/or more than one
+ * return probe was registered for a target function.
+ *
+ * We can handle this because:
+ * - instances are always pushed into the head of the list
+ * - when multiple return probes are registered for the same
+ * function, the (chronologically) first instance's ret_addr
+ * will be the real return address, and all the rest will
+ * point to kretprobe_trampoline.
+ */
+ hlist_for_each_entry(ri, head, hlist) {
+ if (ri->task != current)
+ /* another task is sharing our hash bucket */
+ continue;
+ /*
+ * Return probes must be pushed on this hash list in the correct
+ * order (same as the return order) so that they can be popped
+ * correctly. However, if we find an entry pushed in the wrong
+ * order, it means we found a function which should not be
+ * probed, because the wrong-order entry was pushed while
+ * processing another kretprobe itself.
+ */
+ if (ri->fp != frame_pointer) {
+ if (!skipped)
+ pr_warn("kretprobe is stacked incorrectly. Trying to fixup.\n");
+ skipped = true;
+ continue;
+ }
+
+ correct_ret_addr = ri->ret_addr;
+ if (skipped)
+ pr_warn("%ps must be blacklisted because of incorrect kretprobe order\n",
+ ri->rp->kp.addr);
+
+ if (correct_ret_addr != trampoline_address)
+ /*
+ * This is the real return address. Any other
+ * instances associated with this task are for
+ * other calls deeper on the call stack
+ */
+ break;
+ }
+
+ kretprobe_assert(ri, (unsigned long)correct_ret_addr,
+ (unsigned long)trampoline_address);
+ last = ri;
+
+ hlist_for_each_entry_safe(ri, tmp, head, hlist) {
+ if (ri->task != current)
+ /* another task is sharing our hash bucket */
+ continue;
+ if (ri->fp != frame_pointer)
+ continue;
+
+ if (ri->rp && ri->rp->handler) {
+ struct kprobe *prev = kprobe_running();
+
+ __this_cpu_write(current_kprobe, &ri->rp->kp);
+ ri->ret_addr = correct_ret_addr;
+ ri->rp->handler(ri, regs);
+ __this_cpu_write(current_kprobe, prev);
+ }
+
+ recycle_rp_inst(ri, &empty_rp);
+
+ if (ri == last)
+ break;
+ }
+
+ kretprobe_hash_unlock(current, &flags);
+
+ hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
+ hlist_del(&ri->hlist);
+ kfree(ri);
+ }
+
+ return (unsigned long)correct_ret_addr;
+}
+NOKPROBE_SYMBOL(__kretprobe_trampoline_handler)
+
/*
* This kprobe pre_handler is registered with every kretprobe. When probe
* hits it will set up the return probe.

2020-08-29 13:04:58

by Masami Hiramatsu

Subject: [PATCH v5 02/21] x86/kprobes: Use generic kretprobe trampoline handler

Signed-off-by: Masami Hiramatsu <[email protected]>
---
arch/x86/kernel/kprobes/core.c | 108 +---------------------------------------
1 file changed, 3 insertions(+), 105 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index fdadc37d72af..882b95313ad6 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -767,124 +767,22 @@ asm(
NOKPROBE_SYMBOL(kretprobe_trampoline);
STACK_FRAME_NON_STANDARD(kretprobe_trampoline);

+
/*
* Called from kretprobe_trampoline
*/
__used __visible void *trampoline_handler(struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address = (unsigned long)&kretprobe_trampoline;
- kprobe_opcode_t *correct_ret_addr = NULL;
- void *frame_pointer;
- bool skipped = false;
-
- /*
- * Set a dummy kprobe for avoiding kretprobe recursion.
- * Since kretprobe never run in kprobe handler, kprobe must not
- * be running at this point.
- */
- kprobe_busy_begin();
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
/* fixup registers */
regs->cs = __KERNEL_CS;
#ifdef CONFIG_X86_32
regs->cs |= get_kernel_rpl();
regs->gs = 0;
#endif
- /* We use pt_regs->sp for return address holder. */
- frame_pointer = &regs->sp;
- regs->ip = trampoline_address;
+ regs->ip = (unsigned long)&kretprobe_trampoline;
regs->orig_ax = ~0UL;

- /*
- * It is possible to have multiple instances associated with a given
- * task either because multiple functions in the call path have
- * return probes installed on them, and/or more than one
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always pushed into the head of the list
- * - when multiple return probes are registered for the same
- * function, the (chronologically) first instance's ret_addr
- * will be the real return address, and all the rest will
- * point to kretprobe_trampoline.
- */
- hlist_for_each_entry(ri, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
- /*
- * Return probes must be pushed on this hash list correct
- * order (same as return order) so that it can be popped
- * correctly. However, if we find it is pushed it incorrect
- * order, this means we find a function which should not be
- * probed, because the wrong order entry is pushed on the
- * path of processing other kretprobe itself.
- */
- if (ri->fp != frame_pointer) {
- if (!skipped)
- pr_warn("kretprobe is stacked incorrectly. Trying to fixup.\n");
- skipped = true;
- continue;
- }
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- if (skipped)
- pr_warn("%ps must be blacklisted because of incorrect kretprobe order\n",
- ri->rp->kp.addr);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
- correct_ret_addr = ri->ret_addr;
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
- if (ri->fp != frame_pointer)
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- if (ri->rp && ri->rp->handler) {
- __this_cpu_write(current_kprobe, &ri->rp->kp);
- ri->ret_addr = correct_ret_addr;
- ri->rp->handler(ri, regs);
- __this_cpu_write(current_kprobe, &kprobe_busy);
- }
-
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_hash_unlock(current, &flags);
-
- kprobe_busy_end();
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
- return (void *)orig_ret_address;
+ return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, &regs->sp);
}
NOKPROBE_SYMBOL(trampoline_handler);


2020-08-29 13:05:42

by Masami Hiramatsu

Subject: [PATCH v5 11/21] s390: kprobes: Use generic kretprobe trampoline handler

Signed-off-by: Masami Hiramatsu <[email protected]>
---
arch/s390/kernel/kprobes.c | 79 +-------------------------------------------
1 file changed, 2 insertions(+), 77 deletions(-)

diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c
index d2a71d872638..fc30e799bd84 100644
--- a/arch/s390/kernel/kprobes.c
+++ b/arch/s390/kernel/kprobes.c
@@ -228,6 +228,7 @@ NOKPROBE_SYMBOL(pop_kprobe);
void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *) regs->gprs[14];
+ ri->fp = NULL;

/* Replace the return addr with trampoline addr */
regs->gprs[14] = (unsigned long) &kretprobe_trampoline;
@@ -331,83 +332,7 @@ static void __used kretprobe_trampoline_holder(void)
*/
static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
{
- struct kretprobe_instance *ri;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address;
- unsigned long trampoline_address;
- kprobe_opcode_t *correct_ret_addr;
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because an multiple functions in the call path
- * have a return probe installed on them, and/or more than one return
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- ri = NULL;
- orig_ret_address = 0;
- correct_ret_addr = NULL;
- trampoline_address = (unsigned long) &kretprobe_trampoline;
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long) ri->ret_addr;
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
- correct_ret_addr = ri->ret_addr;
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long) ri->ret_addr;
-
- if (ri->rp && ri->rp->handler) {
- ri->ret_addr = correct_ret_addr;
- ri->rp->handler(ri, regs);
- }
-
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- regs->psw.addr = orig_ret_address;
-
- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
+ regs->psw.addr = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
/*
* By returning a non-zero value, we are telling
* kprobe_handler() that we don't want the post_handler

2020-08-29 13:07:00

by Masami Hiramatsu

Subject: [PATCH v5 05/21] arc: kprobes: Use generic kretprobe trampoline handler

Signed-off-by: Masami Hiramatsu <[email protected]>
---
arch/arc/kernel/kprobes.c | 54 ++-------------------------------------------
1 file changed, 2 insertions(+), 52 deletions(-)

diff --git a/arch/arc/kernel/kprobes.c b/arch/arc/kernel/kprobes.c
index 7d3efe83cba7..cabef45f11df 100644
--- a/arch/arc/kernel/kprobes.c
+++ b/arch/arc/kernel/kprobes.c
@@ -388,6 +388,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
{

ri->ret_addr = (kprobe_opcode_t *) regs->blink;
+ ri->fp = NULL;

/* Replace the return addr with trampoline addr */
regs->blink = (unsigned long)&kretprobe_trampoline;
@@ -396,58 +397,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
static int __kprobes trampoline_probe_handler(struct kprobe *p,
struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address = (unsigned long)&kretprobe_trampoline;
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because an multiple functions in the call path
- * have a return probe installed on them, and/or more than one return
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- if (ri->rp && ri->rp->handler)
- ri->rp->handler(ri, regs);
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address) {
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
- regs->ret = orig_ret_address;
-
- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
+ regs->ret = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);

/* By returning a non zero value, we are telling the kprobe handler
* that we don't want the post_handler to run

2020-08-29 13:07:02

by Masami Hiramatsu

Subject: [PATCH v5 12/21] sh: kprobes: Use generic kretprobe trampoline handler

Signed-off-by: Masami Hiramatsu <[email protected]>
---
arch/sh/kernel/kprobes.c | 58 ++--------------------------------------------
1 file changed, 3 insertions(+), 55 deletions(-)

diff --git a/arch/sh/kernel/kprobes.c b/arch/sh/kernel/kprobes.c
index 318296f48f1a..756100b01e84 100644
--- a/arch/sh/kernel/kprobes.c
+++ b/arch/sh/kernel/kprobes.c
@@ -204,6 +204,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *) regs->pr;
+ ri->fp = NULL;

/* Replace the return addr with trampoline addr */
regs->pr = (unsigned long)kretprobe_trampoline;
@@ -302,62 +303,9 @@ static void __used kretprobe_trampoline_holder(void)
*/
int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address = (unsigned long)&kretprobe_trampoline;
+ regs->pc = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);

- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because an multiple functions in the call path
- * have a return probe installed on them, and/or more then one return
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- if (ri->rp && ri->rp->handler) {
- __this_cpu_write(current_kprobe, &ri->rp->kp);
- ri->rp->handler(ri, regs);
- __this_cpu_write(current_kprobe, NULL);
- }
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
- regs->pc = orig_ret_address;
- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
-
- return orig_ret_address;
+ return 1;
}

static int __kprobes post_kprobe_handler(struct pt_regs *regs)

2020-08-29 13:07:34

by Masami Hiramatsu

Subject: [PATCH v5 04/21] arm64: kprobes: Use generic kretprobe trampoline handler

Use the generic kretprobe trampoline handler, and use
kernel_stack_pointer(regs) for frame pointer verification.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
arch/arm64/kernel/probes/kprobes.c | 78 +-----------------------------------
1 file changed, 3 insertions(+), 75 deletions(-)

diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index 5290f17a4d80..deba738142ed 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -464,87 +464,15 @@ int __init arch_populate_kprobe_blacklist(void)

void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address =
- (unsigned long)&kretprobe_trampoline;
- kprobe_opcode_t *correct_ret_addr = NULL;
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because multiple functions in the call path have
- * return probes installed on them, and/or more than one
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always pushed into the head of the list
- * - when multiple return probes are registered for the same
- * function, the (chronologically) first instance's ret_addr
- * will be the real return address, and all the rest will
- * point to kretprobe_trampoline.
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
- correct_ret_addr = ri->ret_addr;
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- if (ri->rp && ri->rp->handler) {
- __this_cpu_write(current_kprobe, &ri->rp->kp);
- get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
- ri->ret_addr = correct_ret_addr;
- ri->rp->handler(ri, regs);
- __this_cpu_write(current_kprobe, NULL);
- }
-
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
- return (void *)orig_ret_address;
+ return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline,
+ (void *)kernel_stack_pointer(regs));
}

void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *)regs->regs[30];
+ ri->fp = (void *)kernel_stack_pointer(regs);

/* replace return addr (x30) with trampoline */
regs->regs[30] = (long)&kretprobe_trampoline;

2020-08-29 13:08:31

by Masami Hiramatsu

Subject: [PATCH v5 14/21] kprobes: Remove NMI context check

Since commit 9b38cc704e84 ("kretprobe: Prevent triggering
kretprobe from within kprobe_flush_task") sets a dummy current
kprobe in the trampoline handler via kprobe_busy_begin/end(),
it is no longer possible to run a kretprobe pre-handler in the
kretprobe trampoline handler context, even from an NMI. If an NMI
interrupts kretprobe_trampoline_handler() and hits a kretprobe,
the second kretprobe detects the recursion correctly and is
skipped.
This means kretprobes have almost no double-lock issue with NMIs.

The last remaining point is cleanup_rp_inst(), which also takes
kretprobe_table_lock without setting up a current kprobe.
Adding kprobe_busy_begin/end() there allows us to remove the
in_nmi() check.

The above commit applied kprobe_busy_begin/end() only on x86, but
now that all arch implementations are unified into the generic
handler, we can safely remove the in_nmi() check from the
arch-independent code.
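
For illustration only (the function name is made up, not an excerpt of any arch
kprobe handler), this is the recursion guard that makes the in_nmi() check
unnecessary:

/*
 * Conceptual sketch: while kretprobe_trampoline_handler() runs between
 * kprobe_busy_begin() and kprobe_busy_end(), current_kprobe is set, so
 * a kretprobe entry hit from an interrupting NMI looks like a reentrant
 * kprobe hit and is only counted as missed.
 */
static bool kretprobe_hit_is_recursive(struct kprobe *p)
{
	if (kprobe_running()) {			/* dummy kprobe_busy is active */
		kprobes_inc_nmissed_count(p);	/* skip, account as missed */
		return true;
	}
	return false;
}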

Signed-off-by: Masami Hiramatsu <[email protected]>
---
kernel/kprobes.c | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index a0afaa79024e..211138225fa5 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1359,7 +1359,8 @@ static void cleanup_rp_inst(struct kretprobe *rp)
struct hlist_node *next;
struct hlist_head *head;

- /* No race here */
+ /* To avoid recursive kretprobe by NMI, set kprobe busy here */
+ kprobe_busy_begin();
for (hash = 0; hash < KPROBE_TABLE_SIZE; hash++) {
kretprobe_table_lock(hash, &flags);
head = &kretprobe_inst_table[hash];
@@ -1369,6 +1370,8 @@ static void cleanup_rp_inst(struct kretprobe *rp)
}
kretprobe_table_unlock(hash, &flags);
}
+ kprobe_busy_end();
+
free_rp_inst(rp);
}
NOKPROBE_SYMBOL(cleanup_rp_inst);
@@ -2035,17 +2038,6 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
unsigned long hash, flags = 0;
struct kretprobe_instance *ri;

- /*
- * To avoid deadlocks, prohibit return probing in NMI contexts,
- * just skip the probe and increase the (inexact) 'nmissed'
- * statistical counter, so that the user is informed that
- * something happened:
- */
- if (unlikely(in_nmi())) {
- rp->nmissed++;
- return 0;
- }
-
/* TODO: consider to only swap the RA after the last pre_handler fired */
hash = hash_ptr(current, KPROBE_HASH_BITS);
raw_spin_lock_irqsave(&rp->lock, flags);

2020-08-29 13:08:40

by Masami Hiramatsu

Subject: [PATCH v5 13/21] sparc: kprobes: Use generic kretprobe trampoline handler

Signed-off-by: Masami Hiramatsu <[email protected]>
---
arch/sparc/kernel/kprobes.c | 51 +++----------------------------------------
1 file changed, 3 insertions(+), 48 deletions(-)

diff --git a/arch/sparc/kernel/kprobes.c b/arch/sparc/kernel/kprobes.c
index dfbca2470536..217c21a6986a 100644
--- a/arch/sparc/kernel/kprobes.c
+++ b/arch/sparc/kernel/kprobes.c
@@ -453,6 +453,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *)(regs->u_regs[UREG_RETPC] + 8);
+ ri->fp = NULL;

/* Replace the return addr with trampoline addr */
regs->u_regs[UREG_RETPC] =
@@ -465,58 +466,12 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
static int __kprobes trampoline_probe_handler(struct kprobe *p,
struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address =(unsigned long)&kretprobe_trampoline;
+ unsigned long orig_ret_address = 0;

- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because an multiple functions in the call path
- * have a return probe installed on them, and/or more than one return
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- if (ri->rp && ri->rp->handler)
- ri->rp->handler(ri, regs);
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
+ orig_ret_address = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
regs->tpc = orig_ret_address;
regs->tnpc = orig_ret_address + 4;

- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
/*
* By returning a non-zero value, we are telling
* kprobe_handler() that we don't want the post_handler

2020-08-29 13:11:27

by Masami Hiramatsu

Subject: [PATCH v5 15/21] kprobes: Free kretprobe_instance with rcu callback

Free kretprobe_instance with an RCU callback instead of directly
freeing the object in the kretprobe handler context.

This makes kretprobes safer to run in NMI context.
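
The core of the change, condensed from the hunks below: hlist_node and rcu_head
can share storage because an instance is never on a hash list and in the RCU
free queue at the same time, so the handler hands the instance to RCU instead of
freeing it directly.

struct kretprobe_instance {
	union {
		struct hlist_node hlist;	/* while hashed / on free_instances */
		struct rcu_head rcu;		/* while waiting for the grace period */
	};
	/* ... */
};

	/* In recycle_rp_inst(), when the kretprobe is being unregistered: */
	kfree_rcu(ri, rcu);		/* freed after a grace period */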

Signed-off-by: Masami Hiramatsu <[email protected]>
---
Changes in v3:
- Stick the rcu_head with hlist_node in kretprobe_instance
- Make recycle_rp_inst() static
---
include/linux/kprobes.h | 6 ++++--
kernel/kprobes.c | 25 ++++++-------------------
2 files changed, 10 insertions(+), 21 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index c6a913e608b7..663be8debf25 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -156,7 +156,10 @@ struct kretprobe {
};

struct kretprobe_instance {
- struct hlist_node hlist;
+ union {
+ struct hlist_node hlist;
+ struct rcu_head rcu;
+ };
struct kretprobe *rp;
kprobe_opcode_t *ret_addr;
struct task_struct *task;
@@ -395,7 +398,6 @@ int register_kretprobes(struct kretprobe **rps, int num);
void unregister_kretprobes(struct kretprobe **rps, int num);

void kprobe_flush_task(struct task_struct *tk);
-void recycle_rp_inst(struct kretprobe_instance *ri, struct hlist_head *head);

int disable_kprobe(struct kprobe *kp);
int enable_kprobe(struct kprobe *kp);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 211138225fa5..0676868f1ac2 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1223,8 +1223,7 @@ void kprobes_inc_nmissed_count(struct kprobe *p)
}
NOKPROBE_SYMBOL(kprobes_inc_nmissed_count);

-void recycle_rp_inst(struct kretprobe_instance *ri,
- struct hlist_head *head)
+static void recycle_rp_inst(struct kretprobe_instance *ri)
{
struct kretprobe *rp = ri->rp;

@@ -1236,8 +1235,7 @@ void recycle_rp_inst(struct kretprobe_instance *ri,
hlist_add_head(&ri->hlist, &rp->free_instances);
raw_spin_unlock(&rp->lock);
} else
- /* Unregistering */
- hlist_add_head(&ri->hlist, head);
+ kfree_rcu(ri, rcu);
}
NOKPROBE_SYMBOL(recycle_rp_inst);

@@ -1313,7 +1311,7 @@ void kprobe_busy_end(void)
void kprobe_flush_task(struct task_struct *tk)
{
struct kretprobe_instance *ri;
- struct hlist_head *head, empty_rp;
+ struct hlist_head *head;
struct hlist_node *tmp;
unsigned long hash, flags = 0;

@@ -1323,19 +1321,14 @@ void kprobe_flush_task(struct task_struct *tk)

kprobe_busy_begin();

- INIT_HLIST_HEAD(&empty_rp);
hash = hash_ptr(tk, KPROBE_HASH_BITS);
head = &kretprobe_inst_table[hash];
kretprobe_table_lock(hash, &flags);
hlist_for_each_entry_safe(ri, tmp, head, hlist) {
if (ri->task == tk)
- recycle_rp_inst(ri, &empty_rp);
+ recycle_rp_inst(ri);
}
kretprobe_table_unlock(hash, &flags);
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }

kprobe_busy_end();
}
@@ -1936,13 +1929,12 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
void *frame_pointer)
{
struct kretprobe_instance *ri = NULL, *last = NULL;
- struct hlist_head *head, empty_rp;
+ struct hlist_head *head;
struct hlist_node *tmp;
unsigned long flags;
kprobe_opcode_t *correct_ret_addr = NULL;
bool skipped = false;

- INIT_HLIST_HEAD(&empty_rp);
kretprobe_hash_lock(current, &head, &flags);

/*
@@ -2011,7 +2003,7 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
__this_cpu_write(current_kprobe, prev);
}

- recycle_rp_inst(ri, &empty_rp);
+ recycle_rp_inst(ri);

if (ri == last)
break;
@@ -2019,11 +2011,6 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,

kretprobe_hash_unlock(current, &flags);

- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
-
return (unsigned long)correct_ret_addr;
}
NOKPROBE_SYMBOL(__kretprobe_trampoline_handler)

2020-08-29 13:12:50

by Masami Hiramatsu

Subject: [PATCH v5 20/21] freelist: Lock less freelist

From: Peter Zijlstra <[email protected]>

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
include/linux/freelist.h | 129 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 129 insertions(+)
create mode 100644 include/linux/freelist.h

diff --git a/include/linux/freelist.h b/include/linux/freelist.h
new file mode 100644
index 000000000000..fc1842b96469
--- /dev/null
+++ b/include/linux/freelist.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause */
+#ifndef FREELIST_H
+#define FREELIST_H
+
+#include <linux/atomic.h>
+
+/*
+ * Copyright: [email protected]
+ *
+ * A simple CAS-based lock-free free list. Not the fastest thing in the world
+ * under heavy contention, but simple and correct (assuming nodes are never
+ * freed until after the free list is destroyed), and fairly speedy under low
+ * contention.
+ *
+ * Adapted from: https://moodycamel.com/blog/2014/solving-the-aba-problem-for-lock-free-free-lists
+ */
+
+struct freelist_node {
+ atomic_t refs;
+ struct freelist_node *next;
+};
+
+struct freelist_head {
+ struct freelist_node *head;
+};
+
+#define REFS_ON_FREELIST 0x80000000
+#define REFS_MASK 0x7FFFFFFF
+
+static inline void __freelist_add(struct freelist_node *node, struct freelist_head *list)
+{
+ /*
+ * Since the refcount is zero, and nobody can increase it once it's
+ * zero (except us, and we run only one copy of this method per node at
+ * a time, i.e. the single thread case), then we know we can safely
+ * change the next pointer of the node; however, once the refcount is
+ * back above zero, then other threads could increase it (happens under
+ * heavy contention, when the refcount goes to zero in between a load
+ * and a refcount increment of a node in try_get, then back up to
+ * something non-zero, then the refcount increment is done by the other
+ * thread) -- so if the CAS to add the node to the actual list fails,
+ * decrease the refcount and leave the add operation to the next thread
+ * who puts the refcount back to zero (which could be us, hence the
+ * loop).
+ */
+ struct freelist_node *head = READ_ONCE(list->head);
+
+ for (;;) {
+ WRITE_ONCE(node->next, head);
+ atomic_set_release(&node->refs, 1);
+
+ if (!try_cmpxchg_release(&list->head, &head, node)) {
+ /*
+ * Hmm, the add failed, but we can only try again when
+ * the refcount goes back to zero.
+ */
+ if (atomic_fetch_add_release(REFS_ON_FREELIST - 1, &node->refs) == 1)
+ continue;
+ }
+ return;
+ }
+}
+
+static inline void freelist_add(struct freelist_node *node, struct freelist_head *list)
+{
+ /*
+ * We know that the should-be-on-freelist bit is 0 at this point, so
+ * it's safe to set it using a fetch_add.
+ */
+ if (!atomic_fetch_add_release(REFS_ON_FREELIST, &node->refs)) {
+ /*
+ * Oh look! We were the last ones referencing this node, and we
+ * know we want to add it to the free list, so let's do it!
+ */
+ __freelist_add(node, list);
+ }
+}
+
+static inline struct freelist_node *freelist_try_get(struct freelist_head *list)
+{
+ struct freelist_node *prev, *next, *head = smp_load_acquire(&list->head);
+ unsigned int refs;
+
+ while (head) {
+ prev = head;
+ refs = atomic_read(&head->refs);
+ if ((refs & REFS_MASK) == 0 ||
+ !atomic_try_cmpxchg_acquire(&head->refs, &refs, refs+1)) {
+ head = smp_load_acquire(&list->head);
+ continue;
+ }
+
+ /*
+ * Good, reference count has been incremented (it wasn't at
+ * zero), which means we can read the next and not worry about
+ * it changing between now and the time we do the CAS.
+ */
+ next = READ_ONCE(head->next);
+ if (try_cmpxchg_acquire(&list->head, &head, next)) {
+ /*
+ * Yay, got the node. This means it was on the list,
+ * which means should-be-on-freelist must be false no
+ * matter the refcount (because nobody else knows it's
+ * been taken off yet, it can't have been put back on).
+ */
+ WARN_ON_ONCE(atomic_read(&head->refs) & REFS_ON_FREELIST);
+
+ /*
+ * Decrease refcount twice, once for our ref, and once
+ * for the list's ref.
+ */
+ atomic_fetch_add(-2, &head->refs);
+
+ return head;
+ }
+
+ /*
+ * OK, the head must have changed on us, but we still need to decrement
+ * the refcount we increased.
+ */
+ refs = atomic_fetch_add(-1, &prev->refs);
+ if (refs == REFS_ON_FREELIST + 1)
+ __freelist_add(prev, list);
+ }
+
+ return NULL;
+}
+
+#endif /* FREELIST_H */
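
For context, a minimal usage sketch of the API added above (the surrounding
structure and field names are illustrative; a later patch in this series uses
the same pattern for kretprobe instances):

struct my_object {
	struct freelist_node freelist;	/* embedded node */
	/* payload ... */
};

static struct freelist_head pool;	/* zero-initialized, empty list */

/* Release: put an object back on the lock-less free list. */
static void my_object_release(struct my_object *obj)
{
	freelist_add(&obj->freelist, &pool);
}

/* Acquire: take an object off the list, or NULL if it is empty. */
static struct my_object *my_object_get(void)
{
	struct freelist_node *fn = freelist_try_get(&pool);

	return fn ? container_of(fn, struct my_object, freelist) : NULL;
}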

2020-08-29 13:13:47

by Masami Hiramatsu

Subject: [PATCH v5 17/21] llist: Add nonatomic __llist_add() and __llist_del_all()

From: Peter Zijlstra <[email protected]>

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
drivers/gpu/drm/i915/i915_request.c | 6 ------
include/linux/llist.h | 23 +++++++++++++++++++++++
2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 0b2fe55e6194..0e851b925c8c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -357,12 +357,6 @@ void i915_request_retire_upto(struct i915_request *rq)
} while (i915_request_retire(tmp) && tmp != rq);
}

-static void __llist_add(struct llist_node *node, struct llist_head *head)
-{
- node->next = head->first;
- head->first = node;
-}
-
static struct i915_request * const *
__engine_active(struct intel_engine_cs *engine)
{
diff --git a/include/linux/llist.h b/include/linux/llist.h
index 2e9c7215882b..24f207b0190b 100644
--- a/include/linux/llist.h
+++ b/include/linux/llist.h
@@ -197,6 +197,16 @@ static inline struct llist_node *llist_next(struct llist_node *node)
extern bool llist_add_batch(struct llist_node *new_first,
struct llist_node *new_last,
struct llist_head *head);
+
+static inline bool __llist_add_batch(struct llist_node *new_first,
+ struct llist_node *new_last,
+ struct llist_head *head)
+{
+ new_last->next = head->first;
+ head->first = new_first;
+ return new_last->next == NULL;
+}
+
/**
* llist_add - add a new entry
* @new: new entry to be added
@@ -209,6 +219,11 @@ static inline bool llist_add(struct llist_node *new, struct llist_head *head)
return llist_add_batch(new, new, head);
}

+static inline bool __llist_add(struct llist_node *new, struct llist_head *head)
+{
+ return __llist_add_batch(new, new, head);
+}
+
/**
* llist_del_all - delete all entries from lock-less list
* @head: the head of lock-less list to delete all entries
@@ -222,6 +237,14 @@ static inline struct llist_node *llist_del_all(struct llist_head *head)
return xchg(&head->first, NULL);
}

+static inline struct llist_node *__llist_del_all(struct llist_head *head)
+{
+ struct llist_node *first = head->first;
+
+ head->first = NULL;
+ return first;
+}
+
extern struct llist_node *llist_del_first(struct llist_head *head);

struct llist_node *llist_reverse_order(struct llist_node *head);
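
A short usage sketch (an assumption based on the naming convention: like other
double-underscore llist helpers, these skip the atomic operations and are only
safe when the list cannot be accessed concurrently). The my_work structure and
field names are illustrative:

struct my_work {
	struct llist_node llnode;
	/* ... */
};

/* A list only touched by its owner, e.g. under a lock or per task. */
static LLIST_HEAD(owned_list);

static void queue_owned(struct my_work *w)
{
	__llist_add(&w->llnode, &owned_list);	/* plain store, no cmpxchg */
}

static struct llist_node *drain_owned(void)
{
	return __llist_del_all(&owned_list);	/* plain load/store, no xchg */
}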

2020-08-29 13:13:47

by Masami Hiramatsu

Subject: [PATCH v5 19/21] asm-generic/atomic: Add try_cmpxchg() fallbacks

From: Peter Zijlstra <[email protected]>

Only x86 provides try_cmpxchg() outside of the atomic_t interfaces;
provide generic fallbacks to create this interface from the widely
available cmpxchg() function.
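
As a semantics sketch (not from the patch): try_cmpxchg() returns true on
success and, on failure, writes the current value back through the old-value
pointer, which removes the explicit reload from the usual cmpxchg() retry loop.
MY_FLAG and ptr are illustrative:

	unsigned long old, new, cur;

	/* Classic cmpxchg() retry loop: reload 'old' by hand on failure. */
	old = READ_ONCE(*ptr);
	for (;;) {
		new = old | MY_FLAG;
		cur = cmpxchg(ptr, old, new);
		if (cur == old)
			break;
		old = cur;
	}

	/* Same loop with try_cmpxchg(): 'old' is refreshed on failure. */
	old = READ_ONCE(*ptr);
	do {
		new = old | MY_FLAG;
	} while (!try_cmpxchg(ptr, &old, new));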

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Will Deacon <[email protected]>
---
arch/x86/include/asm/atomic.h | 2
arch/x86/include/asm/atomic64_64.h | 2
arch/x86/include/asm/cmpxchg.h | 2
include/asm-generic/atomic-instrumented.h | 216 +++++++++++++++++------------
include/linux/atomic-arch-fallback.h | 90 +++++++++++-
include/linux/atomic-fallback.h | 90 +++++++++++-
scripts/atomic/gen-atomic-fallback.sh | 63 ++++++++
scripts/atomic/gen-atomic-instrumented.sh | 29 +++-
8 files changed, 372 insertions(+), 122 deletions(-)

diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index b6cac6e9bb70..f732741ad7c7 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -199,7 +199,7 @@ static __always_inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new)

static __always_inline bool arch_atomic_try_cmpxchg(atomic_t *v, int *old, int new)
{
- return try_cmpxchg(&v->counter, old, new);
+ return arch_try_cmpxchg(&v->counter, old, new);
}
#define arch_atomic_try_cmpxchg arch_atomic_try_cmpxchg

diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index 809bd010a751..7886d0578fc9 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -187,7 +187,7 @@ static inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)

static __always_inline bool arch_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
{
- return try_cmpxchg(&v->counter, old, new);
+ return arch_try_cmpxchg(&v->counter, old, new);
}
#define arch_atomic64_try_cmpxchg arch_atomic64_try_cmpxchg

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index a8bfac131256..4d4ec5cbdc51 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -221,7 +221,7 @@ extern void __add_wrong_size(void)
#define __try_cmpxchg(ptr, pold, new, size) \
__raw_try_cmpxchg((ptr), (pold), (new), (size), LOCK_PREFIX)

-#define try_cmpxchg(ptr, pold, new) \
+#define arch_try_cmpxchg(ptr, pold, new) \
__try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)))

/*
diff --git a/include/asm-generic/atomic-instrumented.h b/include/asm-generic/atomic-instrumented.h
index 379986e40159..2cab3fdaae3b 100644
--- a/include/asm-generic/atomic-instrumented.h
+++ b/include/asm-generic/atomic-instrumented.h
@@ -1642,148 +1642,192 @@ atomic64_dec_if_positive(atomic64_t *v)
#endif

#if !defined(arch_xchg_relaxed) || defined(arch_xchg)
-#define xchg(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_xchg(__ai_ptr, __VA_ARGS__); \
+#define xchg(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_xchg(__ai_ptr, __VA_ARGS__); \
})
#endif

#if defined(arch_xchg_acquire)
-#define xchg_acquire(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_xchg_acquire(__ai_ptr, __VA_ARGS__); \
+#define xchg_acquire(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_xchg_acquire(__ai_ptr, __VA_ARGS__); \
})
#endif

#if defined(arch_xchg_release)
-#define xchg_release(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_xchg_release(__ai_ptr, __VA_ARGS__); \
+#define xchg_release(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_xchg_release(__ai_ptr, __VA_ARGS__); \
})
#endif

#if defined(arch_xchg_relaxed)
-#define xchg_relaxed(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_xchg_relaxed(__ai_ptr, __VA_ARGS__); \
+#define xchg_relaxed(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_xchg_relaxed(__ai_ptr, __VA_ARGS__); \
})
#endif

#if !defined(arch_cmpxchg_relaxed) || defined(arch_cmpxchg)
-#define cmpxchg(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg(__ai_ptr, __VA_ARGS__); \
})
#endif

#if defined(arch_cmpxchg_acquire)
-#define cmpxchg_acquire(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg_acquire(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg_acquire(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg_acquire(__ai_ptr, __VA_ARGS__); \
})
#endif

#if defined(arch_cmpxchg_release)
-#define cmpxchg_release(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg_release(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg_release(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg_release(__ai_ptr, __VA_ARGS__); \
})
#endif

#if defined(arch_cmpxchg_relaxed)
-#define cmpxchg_relaxed(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg_relaxed(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg_relaxed(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg_relaxed(__ai_ptr, __VA_ARGS__); \
})
#endif

#if !defined(arch_cmpxchg64_relaxed) || defined(arch_cmpxchg64)
-#define cmpxchg64(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg64(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg64(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg64(__ai_ptr, __VA_ARGS__); \
})
#endif

#if defined(arch_cmpxchg64_acquire)
-#define cmpxchg64_acquire(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg64_acquire(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg64_acquire(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg64_acquire(__ai_ptr, __VA_ARGS__); \
})
#endif

#if defined(arch_cmpxchg64_release)
-#define cmpxchg64_release(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg64_release(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg64_release(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg64_release(__ai_ptr, __VA_ARGS__); \
})
#endif

#if defined(arch_cmpxchg64_relaxed)
-#define cmpxchg64_relaxed(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg64_relaxed(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg64_relaxed(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg64_relaxed(__ai_ptr, __VA_ARGS__); \
})
#endif

-#define cmpxchg_local(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg_local(__ai_ptr, __VA_ARGS__); \
+#if !defined(arch_try_cmpxchg_relaxed) || defined(arch_try_cmpxchg)
+#define try_cmpxchg(ptr, oldp, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ typeof(oldp) __ai_oldp = (oldp); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+ arch_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+#endif
+
+#if defined(arch_try_cmpxchg_acquire)
+#define try_cmpxchg_acquire(ptr, oldp, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ typeof(oldp) __ai_oldp = (oldp); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+ arch_try_cmpxchg_acquire(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+#endif
+
+#if defined(arch_try_cmpxchg_release)
+#define try_cmpxchg_release(ptr, oldp, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ typeof(oldp) __ai_oldp = (oldp); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+ arch_try_cmpxchg_release(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+#endif
+
+#if defined(arch_try_cmpxchg_relaxed)
+#define try_cmpxchg_relaxed(ptr, oldp, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ typeof(oldp) __ai_oldp = (oldp); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+ arch_try_cmpxchg_relaxed(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+#endif
+
+#define cmpxchg_local(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg_local(__ai_ptr, __VA_ARGS__); \
})

-#define cmpxchg64_local(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg64_local(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg64_local(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg64_local(__ai_ptr, __VA_ARGS__); \
})

-#define sync_cmpxchg(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
+#define sync_cmpxchg(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
})

-#define cmpxchg_double(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, 2 * sizeof(*__ai_ptr)); \
- arch_cmpxchg_double(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg_double(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, 2 * sizeof(*__ai_ptr)); \
+ arch_cmpxchg_double(__ai_ptr, __VA_ARGS__); \
})


-#define cmpxchg_double_local(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, 2 * sizeof(*__ai_ptr)); \
- arch_cmpxchg_double_local(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg_double_local(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, 2 * sizeof(*__ai_ptr)); \
+ arch_cmpxchg_double_local(__ai_ptr, __VA_ARGS__); \
})

#endif /* _ASM_GENERIC_ATOMIC_INSTRUMENTED_H */
-// 89bf97f3a7509b740845e51ddf31055b48a81f40
+// ff0fe7f81ee97f01f13bb78b0e3ce800bc56d9dd
diff --git a/include/linux/atomic-arch-fallback.h b/include/linux/atomic-arch-fallback.h
index bcb6aa27cfa6..a3dba31df01e 100644
--- a/include/linux/atomic-arch-fallback.h
+++ b/include/linux/atomic-arch-fallback.h
@@ -9,9 +9,9 @@
#include <linux/compiler.h>

#ifndef arch_xchg_relaxed
-#define arch_xchg_relaxed arch_xchg
-#define arch_xchg_acquire arch_xchg
-#define arch_xchg_release arch_xchg
+#define arch_xchg_acquire arch_xchg
+#define arch_xchg_release arch_xchg
+#define arch_xchg_relaxed arch_xchg
#else /* arch_xchg_relaxed */

#ifndef arch_xchg_acquire
@@ -32,9 +32,9 @@
#endif /* arch_xchg_relaxed */

#ifndef arch_cmpxchg_relaxed
-#define arch_cmpxchg_relaxed arch_cmpxchg
-#define arch_cmpxchg_acquire arch_cmpxchg
-#define arch_cmpxchg_release arch_cmpxchg
+#define arch_cmpxchg_acquire arch_cmpxchg
+#define arch_cmpxchg_release arch_cmpxchg
+#define arch_cmpxchg_relaxed arch_cmpxchg
#else /* arch_cmpxchg_relaxed */

#ifndef arch_cmpxchg_acquire
@@ -55,9 +55,9 @@
#endif /* arch_cmpxchg_relaxed */

#ifndef arch_cmpxchg64_relaxed
-#define arch_cmpxchg64_relaxed arch_cmpxchg64
-#define arch_cmpxchg64_acquire arch_cmpxchg64
-#define arch_cmpxchg64_release arch_cmpxchg64
+#define arch_cmpxchg64_acquire arch_cmpxchg64
+#define arch_cmpxchg64_release arch_cmpxchg64
+#define arch_cmpxchg64_relaxed arch_cmpxchg64
#else /* arch_cmpxchg64_relaxed */

#ifndef arch_cmpxchg64_acquire
@@ -77,6 +77,76 @@

#endif /* arch_cmpxchg64_relaxed */

+#ifndef arch_try_cmpxchg_relaxed
+#ifdef arch_try_cmpxchg
+#define arch_try_cmpxchg_acquire arch_try_cmpxchg
+#define arch_try_cmpxchg_release arch_try_cmpxchg
+#define arch_try_cmpxchg_relaxed arch_try_cmpxchg
+#endif /* arch_try_cmpxchg */
+
+#ifndef arch_try_cmpxchg
+#define arch_try_cmpxchg(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = arch_cmpxchg((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg */
+
+#ifndef arch_try_cmpxchg_acquire
+#define arch_try_cmpxchg_acquire(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = arch_cmpxchg_acquire((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg_acquire */
+
+#ifndef arch_try_cmpxchg_release
+#define arch_try_cmpxchg_release(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = arch_cmpxchg_release((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg_release */
+
+#ifndef arch_try_cmpxchg_relaxed
+#define arch_try_cmpxchg_relaxed(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = arch_cmpxchg_relaxed((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg_relaxed */
+
+#else /* arch_try_cmpxchg_relaxed */
+
+#ifndef arch_try_cmpxchg_acquire
+#define arch_try_cmpxchg_acquire(...) \
+ __atomic_op_acquire(arch_try_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef arch_try_cmpxchg_release
+#define arch_try_cmpxchg_release(...) \
+ __atomic_op_release(arch_try_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef arch_try_cmpxchg
+#define arch_try_cmpxchg(...) \
+ __atomic_op_fence(arch_try_cmpxchg, __VA_ARGS__)
+#endif
+
+#endif /* arch_try_cmpxchg_relaxed */
+
#ifndef arch_atomic_read_acquire
static __always_inline int
arch_atomic_read_acquire(const atomic_t *v)
@@ -2288,4 +2358,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
#endif

#endif /* _LINUX_ATOMIC_FALLBACK_H */
-// 90cd26cfd69d2250303d654955a0cc12620fb91b
+// cca554917d7ea73d5e3e7397dd70c484cad9b2c4
diff --git a/include/linux/atomic-fallback.h b/include/linux/atomic-fallback.h
index fd525c71d676..2a3f55d98be9 100644
--- a/include/linux/atomic-fallback.h
+++ b/include/linux/atomic-fallback.h
@@ -9,9 +9,9 @@
#include <linux/compiler.h>

#ifndef xchg_relaxed
-#define xchg_relaxed xchg
-#define xchg_acquire xchg
-#define xchg_release xchg
+#define xchg_acquire xchg
+#define xchg_release xchg
+#define xchg_relaxed xchg
#else /* xchg_relaxed */

#ifndef xchg_acquire
@@ -32,9 +32,9 @@
#endif /* xchg_relaxed */

#ifndef cmpxchg_relaxed
-#define cmpxchg_relaxed cmpxchg
-#define cmpxchg_acquire cmpxchg
-#define cmpxchg_release cmpxchg
+#define cmpxchg_acquire cmpxchg
+#define cmpxchg_release cmpxchg
+#define cmpxchg_relaxed cmpxchg
#else /* cmpxchg_relaxed */

#ifndef cmpxchg_acquire
@@ -55,9 +55,9 @@
#endif /* cmpxchg_relaxed */

#ifndef cmpxchg64_relaxed
-#define cmpxchg64_relaxed cmpxchg64
-#define cmpxchg64_acquire cmpxchg64
-#define cmpxchg64_release cmpxchg64
+#define cmpxchg64_acquire cmpxchg64
+#define cmpxchg64_release cmpxchg64
+#define cmpxchg64_relaxed cmpxchg64
#else /* cmpxchg64_relaxed */

#ifndef cmpxchg64_acquire
@@ -77,6 +77,76 @@

#endif /* cmpxchg64_relaxed */

+#ifndef try_cmpxchg_relaxed
+#ifdef try_cmpxchg
+#define try_cmpxchg_acquire try_cmpxchg
+#define try_cmpxchg_release try_cmpxchg
+#define try_cmpxchg_relaxed try_cmpxchg
+#endif /* try_cmpxchg */
+
+#ifndef try_cmpxchg
+#define try_cmpxchg(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = cmpxchg((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* try_cmpxchg */
+
+#ifndef try_cmpxchg_acquire
+#define try_cmpxchg_acquire(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = cmpxchg_acquire((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* try_cmpxchg_acquire */
+
+#ifndef try_cmpxchg_release
+#define try_cmpxchg_release(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = cmpxchg_release((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* try_cmpxchg_release */
+
+#ifndef try_cmpxchg_relaxed
+#define try_cmpxchg_relaxed(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = cmpxchg_relaxed((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* try_cmpxchg_relaxed */
+
+#else /* try_cmpxchg_relaxed */
+
+#ifndef try_cmpxchg_acquire
+#define try_cmpxchg_acquire(...) \
+ __atomic_op_acquire(try_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef try_cmpxchg_release
+#define try_cmpxchg_release(...) \
+ __atomic_op_release(try_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef try_cmpxchg
+#define try_cmpxchg(...) \
+ __atomic_op_fence(try_cmpxchg, __VA_ARGS__)
+#endif
+
+#endif /* try_cmpxchg_relaxed */
+
#define arch_atomic_read atomic_read
#define arch_atomic_read_acquire atomic_read_acquire

@@ -2522,4 +2592,4 @@ atomic64_dec_if_positive(atomic64_t *v)
#endif

#endif /* _LINUX_ATOMIC_FALLBACK_H */
-// 9d95b56f98d82a2a26c7b79ccdd0c47572d50a6f
+// d78e6c293c661c15188f0ec05bce45188c8d5892
diff --git a/scripts/atomic/gen-atomic-fallback.sh b/scripts/atomic/gen-atomic-fallback.sh
index 693dfa1de430..317a6cec76e1 100755
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -144,15 +144,11 @@ gen_proto_order_variants()
printf "#endif /* ${basename}_relaxed */\n\n"
}

-gen_xchg_fallbacks()
+gen_order_fallbacks()
{
local xchg="$1"; shift
+
cat <<EOF
-#ifndef ${xchg}_relaxed
-#define ${xchg}_relaxed ${xchg}
-#define ${xchg}_acquire ${xchg}
-#define ${xchg}_release ${xchg}
-#else /* ${xchg}_relaxed */

#ifndef ${xchg}_acquire
#define ${xchg}_acquire(...) \\
@@ -169,11 +165,62 @@ cat <<EOF
__atomic_op_fence(${xchg}, __VA_ARGS__)
#endif

-#endif /* ${xchg}_relaxed */
+EOF
+}
+
+gen_xchg_fallbacks()
+{
+ local xchg="$1"; shift
+ printf "#ifndef ${xchg}_relaxed\n"
+
+ gen_basic_fallbacks ${xchg}
+
+ printf "#else /* ${xchg}_relaxed */\n"
+
+ gen_order_fallbacks ${xchg}
+
+ printf "#endif /* ${xchg}_relaxed */\n\n"
+}
+
+gen_try_cmpxchg_fallback()
+{
+ local order="$1"; shift;
+
+cat <<EOF
+#ifndef ${ARCH}try_cmpxchg${order}
+#define ${ARCH}try_cmpxchg${order}(_ptr, _oldp, _new) \\
+({ \\
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \\
+ ___r = ${ARCH}cmpxchg${order}((_ptr), ___o, (_new)); \\
+ if (unlikely(___r != ___o)) \\
+ *___op = ___r; \\
+ likely(___r == ___o); \\
+})
+#endif /* ${ARCH}try_cmpxchg${order} */

EOF
}

+gen_try_cmpxchg_fallbacks()
+{
+ printf "#ifndef ${ARCH}try_cmpxchg_relaxed\n"
+ printf "#ifdef ${ARCH}try_cmpxchg\n"
+
+ gen_basic_fallbacks "${ARCH}try_cmpxchg"
+
+ printf "#endif /* ${ARCH}try_cmpxchg */\n\n"
+
+ for order in "" "_acquire" "_release" "_relaxed"; do
+ gen_try_cmpxchg_fallback "${order}"
+ done
+
+ printf "#else /* ${ARCH}try_cmpxchg_relaxed */\n"
+
+ gen_order_fallbacks "${ARCH}try_cmpxchg"
+
+ printf "#endif /* ${ARCH}try_cmpxchg_relaxed */\n\n"
+}
+
cat << EOF
// SPDX-License-Identifier: GPL-2.0

@@ -191,6 +238,8 @@ for xchg in "${ARCH}xchg" "${ARCH}cmpxchg" "${ARCH}cmpxchg64"; do
gen_xchg_fallbacks "${xchg}"
done

+gen_try_cmpxchg_fallbacks
+
grep '^[a-z]' "$1" | while read name meta args; do
gen_proto "${meta}" "${name}" "${ARCH}" "atomic" "int" ${args}
done
diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
index 6afadf73da17..85dc25685c0d 100755
--- a/scripts/atomic/gen-atomic-instrumented.sh
+++ b/scripts/atomic/gen-atomic-instrumented.sh
@@ -103,14 +103,31 @@ gen_xchg()
local xchg="$1"; shift
local mult="$1"; shift

+ if [ "${xchg%${xchg#try_cmpxchg}}" = "try_cmpxchg" ] ; then
+
+cat <<EOF
+#define ${xchg}(ptr, oldp, ...) \\
+({ \\
+ typeof(ptr) __ai_ptr = (ptr); \\
+ typeof(oldp) __ai_oldp = (oldp); \\
+ instrument_atomic_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\
+ instrument_atomic_write(__ai_oldp, ${mult}sizeof(*__ai_oldp)); \\
+ arch_${xchg}(__ai_ptr, __ai_oldp, __VA_ARGS__); \\
+})
+EOF
+
+ else
+
cat <<EOF
-#define ${xchg}(ptr, ...) \\
-({ \\
- typeof(ptr) __ai_ptr = (ptr); \\
- instrument_atomic_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\
- arch_${xchg}(__ai_ptr, __VA_ARGS__); \\
+#define ${xchg}(ptr, ...) \\
+({ \\
+ typeof(ptr) __ai_ptr = (ptr); \\
+ instrument_atomic_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\
+ arch_${xchg}(__ai_ptr, __VA_ARGS__); \\
})
EOF
+
+ fi
}

gen_optional_xchg()
@@ -160,7 +177,7 @@ grep '^[a-z]' "$1" | while read name meta args; do
gen_proto "${meta}" "${name}" "atomic64" "s64" ${args}
done

-for xchg in "xchg" "cmpxchg" "cmpxchg64"; do
+for xchg in "xchg" "cmpxchg" "cmpxchg64" "try_cmpxchg"; do
for order in "" "_acquire" "_release" "_relaxed"; do
gen_optional_xchg "${xchg}" "${order}"
done

2020-08-29 13:16:55

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v5 16/21] kprobes: Make local used functions static

Since we unified the kretprobe trampoline handler across the arch/* code,
some functions and objects no longer need to be exported.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
Changes in v4:
- Convert kretprobe_assert() into a BUG_ON().
---
include/linux/kprobes.h | 15 ---------------
kernel/kprobes.c | 9 ++++-----
2 files changed, 4 insertions(+), 20 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 663be8debf25..9c880c8a4e80 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -190,7 +190,6 @@ static inline int kprobes_built_in(void)
return 1;
}

-extern struct kprobe kprobe_busy;
void kprobe_busy_begin(void);
void kprobe_busy_end(void);

@@ -235,16 +234,6 @@ static inline int arch_trampoline_kprobe(struct kprobe *p)

extern struct kretprobe_blackpoint kretprobe_blacklist[];

-static inline void kretprobe_assert(struct kretprobe_instance *ri,
- unsigned long orig_ret_address, unsigned long trampoline_address)
-{
- if (!orig_ret_address || (orig_ret_address == trampoline_address)) {
- printk("kretprobe BUG!: Processing kretprobe %p @ %p\n",
- ri->rp, ri->rp->kp.addr);
- BUG();
- }
-}
-
#ifdef CONFIG_KPROBES_SANITY_TEST
extern int init_test_probes(void);
#else
@@ -364,10 +353,6 @@ int arch_check_ftrace_location(struct kprobe *p);

/* Get the kprobe at this addr (if any) - called with preemption disabled */
struct kprobe *get_kprobe(void *addr);
-void kretprobe_hash_lock(struct task_struct *tsk,
- struct hlist_head **head, unsigned long *flags);
-void kretprobe_hash_unlock(struct task_struct *tsk, unsigned long *flags);
-struct hlist_head * kretprobe_inst_table_head(struct task_struct *tsk);

/* kprobe_running() will just return the current_kprobe on this CPU */
static inline struct kprobe *kprobe_running(void)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 0676868f1ac2..732a70163584 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1239,7 +1239,7 @@ static void recycle_rp_inst(struct kretprobe_instance *ri)
}
NOKPROBE_SYMBOL(recycle_rp_inst);

-void kretprobe_hash_lock(struct task_struct *tsk,
+static void kretprobe_hash_lock(struct task_struct *tsk,
struct hlist_head **head, unsigned long *flags)
__acquires(hlist_lock)
{
@@ -1261,7 +1261,7 @@ __acquires(hlist_lock)
}
NOKPROBE_SYMBOL(kretprobe_table_lock);

-void kretprobe_hash_unlock(struct task_struct *tsk,
+static void kretprobe_hash_unlock(struct task_struct *tsk,
unsigned long *flags)
__releases(hlist_lock)
{
@@ -1282,7 +1282,7 @@ __releases(hlist_lock)
}
NOKPROBE_SYMBOL(kretprobe_table_unlock);

-struct kprobe kprobe_busy = {
+static struct kprobe kprobe_busy = {
.addr = (void *) get_kprobe,
};

@@ -1983,8 +1983,7 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
break;
}

- kretprobe_assert(ri, (unsigned long)correct_ret_addr,
- (unsigned long)trampoline_address);
+ BUG_ON(!correct_ret_addr || (correct_ret_addr == trampoline_address));
last = ri;

hlist_for_each_entry_safe(ri, tmp, head, hlist) {

2020-08-29 13:16:55

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v5 21/21] kprobes: Replace rp->free_instance with freelist

From: Peter Zijlstra <[email protected]>

Gets rid of rp->lock, and as a result kretprobes are now fully
lockless.
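
As a quick orientation before the diff: with the freelist, getting and
returning a kretprobe_instance no longer takes rp->lock. A minimal sketch
distilled from the hunks below (not the literal final code):

	struct freelist_node *fn;
	struct kretprobe_instance *ri;

	/* function entry: grab a free instance, or count a miss */
	fn = freelist_try_get(&rp->freelist);
	if (!fn) {
		rp->nmissed++;
		return 0;
	}
	ri = container_of(fn, struct kretprobe_instance, freelist);

	/* on abort, or later from recycle_rp_inst(): put it back */
	freelist_add(&ri->freelist, &rp->freelist);

Both helpers come from the new include/linux/freelist.h and are built on
atomic compare-and-swap, so no spinlock is needed on this path.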

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
Changes
- [MH] expel the llist from anon union in kretprobe_instance
---
include/linux/kprobes.h | 8 +++----
kernel/kprobes.c | 56 ++++++++++++++++++++---------------------------
2 files changed, 28 insertions(+), 36 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index f8f87a13345a..89ff0fc15286 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -28,6 +28,7 @@
#include <linux/mutex.h>
#include <linux/ftrace.h>
#include <linux/refcount.h>
+#include <linux/freelist.h>
#include <asm/kprobes.h>

#ifdef CONFIG_KPROBES
@@ -157,17 +158,16 @@ struct kretprobe {
int maxactive;
int nmissed;
size_t data_size;
- struct hlist_head free_instances;
+ struct freelist_head freelist;
struct kretprobe_holder *rph;
- raw_spinlock_t lock;
};

struct kretprobe_instance {
union {
- struct llist_node llist;
- struct hlist_node hlist;
+ struct freelist_node freelist;
struct rcu_head rcu;
};
+ struct llist_node llist;
struct kretprobe_holder *rph;
kprobe_opcode_t *ret_addr;
void *fp;
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index bc65603fce00..af6551992b91 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1228,11 +1228,8 @@ static void recycle_rp_inst(struct kretprobe_instance *ri)
{
struct kretprobe *rp = get_kretprobe(ri);

- INIT_HLIST_NODE(&ri->hlist);
if (likely(rp)) {
- raw_spin_lock(&rp->lock);
- hlist_add_head(&ri->hlist, &rp->free_instances);
- raw_spin_unlock(&rp->lock);
+ freelist_add(&ri->freelist, &rp->freelist);
} else
call_rcu(&ri->rcu, free_rp_inst_rcu);
}
@@ -1290,11 +1287,14 @@ NOKPROBE_SYMBOL(kprobe_flush_task);
static inline void free_rp_inst(struct kretprobe *rp)
{
struct kretprobe_instance *ri;
- struct hlist_node *next;
+ struct freelist_node *node;
int count = 0;

- hlist_for_each_entry_safe(ri, next, &rp->free_instances, hlist) {
- hlist_del(&ri->hlist);
+ node = rp->freelist.head;
+ while (node) {
+ ri = container_of(node, struct kretprobe_instance, freelist);
+ node = node->next;
+
kfree(ri);
count++;
}
@@ -1925,32 +1925,26 @@ NOKPROBE_SYMBOL(__kretprobe_trampoline_handler)
static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
{
struct kretprobe *rp = container_of(p, struct kretprobe, kp);
- unsigned long flags = 0;
struct kretprobe_instance *ri;
+ struct freelist_node *fn;

- /* TODO: consider to only swap the RA after the last pre_handler fired */
- raw_spin_lock_irqsave(&rp->lock, flags);
- if (!hlist_empty(&rp->free_instances)) {
- ri = hlist_entry(rp->free_instances.first,
- struct kretprobe_instance, hlist);
- hlist_del(&ri->hlist);
- raw_spin_unlock_irqrestore(&rp->lock, flags);
-
- if (rp->entry_handler && rp->entry_handler(ri, regs)) {
- raw_spin_lock_irqsave(&rp->lock, flags);
- hlist_add_head(&ri->hlist, &rp->free_instances);
- raw_spin_unlock_irqrestore(&rp->lock, flags);
- return 0;
- }
-
- arch_prepare_kretprobe(ri, regs);
+ fn = freelist_try_get(&rp->freelist);
+ if (!fn) {
+ rp->nmissed++;
+ return 0;
+ }

- __llist_add(&ri->llist, &current->kretprobe_instances);
+ ri = container_of(fn, struct kretprobe_instance, freelist);

- } else {
- rp->nmissed++;
- raw_spin_unlock_irqrestore(&rp->lock, flags);
+ if (rp->entry_handler && rp->entry_handler(ri, regs)) {
+ freelist_add(&ri->freelist, &rp->freelist);
+ return 0;
}
+
+ arch_prepare_kretprobe(ri, regs);
+
+ __llist_add(&ri->llist, &current->kretprobe_instances);
+
return 0;
}
NOKPROBE_SYMBOL(pre_handler_kretprobe);
@@ -2007,8 +2001,7 @@ int register_kretprobe(struct kretprobe *rp)
rp->maxactive = num_possible_cpus();
#endif
}
- raw_spin_lock_init(&rp->lock);
- INIT_HLIST_HEAD(&rp->free_instances);
+ rp->freelist.head = NULL;
rp->rph = kzalloc(sizeof(struct kretprobe_holder), GFP_KERNEL);
if (!rp->rph)
return -ENOMEM;
@@ -2023,8 +2016,7 @@ int register_kretprobe(struct kretprobe *rp)
return -ENOMEM;
}
inst->rph = rp->rph;
- INIT_HLIST_NODE(&inst->hlist);
- hlist_add_head(&inst->hlist, &rp->free_instances);
+ freelist_add(&inst->freelist, &rp->freelist);
}
refcount_set(&rp->rph->ref, i);


2020-08-29 13:18:34

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v5 18/21] kprobes: Remove kretprobe hash

From: Peter Zijlstra <[email protected]>

The kretprobe hash is mostly superfluous; replace it with a per-task
variable.

This gets rid of the task hash and its related locking.


Note that this may change the kprobes module-exported API for kretprobe
handlers: any out-of-tree kretprobe user that dereferences ri->rp should
switch to get_kretprobe(ri) instead, as sketched below.
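
For instance, an out-of-tree return handler that used to read ri->rp
directly could be adapted roughly like this (a sketch only;
my_ret_handler is a made-up name, and the in-tree kretprobe_dispatcher()
hunk at the end of this patch makes the equivalent change):

	static int my_ret_handler(struct kretprobe_instance *ri,
				  struct pt_regs *regs)
	{
		/* was: struct kretprobe *rp = ri->rp; */
		struct kretprobe *rp = get_kretprobe(ri);

		/* rp can be NULL once the kretprobe is being unregistered */
		if (rp)
			pr_info("return from %ps\n", rp->kp.addr);
		return 0;
	}

get_kretprobe() reads rp through the new kretprobe_holder under RCU, so
it must be called from non-preemptible (handler) context, which is where
kretprobe handlers run anyway.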

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
---
Changes in v5:
- Fix not to flush current task's kretprobe_instance
- Remove task scan by merging kretprobe_holder patch
- Use READ_ONCE() and rcu_locked check instead of smp_r/wmb()
- Make the failure case in __kretprobe_trampoline_handler() clearer
---
include/linux/kprobes.h | 19 +++
include/linux/sched.h | 4 +
kernel/fork.c | 4 +
kernel/kprobes.c | 236 +++++++++++++------------------------------
kernel/trace/trace_kprobe.c | 3 -
5 files changed, 97 insertions(+), 169 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 9c880c8a4e80..f8f87a13345a 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -27,6 +27,7 @@
#include <linux/rcupdate.h>
#include <linux/mutex.h>
#include <linux/ftrace.h>
+#include <linux/refcount.h>
#include <asm/kprobes.h>

#ifdef CONFIG_KPROBES
@@ -144,6 +145,11 @@ static inline int kprobe_ftrace(struct kprobe *p)
* ignored, due to maxactive being too low.
*
*/
+struct kretprobe_holder {
+ struct kretprobe *rp;
+ refcount_t ref;
+};
+
struct kretprobe {
struct kprobe kp;
kretprobe_handler_t handler;
@@ -152,17 +158,18 @@ struct kretprobe {
int nmissed;
size_t data_size;
struct hlist_head free_instances;
+ struct kretprobe_holder *rph;
raw_spinlock_t lock;
};

struct kretprobe_instance {
union {
+ struct llist_node llist;
struct hlist_node hlist;
struct rcu_head rcu;
};
- struct kretprobe *rp;
+ struct kretprobe_holder *rph;
kprobe_opcode_t *ret_addr;
- struct task_struct *task;
void *fp;
char data[];
};
@@ -221,6 +228,14 @@ unsigned long kretprobe_trampoline_handler(struct pt_regs *regs,
return ret;
}

+static nokprobe_inline struct kretprobe *get_kretprobe(struct kretprobe_instance *ri)
+{
+ RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
+ "Kretprobe is accessed from instance under preemptive context");
+
+ return READ_ONCE(ri->rph->rp);
+}
+
#else /* CONFIG_KRETPROBES */
static inline void arch_prepare_kretprobe(struct kretprobe *rp,
struct pt_regs *regs)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 93ecd930efd3..0f2532f052a9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1315,6 +1315,10 @@ struct task_struct {
struct callback_head mce_kill_me;
#endif

+#ifdef CONFIG_KRETPROBES
+ struct llist_head kretprobe_instances;
+#endif
+
/*
* New fields for task_struct should be added above here, so that
* they are included in the randomized portion of task_struct.
diff --git a/kernel/fork.c b/kernel/fork.c
index 4d32190861bd..2ff5cceb0732 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2161,6 +2161,10 @@ static __latent_entropy struct task_struct *copy_process(
INIT_LIST_HEAD(&p->thread_group);
p->task_works = NULL;

+#ifdef CONFIG_KRETPROBES
+ p->kretprobe_instances.first = NULL;
+#endif
+
/*
* Ensure that the cgroup subsystem policies allow the new process to be
* forked. It should be noted the the new process's css_set can be changed
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 732a70163584..bc65603fce00 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -53,7 +53,6 @@ static int kprobes_initialized;
* - RCU hlist traversal under disabling preempt (breakpoint handlers)
*/
static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
-static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];

/* NOTE: change this value only with kprobe_mutex held */
static bool kprobes_all_disarmed;
@@ -61,9 +60,6 @@ static bool kprobes_all_disarmed;
/* This protects kprobe_table and optimizing_list */
static DEFINE_MUTEX(kprobe_mutex);
static DEFINE_PER_CPU(struct kprobe *, kprobe_instance) = NULL;
-static struct {
- raw_spinlock_t lock ____cacheline_aligned_in_smp;
-} kretprobe_table_locks[KPROBE_TABLE_SIZE];

kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
unsigned int __unused)
@@ -71,11 +67,6 @@ kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
return ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
}

-static raw_spinlock_t *kretprobe_table_lock_ptr(unsigned long hash)
-{
- return &(kretprobe_table_locks[hash].lock);
-}
-
/* Blacklist -- list of struct kprobe_blacklist_entry */
static LIST_HEAD(kprobe_blacklist);

@@ -1223,65 +1214,30 @@ void kprobes_inc_nmissed_count(struct kprobe *p)
}
NOKPROBE_SYMBOL(kprobes_inc_nmissed_count);

+static void free_rp_inst_rcu(struct rcu_head *head)
+{
+ struct kretprobe_instance *ri = container_of(head, struct kretprobe_instance, rcu);
+
+ if (refcount_dec_and_test(&ri->rph->ref))
+ kfree(ri->rph);
+ kfree(ri);
+}
+NOKPROBE_SYMBOL(free_rp_inst_rcu);
+
static void recycle_rp_inst(struct kretprobe_instance *ri)
{
- struct kretprobe *rp = ri->rp;
+ struct kretprobe *rp = get_kretprobe(ri);

- /* remove rp inst off the rprobe_inst_table */
- hlist_del(&ri->hlist);
INIT_HLIST_NODE(&ri->hlist);
if (likely(rp)) {
raw_spin_lock(&rp->lock);
hlist_add_head(&ri->hlist, &rp->free_instances);
raw_spin_unlock(&rp->lock);
} else
- kfree_rcu(ri, rcu);
+ call_rcu(&ri->rcu, free_rp_inst_rcu);
}
NOKPROBE_SYMBOL(recycle_rp_inst);

-static void kretprobe_hash_lock(struct task_struct *tsk,
- struct hlist_head **head, unsigned long *flags)
-__acquires(hlist_lock)
-{
- unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
- raw_spinlock_t *hlist_lock;
-
- *head = &kretprobe_inst_table[hash];
- hlist_lock = kretprobe_table_lock_ptr(hash);
- raw_spin_lock_irqsave(hlist_lock, *flags);
-}
-NOKPROBE_SYMBOL(kretprobe_hash_lock);
-
-static void kretprobe_table_lock(unsigned long hash,
- unsigned long *flags)
-__acquires(hlist_lock)
-{
- raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
- raw_spin_lock_irqsave(hlist_lock, *flags);
-}
-NOKPROBE_SYMBOL(kretprobe_table_lock);
-
-static void kretprobe_hash_unlock(struct task_struct *tsk,
- unsigned long *flags)
-__releases(hlist_lock)
-{
- unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
- raw_spinlock_t *hlist_lock;
-
- hlist_lock = kretprobe_table_lock_ptr(hash);
- raw_spin_unlock_irqrestore(hlist_lock, *flags);
-}
-NOKPROBE_SYMBOL(kretprobe_hash_unlock);
-
-static void kretprobe_table_unlock(unsigned long hash,
- unsigned long *flags)
-__releases(hlist_lock)
-{
- raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
- raw_spin_unlock_irqrestore(hlist_lock, *flags);
-}
-NOKPROBE_SYMBOL(kretprobe_table_unlock);
-
static struct kprobe kprobe_busy = {
.addr = (void *) get_kprobe,
};
@@ -1311,24 +1267,21 @@ void kprobe_busy_end(void)
void kprobe_flush_task(struct task_struct *tk)
{
struct kretprobe_instance *ri;
- struct hlist_head *head;
- struct hlist_node *tmp;
- unsigned long hash, flags = 0;
+ struct llist_node *node;

+ /* Early boot, not yet initialized. */
if (unlikely(!kprobes_initialized))
- /* Early boot. kretprobe_table_locks not yet initialized. */
return;

kprobe_busy_begin();

- hash = hash_ptr(tk, KPROBE_HASH_BITS);
- head = &kretprobe_inst_table[hash];
- kretprobe_table_lock(hash, &flags);
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task == tk)
- recycle_rp_inst(ri);
+ node = __llist_del_all(&tk->kretprobe_instances);
+ while (node) {
+ ri = container_of(node, struct kretprobe_instance, llist);
+ node = node->next;
+
+ recycle_rp_inst(ri);
}
- kretprobe_table_unlock(hash, &flags);

kprobe_busy_end();
}
@@ -1338,36 +1291,19 @@ static inline void free_rp_inst(struct kretprobe *rp)
{
struct kretprobe_instance *ri;
struct hlist_node *next;
+ int count = 0;

hlist_for_each_entry_safe(ri, next, &rp->free_instances, hlist) {
hlist_del(&ri->hlist);
kfree(ri);
+ count++;
}
-}
-
-static void cleanup_rp_inst(struct kretprobe *rp)
-{
- unsigned long flags, hash;
- struct kretprobe_instance *ri;
- struct hlist_node *next;
- struct hlist_head *head;

- /* To avoid recursive kretprobe by NMI, set kprobe busy here */
- kprobe_busy_begin();
- for (hash = 0; hash < KPROBE_TABLE_SIZE; hash++) {
- kretprobe_table_lock(hash, &flags);
- head = &kretprobe_inst_table[hash];
- hlist_for_each_entry_safe(ri, next, head, hlist) {
- if (ri->rp == rp)
- ri->rp = NULL;
- }
- kretprobe_table_unlock(hash, &flags);
+ if (refcount_sub_and_test(count, &rp->rph->ref)) {
+ kfree(rp->rph);
+ rp->rph = NULL;
}
- kprobe_busy_end();
-
- free_rp_inst(rp);
}
-NOKPROBE_SYMBOL(cleanup_rp_inst);

/* Add the new probe to ap->list */
static int add_new_kprobe(struct kprobe *ap, struct kprobe *p)
@@ -1928,88 +1864,56 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
void *trampoline_address,
void *frame_pointer)
{
- struct kretprobe_instance *ri = NULL, *last = NULL;
- struct hlist_head *head;
- struct hlist_node *tmp;
- unsigned long flags;
kprobe_opcode_t *correct_ret_addr = NULL;
- bool skipped = false;
+ struct kretprobe_instance *ri = NULL;
+ struct llist_node *first, *node;
+ struct kretprobe *rp;

- kretprobe_hash_lock(current, &head, &flags);
+ /* Find all nodes for this frame. */
+ first = node = current->kretprobe_instances.first;
+ while (node) {
+ ri = container_of(node, struct kretprobe_instance, llist);

- /*
- * It is possible to have multiple instances associated with a given
- * task either because multiple functions in the call path have
- * return probes installed on them, and/or more than one
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always pushed into the head of the list
- * - when multiple return probes are registered for the same
- * function, the (chronologically) first instance's ret_addr
- * will be the real return address, and all the rest will
- * point to kretprobe_trampoline.
- */
- hlist_for_each_entry(ri, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
- /*
- * Return probes must be pushed on this hash list correct
- * order (same as return order) so that it can be popped
- * correctly. However, if we find it is pushed it incorrect
- * order, this means we find a function which should not be
- * probed, because the wrong order entry is pushed on the
- * path of processing other kretprobe itself.
- */
- if (ri->fp != frame_pointer) {
- if (!skipped)
- pr_warn("kretprobe is stacked incorrectly. Trying to fixup.\n");
- skipped = true;
- continue;
- }
+ BUG_ON(ri->fp != frame_pointer);

- correct_ret_addr = ri->ret_addr;
- if (skipped)
- pr_warn("%ps must be blacklisted because of incorrect kretprobe order\n",
- ri->rp->kp.addr);
-
- if (correct_ret_addr != trampoline_address)
+ if (ri->ret_addr != trampoline_address) {
+ correct_ret_addr = ri->ret_addr;
/*
* This is the real return address. Any other
* instances associated with this task are for
* other calls deeper on the call stack
*/
- break;
+ goto found;
+ }
+
+ node = node->next;
}
+ pr_err("Oops! Kretprobe fails to find correct return address.\n");
+ BUG_ON(1);

- BUG_ON(!correct_ret_addr || (correct_ret_addr == trampoline_address));
- last = ri;
+found:
+ /* Unlink all nodes for this frame. */
+ current->kretprobe_instances.first = node->next;
+ node->next = NULL;

- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
- if (ri->fp != frame_pointer)
- continue;
+ /* Run them.. */
+ while (first) {
+ ri = container_of(first, struct kretprobe_instance, llist);
+ first = first->next;

- if (ri->rp && ri->rp->handler) {
+ rp = get_kretprobe(ri);
+ if (rp && rp->handler) {
struct kprobe *prev = kprobe_running();

- __this_cpu_write(current_kprobe, &ri->rp->kp);
+ __this_cpu_write(current_kprobe, &rp->kp);
ri->ret_addr = correct_ret_addr;
- ri->rp->handler(ri, regs);
+ rp->handler(ri, regs);
__this_cpu_write(current_kprobe, prev);
}

recycle_rp_inst(ri);
-
- if (ri == last)
- break;
}

- kretprobe_hash_unlock(current, &flags);
-
return (unsigned long)correct_ret_addr;
}
NOKPROBE_SYMBOL(__kretprobe_trampoline_handler)
@@ -2021,11 +1925,10 @@ NOKPROBE_SYMBOL(__kretprobe_trampoline_handler)
static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
{
struct kretprobe *rp = container_of(p, struct kretprobe, kp);
- unsigned long hash, flags = 0;
+ unsigned long flags = 0;
struct kretprobe_instance *ri;

/* TODO: consider to only swap the RA after the last pre_handler fired */
- hash = hash_ptr(current, KPROBE_HASH_BITS);
raw_spin_lock_irqsave(&rp->lock, flags);
if (!hlist_empty(&rp->free_instances)) {
ri = hlist_entry(rp->free_instances.first,
@@ -2033,9 +1936,6 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
hlist_del(&ri->hlist);
raw_spin_unlock_irqrestore(&rp->lock, flags);

- ri->rp = rp;
- ri->task = current;
-
if (rp->entry_handler && rp->entry_handler(ri, regs)) {
raw_spin_lock_irqsave(&rp->lock, flags);
hlist_add_head(&ri->hlist, &rp->free_instances);
@@ -2045,11 +1945,8 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)

arch_prepare_kretprobe(ri, regs);

- /* XXX(hch): why is there no hlist_move_head? */
- INIT_HLIST_NODE(&ri->hlist);
- kretprobe_table_lock(hash, &flags);
- hlist_add_head(&ri->hlist, &kretprobe_inst_table[hash]);
- kretprobe_table_unlock(hash, &flags);
+ __llist_add(&ri->llist, &current->kretprobe_instances);
+
} else {
rp->nmissed++;
raw_spin_unlock_irqrestore(&rp->lock, flags);
@@ -2112,16 +2009,24 @@ int register_kretprobe(struct kretprobe *rp)
}
raw_spin_lock_init(&rp->lock);
INIT_HLIST_HEAD(&rp->free_instances);
+ rp->rph = kzalloc(sizeof(struct kretprobe_holder), GFP_KERNEL);
+ if (!rp->rph)
+ return -ENOMEM;
+
+ rp->rph->rp = rp;
for (i = 0; i < rp->maxactive; i++) {
- inst = kmalloc(sizeof(struct kretprobe_instance) +
+ inst = kzalloc(sizeof(struct kretprobe_instance) +
rp->data_size, GFP_KERNEL);
if (inst == NULL) {
+ refcount_set(&rp->rph->ref, i);
free_rp_inst(rp);
return -ENOMEM;
}
+ inst->rph = rp->rph;
INIT_HLIST_NODE(&inst->hlist);
hlist_add_head(&inst->hlist, &rp->free_instances);
}
+ refcount_set(&rp->rph->ref, i);

rp->nmissed = 0;
/* Establish function entry probe point */
@@ -2163,16 +2068,18 @@ void unregister_kretprobes(struct kretprobe **rps, int num)
if (num <= 0)
return;
mutex_lock(&kprobe_mutex);
- for (i = 0; i < num; i++)
+ for (i = 0; i < num; i++) {
if (__unregister_kprobe_top(&rps[i]->kp) < 0)
rps[i]->kp.addr = NULL;
+ rps[i]->rph->rp = NULL;
+ }
mutex_unlock(&kprobe_mutex);

synchronize_rcu();
for (i = 0; i < num; i++) {
if (rps[i]->kp.addr) {
__unregister_kprobe_bottom(&rps[i]->kp);
- cleanup_rp_inst(rps[i]);
+ free_rp_inst(rps[i]);
}
}
}
@@ -2534,11 +2441,8 @@ static int __init init_kprobes(void)

/* FIXME allocate the probe table, currently defined statically */
/* initialize all list heads */
- for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
+ for (i = 0; i < KPROBE_TABLE_SIZE; i++)
INIT_HLIST_HEAD(&kprobe_table[i]);
- INIT_HLIST_HEAD(&kretprobe_inst_table[i]);
- raw_spin_lock_init(&(kretprobe_table_locks[i].lock));
- }

err = populate_kprobe_blacklist(__start_kprobe_blacklist,
__stop_kprobe_blacklist);
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index aefb6065b508..07baf6f6cecc 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1714,7 +1714,8 @@ NOKPROBE_SYMBOL(kprobe_dispatcher);
static int
kretprobe_dispatcher(struct kretprobe_instance *ri, struct pt_regs *regs)
{
- struct trace_kprobe *tk = container_of(ri->rp, struct trace_kprobe, rp);
+ struct kretprobe *rp = get_kretprobe(ri);
+ struct trace_kprobe *tk = container_of(rp, struct trace_kprobe, rp);

raw_cpu_inc(*tk->nhit);


2020-08-29 13:21:48

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v5 10/21] powerpc: kprobes: Use generic kretprobe trampoline handler

Signed-off-by: Masami Hiramatsu <[email protected]>
---
Changes in v2:
Fix to use correct trampoline_address.
---
arch/powerpc/kernel/kprobes.c | 53 ++---------------------------------------
1 file changed, 3 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 6ab9b4d037c3..01ab2163659e 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -218,6 +218,7 @@ bool arch_kprobe_on_func_entry(unsigned long offset)
void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *)regs->link;
+ ri->fp = NULL;

/* Replace the return addr with trampoline addr */
regs->link = (unsigned long)kretprobe_trampoline;
@@ -396,50 +397,9 @@ asm(".global kretprobe_trampoline\n"
*/
static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address =(unsigned long)&kretprobe_trampoline;
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because an multiple functions in the call path
- * have a return probe installed on them, and/or more than one return
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- if (ri->rp && ri->rp->handler)
- ri->rp->handler(ri, regs);
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
+ unsigned long orig_ret_address;

+ orig_ret_address = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
/*
* We get here through one of two paths:
* 1. by taking a trap -> kprobe_handler() -> here
@@ -458,13 +418,6 @@ static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
regs->nip = orig_ret_address - 4;
regs->link = orig_ret_address;

- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
-
return 0;
}
NOKPROBE_SYMBOL(trampoline_probe_handler);

2020-08-29 13:23:05

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v5 09/21] parisc: kprobes: Use generic kretprobe trampoline handler

Signed-off-by: Masami Hiramatsu <[email protected]>
---
arch/parisc/kernel/kprobes.c | 76 ++----------------------------------------
1 file changed, 4 insertions(+), 72 deletions(-)

diff --git a/arch/parisc/kernel/kprobes.c b/arch/parisc/kernel/kprobes.c
index 77ec51818916..6d21a515eea5 100644
--- a/arch/parisc/kernel/kprobes.c
+++ b/arch/parisc/kernel/kprobes.c
@@ -191,80 +191,11 @@ static struct kprobe trampoline_p = {
static int __kprobes trampoline_probe_handler(struct kprobe *p,
struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address = (unsigned long)trampoline_p.addr;
- kprobe_opcode_t *correct_ret_addr = NULL;
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because multiple functions in the call path have
- * a return probe installed on them, and/or more than one return
- * probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
- correct_ret_addr = ri->ret_addr;
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- if (ri->rp && ri->rp->handler) {
- __this_cpu_write(current_kprobe, &ri->rp->kp);
- get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
- ri->ret_addr = correct_ret_addr;
- ri->rp->handler(ri, regs);
- __this_cpu_write(current_kprobe, NULL);
- }
-
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
+ unsigned long orig_ret_address;

- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
+ orig_ret_address = __kretprobe_trampoline_handler(regs, trampoline_p.addr, NULL);
instruction_pointer_set(regs, orig_ret_address);
+
return 1;
}

@@ -272,6 +203,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *)regs->gr[2];
+ ri->fp = NULL;

/* Replace the return addr with trampoline addr. */
regs->gr[2] = (unsigned long)trampoline_p.addr;

2020-08-29 13:24:33

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v5 07/21] ia64: kprobes: Use generic kretprobe trampoline handler

Signed-off-by: Masami Hiramatsu <[email protected]>
---
arch/ia64/kernel/kprobes.c | 77 +-------------------------------------------
1 file changed, 2 insertions(+), 75 deletions(-)

diff --git a/arch/ia64/kernel/kprobes.c b/arch/ia64/kernel/kprobes.c
index 7a7df944d798..fc1ff8a4d7de 100644
--- a/arch/ia64/kernel/kprobes.c
+++ b/arch/ia64/kernel/kprobes.c
@@ -396,83 +396,9 @@ static void kretprobe_trampoline(void)
{
}

-/*
- * At this point the target function has been tricked into
- * returning into our trampoline. Lookup the associated instance
- * and then:
- * - call the handler function
- * - cleanup by marking the instance as unused
- * - long jump back to the original return address
- */
int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address =
- ((struct fnptr *)kretprobe_trampoline)->ip;
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because an multiple functions in the call path
- * have a return probe installed on them, and/or more than one return
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- regs->cr_iip = orig_ret_address;
-
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- if (ri->rp && ri->rp->handler)
- ri->rp->handler(ri, regs);
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
+ regs->cr_iip = __kretprobe_trampoline_handler(regs, kretprobe_trampoline, NULL);
/*
* By returning a non-zero value, we are telling
* kprobe_handler() that we don't want the post_handler
@@ -485,6 +411,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *)regs->b0;
+ ri->fp = NULL;

/* Replace the return addr with trampoline addr */
regs->b0 = ((struct fnptr *)kretprobe_trampoline)->ip;

2020-08-29 13:25:30

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v5 08/21] mips: kprobes: Use generic kretprobe trampoline handler

Signed-off-by: Masami Hiramatsu <[email protected]>
---
arch/mips/kernel/kprobes.c | 54 ++------------------------------------------
1 file changed, 3 insertions(+), 51 deletions(-)

diff --git a/arch/mips/kernel/kprobes.c b/arch/mips/kernel/kprobes.c
index d043c2f897fc..54dfba8fa77c 100644
--- a/arch/mips/kernel/kprobes.c
+++ b/arch/mips/kernel/kprobes.c
@@ -477,6 +477,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *) regs->regs[31];
+ ri->fp = NULL;

/* Replace the return addr with trampoline addr */
regs->regs[31] = (unsigned long)kretprobe_trampoline;
@@ -488,57 +489,8 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
static int __kprobes trampoline_probe_handler(struct kprobe *p,
struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address = (unsigned long)kretprobe_trampoline;
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because an multiple functions in the call path
- * have a return probe installed on them, and/or more than one return
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- if (ri->rp && ri->rp->handler)
- ri->rp->handler(ri, regs);
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
- instruction_pointer(regs) = orig_ret_address;
-
- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
+ instruction_pointer(regs) = __kretprobe_trampoline_handler(regs,
+ kretprobe_trampoline, NULL);
/*
* By returning a non-zero value, we are telling
* kprobe_handler() that we don't want the post_handler

2020-08-29 13:29:25

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v5 06/21] csky: kprobes: Use generic kretprobe trampoline handler

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Guo Ren <[email protected]>
---
arch/csky/kernel/probes/kprobes.c | 77 +------------------------------------
1 file changed, 2 insertions(+), 75 deletions(-)

diff --git a/arch/csky/kernel/probes/kprobes.c b/arch/csky/kernel/probes/kprobes.c
index f0f733b7ac5a..589f090f48b9 100644
--- a/arch/csky/kernel/probes/kprobes.c
+++ b/arch/csky/kernel/probes/kprobes.c
@@ -404,87 +404,14 @@ int __init arch_populate_kprobe_blacklist(void)

void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address =
- (unsigned long)&kretprobe_trampoline;
- kprobe_opcode_t *correct_ret_addr = NULL;
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because multiple functions in the call path have
- * return probes installed on them, and/or more than one
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always pushed into the head of the list
- * - when multiple return probes are registered for the same
- * function, the (chronologically) first instance's ret_addr
- * will be the real return address, and all the rest will
- * point to kretprobe_trampoline.
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
- correct_ret_addr = ri->ret_addr;
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- if (ri->rp && ri->rp->handler) {
- __this_cpu_write(current_kprobe, &ri->rp->kp);
- get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
- ri->ret_addr = correct_ret_addr;
- ri->rp->handler(ri, regs);
- __this_cpu_write(current_kprobe, NULL);
- }
-
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
- return (void *)orig_ret_address;
+ return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
}

void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *)regs->lr;
+ ri->fp = NULL;
regs->lr = (unsigned long) &kretprobe_trampoline;
}


2020-09-01 19:12:02

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless

On Sat, Aug 29, 2020 at 09:59:49PM +0900, Masami Hiramatsu wrote:
> Masami Hiramatsu (16):
> kprobes: Add generic kretprobe trampoline handler
> x86/kprobes: Use generic kretprobe trampoline handler
> arm: kprobes: Use generic kretprobe trampoline handler
> arm64: kprobes: Use generic kretprobe trampoline handler
> arc: kprobes: Use generic kretprobe trampoline handler
> csky: kprobes: Use generic kretprobe trampoline handler
> ia64: kprobes: Use generic kretprobe trampoline handler
> mips: kprobes: Use generic kretprobe trampoline handler
> parisc: kprobes: Use generic kretprobe trampoline handler
> powerpc: kprobes: Use generic kretprobe trampoline handler
> s390: kprobes: Use generic kretprobe trampoline handler
> sh: kprobes: Use generic kretprobe trampoline handler
> sparc: kprobes: Use generic kretprobe trampoline handler
> kprobes: Remove NMI context check
> kprobes: Free kretprobe_instance with rcu callback
> kprobes: Make local used functions static
>
> Peter Zijlstra (5):
> llist: Add nonatomic __llist_add() and __llist_dell_all()
> kprobes: Remove kretprobe hash
> asm-generic/atomic: Add try_cmpxchg() fallbacks
> freelist: Lock less freelist
> kprobes: Replace rp->free_instance with freelist

This looks good to me, do you want me to merge them through -tip? If so,
do we want to try and get them in this release still?

Ingo, opinions? This basically fixes a regression caused by

0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")

2020-09-02 00:39:37

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless

On Tue, 1 Sep 2020 21:08:08 +0200
Peter Zijlstra <[email protected]> wrote:

> On Sat, Aug 29, 2020 at 09:59:49PM +0900, Masami Hiramatsu wrote:
> > Masami Hiramatsu (16):
> > kprobes: Add generic kretprobe trampoline handler
> > x86/kprobes: Use generic kretprobe trampoline handler
> > arm: kprobes: Use generic kretprobe trampoline handler
> > arm64: kprobes: Use generic kretprobe trampoline handler
> > arc: kprobes: Use generic kretprobe trampoline handler
> > csky: kprobes: Use generic kretprobe trampoline handler
> > ia64: kprobes: Use generic kretprobe trampoline handler
> > mips: kprobes: Use generic kretprobe trampoline handler
> > parisc: kprobes: Use generic kretprobe trampoline handler
> > powerpc: kprobes: Use generic kretprobe trampoline handler
> > s390: kprobes: Use generic kretprobe trampoline handler
> > sh: kprobes: Use generic kretprobe trampoline handler
> > sparc: kprobes: Use generic kretprobe trampoline handler
> > kprobes: Remove NMI context check
> > kprobes: Free kretprobe_instance with rcu callback
> > kprobes: Make local used functions static
> >
> > Peter Zijlstra (5):
> > llist: Add nonatomic __llist_add() and __llist_dell_all()
> > kprobes: Remove kretprobe hash
> > asm-generic/atomic: Add try_cmpxchg() fallbacks
> > freelist: Lock less freelist
> > kprobes: Replace rp->free_instance with freelist
>
> This looks good to me, do you want me to merge them through -tip? If so,
> do we want to try and get them in this release still?

Yes, thanks. For the kretprobe missing issue, we will need the first half
(up to "kprobes: Remove NMI context check"), so we can split the series
if someone thinks the lockless part is still immature.

>
> Ingo, opinions? This basically fixes a regression caused by
>
> 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")
>

Oops, I missed Ingo in CC.

Thank you,

--
Masami Hiramatsu <[email protected]>

2020-09-02 07:04:34

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless

On Wed, Sep 02, 2020 at 09:37:39AM +0900, Masami Hiramatsu wrote:
> On Tue, 1 Sep 2020 21:08:08 +0200
> Peter Zijlstra <[email protected]> wrote:
>
> > On Sat, Aug 29, 2020 at 09:59:49PM +0900, Masami Hiramatsu wrote:
> > > Masami Hiramatsu (16):
> > > kprobes: Add generic kretprobe trampoline handler
> > > x86/kprobes: Use generic kretprobe trampoline handler
> > > arm: kprobes: Use generic kretprobe trampoline handler
> > > arm64: kprobes: Use generic kretprobe trampoline handler
> > > arc: kprobes: Use generic kretprobe trampoline handler
> > > csky: kprobes: Use generic kretprobe trampoline handler
> > > ia64: kprobes: Use generic kretprobe trampoline handler
> > > mips: kprobes: Use generic kretprobe trampoline handler
> > > parisc: kprobes: Use generic kretprobe trampoline handler
> > > powerpc: kprobes: Use generic kretprobe trampoline handler
> > > s390: kprobes: Use generic kretprobe trampoline handler
> > > sh: kprobes: Use generic kretprobe trampoline handler
> > > sparc: kprobes: Use generic kretprobe trampoline handler
> > > kprobes: Remove NMI context check
> > > kprobes: Free kretprobe_instance with rcu callback
> > > kprobes: Make local used functions static
> > >
> > > Peter Zijlstra (5):
> > > llist: Add nonatomic __llist_add() and __llist_dell_all()
> > > kprobes: Remove kretprobe hash
> > > asm-generic/atomic: Add try_cmpxchg() fallbacks
> > > freelist: Lock less freelist
> > > kprobes: Replace rp->free_instance with freelist
> >
> > This looks good to me, do you want me to merge them through -tip? If so,
> > do we want to try and get them in this release still?
>
> Yes, thanks. For the kretprobe missing issue, we will need the first half
> (up to "kprobes: Remove NMI context check"), so we can split the series
> if someone thinks the lockless part is still immature.

Ok, but then lockdep will yell at you if you have that enabled and run
the unoptimized things.

> > Ingo, opinions? This basically fixes a regression caused by
> >
> > 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")
> >
>
> Oops, I missed Ingo in CC.

You had x86@, he'll have a copy :-)

2020-09-02 08:19:13

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless

On Wed, 2 Sep 2020 09:02:26 +0200
[email protected] wrote:

> On Wed, Sep 02, 2020 at 09:37:39AM +0900, Masami Hiramatsu wrote:
> > On Tue, 1 Sep 2020 21:08:08 +0200
> > Peter Zijlstra <[email protected]> wrote:
> >
> > > On Sat, Aug 29, 2020 at 09:59:49PM +0900, Masami Hiramatsu wrote:
> > > > Masami Hiramatsu (16):
> > > > kprobes: Add generic kretprobe trampoline handler
> > > > x86/kprobes: Use generic kretprobe trampoline handler
> > > > arm: kprobes: Use generic kretprobe trampoline handler
> > > > arm64: kprobes: Use generic kretprobe trampoline handler
> > > > arc: kprobes: Use generic kretprobe trampoline handler
> > > > csky: kprobes: Use generic kretprobe trampoline handler
> > > > ia64: kprobes: Use generic kretprobe trampoline handler
> > > > mips: kprobes: Use generic kretprobe trampoline handler
> > > > parisc: kprobes: Use generic kretprobe trampoline handler
> > > > powerpc: kprobes: Use generic kretprobe trampoline handler
> > > > s390: kprobes: Use generic kretprobe trampoline handler
> > > > sh: kprobes: Use generic kretprobe trampoline handler
> > > > sparc: kprobes: Use generic kretprobe trampoline handler
> > > > kprobes: Remove NMI context check
> > > > kprobes: Free kretprobe_instance with rcu callback
> > > > kprobes: Make local used functions static
> > > >
> > > > Peter Zijlstra (5):
> > > > llist: Add nonatomic __llist_add() and __llist_dell_all()
> > > > kprobes: Remove kretprobe hash
> > > > asm-generic/atomic: Add try_cmpxchg() fallbacks
> > > > freelist: Lock less freelist
> > > > kprobes: Replace rp->free_instance with freelist
> > >
> > > This looks good to me, do you want me to merge them through -tip? If so,
> > > do we want to try and get them in this release still?
> >
> > Yes, thanks. For the kretprobe missing issue, we will need the first half
> > (up to "kprobes: Remove NMI context check"), so we can split the series
> > if someone thinks the lockless is still immature.
>
> Ok, but then lockdep will yell at you if you have that enabled and run
> the unoptimized things.

Oh, does it warn for all spinlock things in kprobes if it is unoptimized?
Hmm, it has to be noted in the documentation.

Thank you,


--
Masami Hiramatsu <[email protected]>

2020-09-02 09:38:00

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless

On Wed, Sep 02, 2020 at 05:17:55PM +0900, Masami Hiramatsu wrote:

> > Ok, but then lockdep will yell at you if you have that enabled and run
> > the unoptimized things.
>
> Oh, does it warn for all spinlock things in kprobes if it is unoptimized?
> Hmm, it has to be noted in the documentation.

Lockdep will warn about spinlocks used in NMI context that are also used
outside NMI context.

Now, for the kretprobe that kprobe_busy flag prevents the actual
recursion self-deadlock, but lockdep isn't smart enough to see that.

One way around this might be to use SINGLE_DEPTH_NESTING for locks when
we use them from INT3 context. That way they'll have a different class
and lockdep will not see the recursion.

pre_handler_kretprobe() is always called from INT3, right?

Something like the below might then work...

---
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 287b263c9cb9..b78f4fe08e86 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1255,11 +1255,11 @@ __acquires(hlist_lock)
NOKPROBE_SYMBOL(kretprobe_hash_lock);

static void kretprobe_table_lock(unsigned long hash,
- unsigned long *flags)
+ unsigned long *flags, int nesting)
__acquires(hlist_lock)
{
raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
- raw_spin_lock_irqsave(hlist_lock, *flags);
+ raw_spin_lock_irqsave_nested(hlist_lock, *flags, nesting);
}
NOKPROBE_SYMBOL(kretprobe_table_lock);

@@ -1326,7 +1326,7 @@ void kprobe_flush_task(struct task_struct *tk)
INIT_HLIST_HEAD(&empty_rp);
hash = hash_ptr(tk, KPROBE_HASH_BITS);
head = &kretprobe_inst_table[hash];
- kretprobe_table_lock(hash, &flags);
+ kretprobe_table_lock(hash, &flags, 0);
hlist_for_each_entry_safe(ri, tmp, head, hlist) {
if (ri->task == tk)
recycle_rp_inst(ri, &empty_rp);
@@ -1361,7 +1361,7 @@ static void cleanup_rp_inst(struct kretprobe *rp)

/* No race here */
for (hash = 0; hash < KPROBE_TABLE_SIZE; hash++) {
- kretprobe_table_lock(hash, &flags);
+ kretprobe_table_lock(hash, &flags, 0);
head = &kretprobe_inst_table[hash];
hlist_for_each_entry_safe(ri, next, head, hlist) {
if (ri->rp == rp)
@@ -1950,7 +1950,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)

/* TODO: consider to only swap the RA after the last pre_handler fired */
hash = hash_ptr(current, KPROBE_HASH_BITS);
- raw_spin_lock_irqsave(&rp->lock, flags);
+ raw_spin_lock_irqsave_nested(&rp->lock, flags, SINGLE_DEPTH_NESTING);
if (!hlist_empty(&rp->free_instances)) {
ri = hlist_entry(rp->free_instances.first,
struct kretprobe_instance, hlist);
@@ -1961,7 +1961,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
ri->task = current;

if (rp->entry_handler && rp->entry_handler(ri, regs)) {
- raw_spin_lock_irqsave(&rp->lock, flags);
+ raw_spin_lock_irqsave_nested(&rp->lock, flags, SINGLE_DEPTH_NESTING);
hlist_add_head(&ri->hlist, &rp->free_instances);
raw_spin_unlock_irqrestore(&rp->lock, flags);
return 0;
@@ -1971,7 +1971,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)

/* XXX(hch): why is there no hlist_move_head? */
INIT_HLIST_NODE(&ri->hlist);
- kretprobe_table_lock(hash, &flags);
+ kretprobe_table_lock(hash, &flags, SINGLE_DEPTH_NESTING);
hlist_add_head(&ri->hlist, &kretprobe_inst_table[hash]);
kretprobe_table_unlock(hash, &flags);
} else {

2020-09-02 13:51:52

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless

On Wed, Sep 02, 2020 at 10:19:26PM +0900, Masami Hiramatsu wrote:
> On Wed, 2 Sep 2020 11:36:13 +0200
> [email protected] wrote:
>
> > On Wed, Sep 02, 2020 at 05:17:55PM +0900, Masami Hiramatsu wrote:
> >
> > > > Ok, but then lockdep will yell at you if you have that enabled and run
> > > > the unoptimized things.
> > >
> > > Oh, does it warn for all spinlock things in kprobes if it is unoptimized?
> > > Hmm, it has to be noted in the documentation.
> >
> > Lockdep will warn about spinlocks used in NMI context that are also used
> > outside NMI context.
>
> OK, but raw_spin_lock_irqsave() will not involve lockdep, correct?

It will. The distinction between spin_lock and raw_spin_lock is only
that raw_spin_lock stays a spinlock on PREEMPT_RT, while spin_lock will
turn into a (PI) mutex in that case.

But both will call into lockdep. Unlike local_irq_disable() and
raw_local_irq_disable(), where the latter will not. Yes your prefixes
are a mess :/
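
Roughly, with made-up lock names (this just restates the rules above, it is
not code from this series):

#include <linux/spinlock.h>
#include <linux/irqflags.h>

static DEFINE_SPINLOCK(demo_lock);              /* becomes a PI mutex on PREEMPT_RT */
static DEFINE_RAW_SPINLOCK(demo_raw_lock);      /* stays a real spinlock on PREEMPT_RT */

static void demo(void)
{
        unsigned long flags;

        spin_lock(&demo_lock);                          /* goes through lockdep */
        spin_unlock(&demo_lock);

        raw_spin_lock_irqsave(&demo_raw_lock, flags);   /* also goes through lockdep */
        raw_spin_unlock_irqrestore(&demo_raw_lock, flags);

        local_irq_disable();            /* lockdep tracks the irq-state change */
        local_irq_enable();

        raw_local_irq_disable();        /* does not go through lockdep */
        raw_local_irq_enable();
}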

> > Now, for the kretprobe that kprobe_busy flag prevents the actual
> > recursion self-deadlock, but lockdep isn't smart enough to see that.
> >
> > One way around this might be to use SINGLE_DEPTH_NESTING for locks when
> > we use them from INT3 context. That way they'll have a different class
> > and lockdep will not see the recursion.
>
> Hmm, so lockdep warns only when it detects the spinlock in NMI context,
> > and int3 is now always NMI, thus all spinlocks (except raw_spinlock?)
> in kprobe handlers should get warned, right?
> I have tested this series up to [16/21] with optprobe disabled, but
> > I haven't seen the lockdep warnings.

There's a bug, that might make it miss it. I have a patch. I'll send it
shortly.

> > pre_handler_kretprobe() is always called from INT3, right?
>
> No, not always, it can be called from optprobe (same as original code
> context) or ftrace handler.
> But if you set /proc/sys/debug/kprobes-optimization to 0 and compile
> the kernel without function tracer, it should always be called from
> INT3.

D'oh, of course! Arguably I should make the optprobe context NMI-like
too... but that's for another day.

2020-09-02 15:49:52

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless

On Wed, 2 Sep 2020 11:36:13 +0200
[email protected] wrote:

> On Wed, Sep 02, 2020 at 05:17:55PM +0900, Masami Hiramatsu wrote:
>
> > > Ok, but then lockdep will yell at you if you have that enabled and run
> > > the unoptimized things.
> >
> > Oh, does it warn for all spinlock things in kprobes if it is unoptimized?
> > Hmm, it has to be noted in the documentation.
>
> Lockdep will warn about spinlocks used in NMI context that are also used
> outside NMI context.

OK, but raw_spin_lock_irqsave() will not involve lockdep, correct?

> Now, for the kretprobe that kprobe_busy flag prevents the actual
> recursion self-deadlock, but lockdep isn't smart enough to see that.
>
> One way around this might be to use SINGLE_DEPTH_NESTING for locks when
> we use them from INT3 context. That way they'll have a different class
> and lockdep will not see the recursion.

Hmm, so lockdep warns only when it detects the spinlock in NMI context,
and int3 is now always NMI, thus all spinlocks (except raw_spinlock?)
in kprobe handlers should get warned, right?
I have tested this series up to [16/21] with optprobe disabled, but
I haven't seen the lockdep warnings.

>
> pre_handler_kretprobe() is always called from INT3, right?

No, not always, it can be called from optprobe (same as original code
context) or ftrace handler.
But if you set /proc/sys/debug/kprobes-optimization to 0 and compile
the kernel without function tracer, it should always be called from
INT3.

>
> Something like the below might then work...

OK, but I would like to reproduce the lockdep warning on this for
confirmation.

Thank you,

>
> ---
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 287b263c9cb9..b78f4fe08e86 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1255,11 +1255,11 @@ __acquires(hlist_lock)
> NOKPROBE_SYMBOL(kretprobe_hash_lock);
>
> static void kretprobe_table_lock(unsigned long hash,
> - unsigned long *flags)
> + unsigned long *flags, int nesting)
> __acquires(hlist_lock)
> {
> raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
> - raw_spin_lock_irqsave(hlist_lock, *flags);
> + raw_spin_lock_irqsave_nested(hlist_lock, *flags, nesting);
> }
> NOKPROBE_SYMBOL(kretprobe_table_lock);
>
> @@ -1326,7 +1326,7 @@ void kprobe_flush_task(struct task_struct *tk)
> INIT_HLIST_HEAD(&empty_rp);
> hash = hash_ptr(tk, KPROBE_HASH_BITS);
> head = &kretprobe_inst_table[hash];
> - kretprobe_table_lock(hash, &flags);
> + kretprobe_table_lock(hash, &flags, 0);
> hlist_for_each_entry_safe(ri, tmp, head, hlist) {
> if (ri->task == tk)
> recycle_rp_inst(ri, &empty_rp);
> @@ -1361,7 +1361,7 @@ static void cleanup_rp_inst(struct kretprobe *rp)
>
> /* No race here */
> for (hash = 0; hash < KPROBE_TABLE_SIZE; hash++) {
> - kretprobe_table_lock(hash, &flags);
> + kretprobe_table_lock(hash, &flags, 0);
> head = &kretprobe_inst_table[hash];
> hlist_for_each_entry_safe(ri, next, head, hlist) {
> if (ri->rp == rp)
> @@ -1950,7 +1950,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>
> /* TODO: consider to only swap the RA after the last pre_handler fired */
> hash = hash_ptr(current, KPROBE_HASH_BITS);
> - raw_spin_lock_irqsave(&rp->lock, flags);
> + raw_spin_lock_irqsave_nested(&rp->lock, flags, SINGLE_DEPTH_NESTING);
> if (!hlist_empty(&rp->free_instances)) {
> ri = hlist_entry(rp->free_instances.first,
> struct kretprobe_instance, hlist);
> @@ -1961,7 +1961,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
> ri->task = current;
>
> if (rp->entry_handler && rp->entry_handler(ri, regs)) {
> - raw_spin_lock_irqsave(&rp->lock, flags);
> + raw_spin_lock_irqsave_nested(&rp->lock, flags, SINGLE_DEPTH_NESTING);
> hlist_add_head(&ri->hlist, &rp->free_instances);
> raw_spin_unlock_irqrestore(&rp->lock, flags);
> return 0;
> @@ -1971,7 +1971,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>
> /* XXX(hch): why is there no hlist_move_head? */
> INIT_HLIST_NODE(&ri->hlist);
> - kretprobe_table_lock(hash, &flags);
> + kretprobe_table_lock(hash, &flags, SINGLE_DEPTH_NESTING);
> hlist_add_head(&ri->hlist, &kretprobe_inst_table[hash]);
> kretprobe_table_unlock(hash, &flags);
> } else {


--
Masami Hiramatsu <[email protected]>

2020-09-03 01:41:35

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless

On Wed, 2 Sep 2020 15:42:52 +0200
[email protected] wrote:

> On Wed, Sep 02, 2020 at 10:19:26PM +0900, Masami Hiramatsu wrote:
> > On Wed, 2 Sep 2020 11:36:13 +0200
> > [email protected] wrote:
> >
> > > On Wed, Sep 02, 2020 at 05:17:55PM +0900, Masami Hiramatsu wrote:
> > >
> > > > > Ok, but then lockdep will yell at you if you have that enabled and run
> > > > > the unoptimized things.
> > > >
> > > > Oh, does it warn for all spinlock things in kprobes if it is unoptimized?
> > > > Hmm, it has to be noted in the documentation.
> > >
> > > Lockdep will warn about spinlocks used in NMI context that are also used
> > > outside NMI context.
> >
> > OK, but raw_spin_lock_irqsave() will not involve lockdep, correct?
>
> It will. The distinction between spin_lock and raw_spin_lock is only
> that raw_spin_lock stays a spinlock on PREEMPT_RT, while spin_lock will
> turn into a (PI) mutex in that case.
>
> But both will call into lockdep. Unlike local_irq_disable() and
> raw_local_irq_disable(), where the latter will not. Yes your prefixes
> are a mess :/

Yeah, that's really confusing...

> > > Now, for the kretprobe that kprobe_busy flag prevents the actual
> > > recursion self-deadlock, but lockdep isn't smart enough to see that.
> > >
> > > One way around this might be to use SINGLE_DEPTH_NESTING for locks when
> > > we use them from INT3 context. That way they'll have a different class
> > > and lockdep will not see the recursion.
> >
> > Hmm, so lockdep warns only when it detects the spinlock in NMI context,
> > and int3 is now always NMI, thus all spinlocks (except raw_spinlock?)
> > in kprobe handlers should get warned, right?
> > I have tested this series up to [16/21] with optprobe disabled, but
> > I haven't seen the lockdep warnings.
>
> There's a bug, that might make it miss it. I have a patch. I'll send it
> shortly.

OK, I've confirmed that the lockdep warns on kretprobe from INT3
with your fix. Of course make it lockless then warning is gone.
But even without the lockless patch, this warning can be false-positive
because we prohibit nested kprobe call, right?

If the kprobe user handler uses a spinlock and the spinlock is used
only in that handler (and in the context between kprobe_busy_begin/end),
it will be safe since the spinlock is never nested.
But if the spinlock is shared with another context, it will be dangerous
because it can be interrupted by an NMI (including INT3). This also applies
to any function called from a kprobe user handler, so the user
has to take care of it.
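
As a concrete illustration, here is a sketch of the safe pattern, with
made-up names and a hypothetical probe target (a lock that is also taken
from ordinary process context is exactly the case lockdep complains about):

#include <linux/kprobes.h>
#include <linux/module.h>
#include <linux/spinlock.h>

/* Only ever taken from kretprobe handlers, which never nest (kprobe_busy). */
static DEFINE_RAW_SPINLOCK(handler_lock);
static unsigned long nhits;

static int demo_ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
        unsigned long flags;

        raw_spin_lock_irqsave(&handler_lock, flags);    /* safe: no other context uses this lock */
        nhits++;
        raw_spin_unlock_irqrestore(&handler_lock, flags);
        return 0;
}

static struct kretprobe demo_rp = {
        .kp.symbol_name = "kernel_clone",       /* hypothetical probe target */
        .handler        = demo_ret_handler,
        .maxactive      = 20,
};

static int __init demo_init(void)
{
        return register_kretprobe(&demo_rp);
}

static void __exit demo_exit(void)
{
        unregister_kretprobe(&demo_rp);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");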
BTW, what would you think about setting the NMI count between kprobe_busy_begin/end?

>
> > > pre_handler_kretprobe() is always called from INT3, right?
> >
> > No, not always, it can be called from optprobe (same as original code
> > context) or ftrace handler.
> > But if you set /proc/sys/debug/kprobes-optimization to 0 and compile
> > the kernel without function tracer, it should always be called from
> > INT3.
>
> D'oh, of course! Arguably I should make the optprobe context NMI-like
> too... but that's for another day.

Hmm, we still have kprobes-on-ftrace. Do you think we should make that
NMI-like too? (And what does ftrace do about this?)

Thank you,

--
Masami Hiramatsu <[email protected]>

2020-09-03 02:06:28

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless

On Thu, 3 Sep 2020 10:39:54 +0900
Masami Hiramatsu <[email protected]> wrote:

> OK, I've confirmed that the lockdep warns on kretprobe from INT3
> with your fix. Of course make it lockless then warning is gone.
> But even without the lockless patch, this warning can be false-positive
> because we prohibit nested kprobe call, right?
>
> If the kprobe user handler uses a spinlock and the spinlock is used
> only in that handler (and in the context between kprobe_busy_begin/end),
> it will be safe since the spinlock is never nested.
> But if the spinlock is shared with another context, it will be dangerous
> because it can be interrupted by an NMI (including INT3). This also applies
> to any function called from a kprobe user handler, so the user
> has to take care of it.

Sorry, since I just noticed this point, I've Cc'd systemtap. Is systemtap
taking care of spinlocks too?

Thank you,

--
Masami Hiramatsu <[email protected]>

2020-09-07 17:48:01

by Frank Ch. Eigler

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless

Masami Hiramatsu <[email protected]> writes:

> Sorry, since I just noticed this point, I've Cc'd systemtap. Is systemtap
> taking care of spinlocks too?

On PREEMPT_RT configurations, systemtap uses the raw_spinlock_t
types/functions to keep its probe handlers as atomic as we can make them.

- FChE

2020-09-08 02:57:17

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless

On Mon, 07 Sep 2020 13:44:19 -0400
[email protected] (Frank Ch. Eigler) wrote:

> Masami Hiramatsu <[email protected]> writes:
>
> > Sorry, since I just noticed this point, I've Cc'd systemtap. Is systemtap
> > taking care of spinlocks too?
>
> On PREEMPT_RT configurations, systemtap uses the raw_spinlock_t
> types/functions to keep its probe handlers as atomic as we can make them.

OK, if the lock is only used in the probe handlers, there should be
no problem. Even if a probe hits in an NMI that happens inside another
kprobe handler, that probe does not call its handler (because we don't
support nested kprobes* yet).
But you may still get warnings if you enable lockdep.

* https://lkml.kernel.org/r/158894789510.14896.13461271606820304664.stgit@devnote2
It seems that we need more work for nested kprobes.

Thank you,

--
Masami Hiramatsu <[email protected]>

2020-09-08 10:41:31

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless

On Thu, Sep 03, 2020 at 10:39:54AM +0900, Masami Hiramatsu wrote:

> > There's a bug, that might make it miss it. I have a patch. I'll send it
> > shortly.
>
> OK, I've confirmed that the lockdep warns on kretprobe from INT3
> with your fix.

I'm now trying and failing to reproduce.... I can't seem to make it use
int3 today. It seems to want to use ftrace or refuses everything. I'm
probably doing it wrong.

Could you verify the below actually works? It's on top of the first 16
patches.

> Of course make it lockless then warning is gone.
> But even without the lockless patch, this warning can be false-positive
> because we prohibit nested kprobe call, right?

Yes, because the actual nesting is avoided by kprobe_busy, but lockdep
can't tell. Lockdep sees a regular lock user and an in-nmi lock user and
figures that's a bad combination.
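
To spell out that combination, a stripped-down sketch with made-up names
(the "NMI user" here stands for the int3-based kretprobe path, and
kprobe_busy is what keeps the recursion from actually happening):

#include <linux/spinlock.h>

static DEFINE_RAW_SPINLOCK(shared_lock);

/* Ordinary task/process context: the regular lock user. */
static void task_context_user(void)
{
        unsigned long flags;

        raw_spin_lock_irqsave(&shared_lock, flags);
        /* ... */
        raw_spin_unlock_irqrestore(&shared_lock, flags);
}

/* Reached from the int3-based kretprobe path, where in_nmi() is true. */
static void int3_context_user(void)
{
        unsigned long flags;

        /* Same lock class, now used in NMI context -> lockdep reports it. */
        raw_spin_lock_irqsave(&shared_lock, flags);
        /* ... */
        raw_spin_unlock_irqrestore(&shared_lock, flags);
}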


---
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1241,48 +1241,47 @@ void recycle_rp_inst(struct kretprobe_in
}
NOKPROBE_SYMBOL(recycle_rp_inst);

-void kretprobe_hash_lock(struct task_struct *tsk,
- struct hlist_head **head, unsigned long *flags)
-__acquires(hlist_lock)
-{
- unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
- raw_spinlock_t *hlist_lock;
-
- *head = &kretprobe_inst_table[hash];
- hlist_lock = kretprobe_table_lock_ptr(hash);
- raw_spin_lock_irqsave(hlist_lock, *flags);
-}
-NOKPROBE_SYMBOL(kretprobe_hash_lock);
-
static void kretprobe_table_lock(unsigned long hash,
unsigned long *flags)
__acquires(hlist_lock)
{
raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
- raw_spin_lock_irqsave(hlist_lock, *flags);
+ /*
+ * HACK, due to kprobe_busy we'll never actually recurse, make NMI
+ * context use a different lock class to avoid it reporting.
+ */
+ raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
}
NOKPROBE_SYMBOL(kretprobe_table_lock);

-void kretprobe_hash_unlock(struct task_struct *tsk,
- unsigned long *flags)
+static void kretprobe_table_unlock(unsigned long hash,
+ unsigned long *flags)
__releases(hlist_lock)
{
+ raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
+ raw_spin_unlock_irqrestore(hlist_lock, *flags);
+}
+NOKPROBE_SYMBOL(kretprobe_table_unlock);
+
+void kretprobe_hash_lock(struct task_struct *tsk,
+ struct hlist_head **head, unsigned long *flags)
+__acquires(hlist_lock)
+{
unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
- raw_spinlock_t *hlist_lock;

- hlist_lock = kretprobe_table_lock_ptr(hash);
- raw_spin_unlock_irqrestore(hlist_lock, *flags);
+ *head = &kretprobe_inst_table[hash];
+ kretprobe_table_lock(hash, flags);
}
-NOKPROBE_SYMBOL(kretprobe_hash_unlock);
+NOKPROBE_SYMBOL(kretprobe_hash_lock);

-static void kretprobe_table_unlock(unsigned long hash,
- unsigned long *flags)
+void kretprobe_hash_unlock(struct task_struct *tsk,
+ unsigned long *flags)
__releases(hlist_lock)
{
- raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
- raw_spin_unlock_irqrestore(hlist_lock, *flags);
+ unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
+ kretprobe_table_unlock(hash, flags);
}
-NOKPROBE_SYMBOL(kretprobe_table_unlock);
+NOKPROBE_SYMBOL(kretprobe_hash_unlock);

struct kprobe kprobe_busy = {
.addr = (void *) get_kprobe,

2020-09-08 11:24:47

by [email protected]

[permalink] [raw]
Subject: RE: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless

> From: [email protected] <[email protected]>
>
> I'm now trying and failing to reproduce.... I can't seem to make it use
> int3 today. It seems to want to use ftrace or refuses everything. I'm
> probably doing it wrong.
>

You can turn off CONFIG_KPROBES_ON_FTRACE (and also sysctl debug.kprobes-optimization) to make it use the int3 handler



2020-09-08 11:52:05

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless

On Tue, Sep 08, 2020 at 11:15:14AM +0000, [email protected] wrote:
> > From: [email protected] <[email protected]>
> >
> > I'm now trying and failing to reproduce.... I can't seem to make it use
> > int3 today. It seems to want to use ftrace or refuses everything. I'm
> > probably doing it wrong.
> >
>
> You can turn off CONFIG_KPROBES_ON_FTRACE (and also sysctl
> debug.kprobes-optimization) to make it use the int3 handler

I'm fairly sure I used the sysctl like in your original reproducer.

I'll try again later.

2020-09-08 20:16:07

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless

On Tue, 8 Sep 2020 12:37:36 +0200
[email protected] wrote:

> On Thu, Sep 03, 2020 at 10:39:54AM +0900, Masami Hiramatsu wrote:
>
> > > There's a bug, that might make it miss it. I have a patch. I'll send it
> > > shortly.
> >
> > OK, I've confirmed that the lockdep warns on kretprobe from INT3
> > with your fix.
>
> I'm now trying and failing to reproduce.... I can't seem to make it use
> int3 today. It seems to want to use ftrace or refuses everything. I'm
> probably doing it wrong.

Ah, yes. To use INT3, we need to disable CONFIG_FUNCTION_TRACER,
or enable CONFIG_KPROBE_EVENTS_ON_NOTRACE and put a kretprobe on a notrace
function :)

>
> Could you verify the below actually works? It's on top of the first 16
> patches.

Sure. (BTW, you mean the first 15 patches, since kretprobe_hash_lock()
becomes static with the 16th patch?)

Here is the result. I got the same warning with or without your patch.
I built the kernel with CONFIG_FUNCTION_TRACER=n and CONFIG_KPROBES_SANITY_TEST=y.

[ 0.269235] PCI: Using configuration type 1 for base access
[ 0.272171] Kprobe smoke test: started
[ 0.281125]
[ 0.281126] ================================
[ 0.281126] WARNING: inconsistent lock state
[ 0.281126] 5.9.0-rc2+ #101 Not tainted
[ 0.281126] --------------------------------
[ 0.281127] inconsistent {INITIAL USE} -> {IN-NMI} usage.
[ 0.281127] swapper/0/1 [HC1[1]:SC0[0]:HE0:SE1] takes:
[ 0.281127] ffffffff8228c778 (&rp->lock){....}-{2:2}, at: pre_handler_kretprobe+0x2b/0x1b0
[ 0.281128] {INITIAL USE} state was registered at:
[ 0.281129] lock_acquire+0x9e/0x3c0
[ 0.281129] _raw_spin_lock+0x2f/0x40
[ 0.281129] recycle_rp_inst+0x48/0x90
[ 0.281129] __kretprobe_trampoline_handler+0x15d/0x1e0
[ 0.281130] trampoline_handler+0x43/0x60
[ 0.281130] kretprobe_trampoline+0x2a/0x50
[ 0.281130] kretprobe_trampoline+0x0/0x50
[ 0.281130] init_kprobes+0x1b6/0x1d5
[ 0.281130] do_one_initcall+0x5c/0x315
[ 0.281131] kernel_init_freeable+0x21a/0x25f
[ 0.281131] kernel_init+0x9/0x104
[ 0.281131] ret_from_fork+0x22/0x30
[ 0.281131] irq event stamp: 25978
[ 0.281132] hardirqs last enabled at (25977): [<ffffffff8108a0f7>] queue_delayed_work_on+0x57/0x90
[ 0.281132] hardirqs last disabled at (25978): [<ffffffff818df778>] exc_int3+0x48/0x140
[ 0.281132] softirqs last enabled at (25964): [<ffffffff81c00389>] __do_softirq+0x389/0x48b
[ 0.281133] softirqs last disabled at (25957): [<ffffffff81a00f42>] asm_call_on_stack+0x12/0x20
[ 0.281133]
[ 0.281133] other info that might help us debug this:
[ 0.281133] Possible unsafe locking scenario:
[ 0.281134]
[ 0.281134] CPU0
[ 0.281134] ----
[ 0.281134] lock(&rp->lock);
[ 0.281135] <Interrupt>
[ 0.281135] lock(&rp->lock);
[ 0.281136]
[ 0.281136] *** DEADLOCK ***
[ 0.281136]
[ 0.281136] no locks held by swapper/0/1.
[ 0.281136]
[ 0.281137] stack backtrace:
[ 0.281137] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc2+ #101
[ 0.281137] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[ 0.281137] Call Trace:
[ 0.281138] dump_stack+0x81/0xba
[ 0.281138] print_usage_bug.cold+0x195/0x19e
[ 0.281138] lock_acquire+0x314/0x3c0
[ 0.281138] ? __text_poke+0x2db/0x400
[ 0.281139] ? pre_handler_kretprobe+0x2b/0x1b0
[ 0.281139] _raw_spin_lock_irqsave+0x3a/0x50
[ 0.281139] ? pre_handler_kretprobe+0x2b/0x1b0
[ 0.281139] pre_handler_kretprobe+0x2b/0x1b0
[ 0.281139] ? stop_machine_from_inactive_cpu+0x120/0x120
[ 0.281140] aggr_pre_handler+0x5f/0xd0
[ 0.281140] kprobe_int3_handler+0x10f/0x160
[ 0.281140] exc_int3+0x52/0x140
[ 0.281140] asm_exc_int3+0x31/0x40
[ 0.281141] RIP: 0010:kprobe_target+0x1/0x10
[ 0.281141] Code: 65 48 33 04 25 28 00 00 00 75 10 48 81 c4 90 00 00 00 44 89 e0 41 5c 5d c3 0f 0b e8 a
[ 0.281141] RSP: 0000:ffffc90000013e48 EFLAGS: 00000246
[ 0.281142] RAX: ffffffff81142130 RBX: 0000000000000001 RCX: ffffc90000013cec
[ 0.281142] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 00000000f3eb0b20
[ 0.281142] RBP: ffffc90000013e68 R08: 0000000000000000 R09: 0000000000000001
[ 0.281143] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 0.281143] R13: ffffffff8224c2b0 R14: ffffffff8255cf60 R15: 0000000000000000
[ 0.281143] ? stop_machine_from_inactive_cpu+0x120/0x120
[ 0.281143] ? kprobe_target+0x1/0x10
[ 0.281144] ? stop_machine_from_inactive_cpu+0x120/0x120
[ 0.281144] ? init_test_probes.cold+0x2e3/0x391
[ 0.281144] init_kprobes+0x1b6/0x1d5
[ 0.281144] ? debugfs_kprobe_init+0xa9/0xa9
[ 0.281145] do_one_initcall+0x5c/0x315
[ 0.281145] ? rcu_read_lock_sched_held+0x4a/0x80
[ 0.281145] kernel_init_freeable+0x21a/0x25f
[ 0.281145] ? rest_init+0x23c/0x23c
[ 0.281145] kernel_init+0x9/0x104
[ 0.281146] ret_from_fork+0x22/0x30
[ 0.281282] Kprobe smoke test: passed successfully


>
> > Of course make it lockless then warning is gone.
> > But even without the lockless patch, this warning can be false-positive
> > because we prohibit nested kprobe call, right?
>
> Yes, because the actual nesting is avoided by kprobe_busy, but lockdep
> can't tell. Lockdep sees a regular lock user and an in-nmi lock user and
> figures that's a bad combination.

Thanks for confirmation!

>
>
> ---
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1241,48 +1241,47 @@ void recycle_rp_inst(struct kretprobe_in
> }
> NOKPROBE_SYMBOL(recycle_rp_inst);
>
> -void kretprobe_hash_lock(struct task_struct *tsk,
> - struct hlist_head **head, unsigned long *flags)
> -__acquires(hlist_lock)
> -{
> - unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
> - raw_spinlock_t *hlist_lock;
> -
> - *head = &kretprobe_inst_table[hash];
> - hlist_lock = kretprobe_table_lock_ptr(hash);
> - raw_spin_lock_irqsave(hlist_lock, *flags);
> -}
> -NOKPROBE_SYMBOL(kretprobe_hash_lock);
> -
> static void kretprobe_table_lock(unsigned long hash,
> unsigned long *flags)
> __acquires(hlist_lock)
> {
> raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
> - raw_spin_lock_irqsave(hlist_lock, *flags);
> + /*
> + * HACK, due to kprobe_busy we'll never actually recurse, make NMI
> + * context use a different lock class to avoid it reporting.
> + */
> + raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
> }
> NOKPROBE_SYMBOL(kretprobe_table_lock);
>
> -void kretprobe_hash_unlock(struct task_struct *tsk,
> - unsigned long *flags)
> +static void kretprobe_table_unlock(unsigned long hash,
> + unsigned long *flags)
> __releases(hlist_lock)
> {
> + raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
> + raw_spin_unlock_irqrestore(hlist_lock, *flags);
> +}
> +NOKPROBE_SYMBOL(kretprobe_table_unlock);
> +
> +void kretprobe_hash_lock(struct task_struct *tsk,
> + struct hlist_head **head, unsigned long *flags)
> +__acquires(hlist_lock)
> +{
> unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
> - raw_spinlock_t *hlist_lock;
>
> - hlist_lock = kretprobe_table_lock_ptr(hash);
> - raw_spin_unlock_irqrestore(hlist_lock, *flags);
> + *head = &kretprobe_inst_table[hash];
> + kretprobe_table_lock(hash, flags);
> }
> -NOKPROBE_SYMBOL(kretprobe_hash_unlock);
> +NOKPROBE_SYMBOL(kretprobe_hash_lock);
>
> -static void kretprobe_table_unlock(unsigned long hash,
> - unsigned long *flags)
> +void kretprobe_hash_unlock(struct task_struct *tsk,
> + unsigned long *flags)
> __releases(hlist_lock)
> {
> - raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
> - raw_spin_unlock_irqrestore(hlist_lock, *flags);
> + unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
> + kretprobe_table_unlock(hash, flags);
> }
> -NOKPROBE_SYMBOL(kretprobe_table_unlock);
> +NOKPROBE_SYMBOL(kretprobe_hash_unlock);
>
> struct kprobe kprobe_busy = {
> .addr = (void *) get_kprobe,


--
Masami Hiramatsu <[email protected]>

2020-09-09 05:30:33

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless

On Wed, 9 Sep 2020 00:09:23 +0900
Masami Hiramatsu <[email protected]> wrote:

> > > Of course make it lockless then warning is gone.
> > > But even without the lockless patch, this warning can be false-positive
> > > because we prohibit nested kprobe call, right?
> >
> > Yes, because the actual nesting is avoided by kprobe_busy, but lockdep
> > can't tell. Lockdep sees a regular lock user and an in-nmi lock user and
> > figures that's a bad combination.

Hmm, what about introducing a new LOCK_USED_KPROBE bit, which would be set
if the lock is accessed while current_kprobe is set (including kprobe_busy)?
That would mean the lock is taken in kprobe user-handler context. If we always
access the lock in kprobe context, it is never nested.

Thank you,

--
Masami Hiramatsu <[email protected]>

2020-09-11 02:33:32

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless

Hi Peter and Ingo,

On Wed, 2 Sep 2020 09:02:26 +0200
[email protected] wrote:

> On Wed, Sep 02, 2020 at 09:37:39AM +0900, Masami Hiramatsu wrote:
> > On Tue, 1 Sep 2020 21:08:08 +0200
> > Peter Zijlstra <[email protected]> wrote:
> >
> > > On Sat, Aug 29, 2020 at 09:59:49PM +0900, Masami Hiramatsu wrote:
> > > > Masami Hiramatsu (16):
> > > > kprobes: Add generic kretprobe trampoline handler
> > > > x86/kprobes: Use generic kretprobe trampoline handler
> > > > arm: kprobes: Use generic kretprobe trampoline handler
> > > > arm64: kprobes: Use generic kretprobe trampoline handler
> > > > arc: kprobes: Use generic kretprobe trampoline handler
> > > > csky: kprobes: Use generic kretprobe trampoline handler
> > > > ia64: kprobes: Use generic kretprobe trampoline handler
> > > > mips: kprobes: Use generic kretprobe trampoline handler
> > > > parisc: kprobes: Use generic kretprobe trampoline handler
> > > > powerpc: kprobes: Use generic kretprobe trampoline handler
> > > > s390: kprobes: Use generic kretprobe trampoline handler
> > > > sh: kprobes: Use generic kretprobe trampoline handler
> > > > sparc: kprobes: Use generic kretprobe trampoline handler
> > > > kprobes: Remove NMI context check
> > > > kprobes: Free kretprobe_instance with rcu callback
> > > > kprobes: Make local used functions static
> > > >
> > > > Peter Zijlstra (5):
> > > > llist: Add nonatomic __llist_add() and __llist_dell_all()
> > > > kprobes: Remove kretprobe hash
> > > > asm-generic/atomic: Add try_cmpxchg() fallbacks
> > > > freelist: Lock less freelist
> > > > kprobes: Replace rp->free_instance with freelist
> > >
> > > This looks good to me, do you want me to merge them through -tip? If so,
> > > do we want to try and get them in this release still?
> >
> > Yes, thanks. For the kretprobe missing issue, we will need the first half
> > (up to "kprobes: Remove NMI context check"), so we can split the series
> > if someone thinks the lockless is still immature.
>
> Ok, but then lockdep will yell at you if you have that enabled and run
> the unoptimized things.
>
> > > Ingo, opinions? This basically fixes a regression caused by
> > >
> > > 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")

So what would you think of this? I saw the unification part of this series
in tip/master, but the lockless part is not there. This may still leave
lockdep warning on kretprobes if we disable CONFIG_FUNCTION_TRACER and
optprobes.

If we make kretprobe lockless, we will remove all locks from the in-kernel
kprobe handlers, so at least the upstream users will be happy.

Or do we fix the lockdep warning on the spinlocks in kprobe handlers first?

Thank you,

--
Masami Hiramatsu <[email protected]>

Subject: [tip: perf/kprobes] sparc: kprobes: Use generic kretprobe trampoline handler

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: 5e96ce8ae5b1428e6f4953be5fb1daf0a6d18426
Gitweb: https://git.kernel.org/tip/5e96ce8ae5b1428e6f4953be5fb1daf0a6d18426
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:02:25 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 08 Sep 2020 11:52:35 +02:00

sparc: kprobes: Use generic kretprobe trampoline handler

Use the generic kretprobe trampoline handler. Don't use
framepointer verification.

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/159870614572.1229682.2273450776108579676.stgit@devnote2
---
arch/sparc/kernel/kprobes.c | 51 ++----------------------------------
1 file changed, 3 insertions(+), 48 deletions(-)

diff --git a/arch/sparc/kernel/kprobes.c b/arch/sparc/kernel/kprobes.c
index dfbca24..217c21a 100644
--- a/arch/sparc/kernel/kprobes.c
+++ b/arch/sparc/kernel/kprobes.c
@@ -453,6 +453,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *)(regs->u_regs[UREG_RETPC] + 8);
+ ri->fp = NULL;

/* Replace the return addr with trampoline addr */
regs->u_regs[UREG_RETPC] =
@@ -465,58 +466,12 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
static int __kprobes trampoline_probe_handler(struct kprobe *p,
struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address =(unsigned long)&kretprobe_trampoline;
+ unsigned long orig_ret_address = 0;

- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because an multiple functions in the call path
- * have a return probe installed on them, and/or more than one return
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- if (ri->rp && ri->rp->handler)
- ri->rp->handler(ri, regs);
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
+ orig_ret_address = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
regs->tpc = orig_ret_address;
regs->tnpc = orig_ret_address + 4;

- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
/*
* By returning a non-zero value, we are telling
* kprobe_handler() that we don't want the post_handler

Subject: [tip: perf/kprobes] sh: kprobes: Use generic kretprobe trampoline handler

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: 0cf0e2fe91fade050bfda2a7240550ba6106b760
Gitweb: https://git.kernel.org/tip/0cf0e2fe91fade050bfda2a7240550ba6106b760
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:02:15 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 08 Sep 2020 11:52:34 +02:00

sh: kprobes: Use generic kretprobe trampoline handler

Use the generic kretprobe trampoline handler. Don't use
framepointer verification.

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/159870613547.1229682.15519965962108261812.stgit@devnote2
---
arch/sh/kernel/kprobes.c | 58 ++-------------------------------------
1 file changed, 3 insertions(+), 55 deletions(-)

diff --git a/arch/sh/kernel/kprobes.c b/arch/sh/kernel/kprobes.c
index 318296f..756100b 100644
--- a/arch/sh/kernel/kprobes.c
+++ b/arch/sh/kernel/kprobes.c
@@ -204,6 +204,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *) regs->pr;
+ ri->fp = NULL;

/* Replace the return addr with trampoline addr */
regs->pr = (unsigned long)kretprobe_trampoline;
@@ -302,62 +303,9 @@ static void __used kretprobe_trampoline_holder(void)
*/
int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address = (unsigned long)&kretprobe_trampoline;
+ regs->pc = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);

- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because an multiple functions in the call path
- * have a return probe installed on them, and/or more then one return
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- if (ri->rp && ri->rp->handler) {
- __this_cpu_write(current_kprobe, &ri->rp->kp);
- ri->rp->handler(ri, regs);
- __this_cpu_write(current_kprobe, NULL);
- }
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
- regs->pc = orig_ret_address;
- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
-
- return orig_ret_address;
+ return 1;
}

static int __kprobes post_kprobe_handler(struct pt_regs *regs)

Subject: [tip: perf/kprobes] kprobes: Free kretprobe_instance with RCU callback

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: b338817807538c893540e393856b79cbbdf777ea
Gitweb: https://git.kernel.org/tip/b338817807538c893540e393856b79cbbdf777ea
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:02:47 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 08 Sep 2020 11:52:35 +02:00

kprobes: Free kretprobe_instance with RCU callback

Free kretprobe_instance with RCU callback instead of directly
freeing the object in the kretprobe handler context.

This will make kretprobe run safer in NMI context.

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/159870616685.1229682.11978742048709542226.stgit@devnote2
---
include/linux/kprobes.h | 6 ++++--
kernel/kprobes.c | 25 ++++++-------------------
2 files changed, 10 insertions(+), 21 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 72142ae..3389067 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -156,7 +156,10 @@ struct kretprobe {
};

struct kretprobe_instance {
- struct hlist_node hlist;
+ union {
+ struct hlist_node hlist;
+ struct rcu_head rcu;
+ };
struct kretprobe *rp;
kprobe_opcode_t *ret_addr;
struct task_struct *task;
@@ -395,7 +398,6 @@ int register_kretprobes(struct kretprobe **rps, int num);
void unregister_kretprobes(struct kretprobe **rps, int num);

void kprobe_flush_task(struct task_struct *tk);
-void recycle_rp_inst(struct kretprobe_instance *ri, struct hlist_head *head);

int disable_kprobe(struct kprobe *kp);
int enable_kprobe(struct kprobe *kp);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 2111382..0676868 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1223,8 +1223,7 @@ void kprobes_inc_nmissed_count(struct kprobe *p)
}
NOKPROBE_SYMBOL(kprobes_inc_nmissed_count);

-void recycle_rp_inst(struct kretprobe_instance *ri,
- struct hlist_head *head)
+static void recycle_rp_inst(struct kretprobe_instance *ri)
{
struct kretprobe *rp = ri->rp;

@@ -1236,8 +1235,7 @@ void recycle_rp_inst(struct kretprobe_instance *ri,
hlist_add_head(&ri->hlist, &rp->free_instances);
raw_spin_unlock(&rp->lock);
} else
- /* Unregistering */
- hlist_add_head(&ri->hlist, head);
+ kfree_rcu(ri, rcu);
}
NOKPROBE_SYMBOL(recycle_rp_inst);

@@ -1313,7 +1311,7 @@ void kprobe_busy_end(void)
void kprobe_flush_task(struct task_struct *tk)
{
struct kretprobe_instance *ri;
- struct hlist_head *head, empty_rp;
+ struct hlist_head *head;
struct hlist_node *tmp;
unsigned long hash, flags = 0;

@@ -1323,19 +1321,14 @@ void kprobe_flush_task(struct task_struct *tk)

kprobe_busy_begin();

- INIT_HLIST_HEAD(&empty_rp);
hash = hash_ptr(tk, KPROBE_HASH_BITS);
head = &kretprobe_inst_table[hash];
kretprobe_table_lock(hash, &flags);
hlist_for_each_entry_safe(ri, tmp, head, hlist) {
if (ri->task == tk)
- recycle_rp_inst(ri, &empty_rp);
+ recycle_rp_inst(ri);
}
kretprobe_table_unlock(hash, &flags);
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }

kprobe_busy_end();
}
@@ -1936,13 +1929,12 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
void *frame_pointer)
{
struct kretprobe_instance *ri = NULL, *last = NULL;
- struct hlist_head *head, empty_rp;
+ struct hlist_head *head;
struct hlist_node *tmp;
unsigned long flags;
kprobe_opcode_t *correct_ret_addr = NULL;
bool skipped = false;

- INIT_HLIST_HEAD(&empty_rp);
kretprobe_hash_lock(current, &head, &flags);

/*
@@ -2011,7 +2003,7 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
__this_cpu_write(current_kprobe, prev);
}

- recycle_rp_inst(ri, &empty_rp);
+ recycle_rp_inst(ri);

if (ri == last)
break;
@@ -2019,11 +2011,6 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,

kretprobe_hash_unlock(current, &flags);

- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
-
return (unsigned long)correct_ret_addr;
}
NOKPROBE_SYMBOL(__kretprobe_trampoline_handler)
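
The core of the hunk above is the union of hlist_node and rcu_head plus
kfree_rcu(); stripped of the kprobes specifics, and with made-up names,
the pattern is roughly:

#include <linux/rcupdate.h>
#include <linux/types.h>

struct demo_node {
        union {
                struct hlist_node hlist;        /* used while the object sits on a list */
                struct rcu_head rcu;            /* reused once the object is being freed */
        };
        unsigned long data;
};

static void demo_release(struct demo_node *n)
{
        /*
         * Defer the kfree() to an RCU callback instead of freeing directly,
         * so this can be called from contexts (e.g. the int3/NMI trampoline
         * path) where calling kfree() immediately would not be safe.
         */
        kfree_rcu(n, rcu);
}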

Subject: [tip: perf/kprobes] ia64: kprobes: Use generic kretprobe trampoline handler

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: e792ff804f49720ce003b3e4c618b5d996256a18
Gitweb: https://git.kernel.org/tip/e792ff804f49720ce003b3e4c618b5d996256a18
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:01:09 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 08 Sep 2020 11:52:33 +02:00

ia64: kprobes: Use generic kretprobe trampoline handler

Use the generic kretprobe trampoline handler. Don't use
framepointer verification.

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/159870606883.1229682.12331813108378725668.stgit@devnote2
---
arch/ia64/kernel/kprobes.c | 77 +-------------------------------------
1 file changed, 2 insertions(+), 75 deletions(-)

diff --git a/arch/ia64/kernel/kprobes.c b/arch/ia64/kernel/kprobes.c
index 7a7df94..fc1ff8a 100644
--- a/arch/ia64/kernel/kprobes.c
+++ b/arch/ia64/kernel/kprobes.c
@@ -396,83 +396,9 @@ static void kretprobe_trampoline(void)
{
}

-/*
- * At this point the target function has been tricked into
- * returning into our trampoline. Lookup the associated instance
- * and then:
- * - call the handler function
- * - cleanup by marking the instance as unused
- * - long jump back to the original return address
- */
int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address =
- ((struct fnptr *)kretprobe_trampoline)->ip;
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because an multiple functions in the call path
- * have a return probe installed on them, and/or more than one return
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- regs->cr_iip = orig_ret_address;
-
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- if (ri->rp && ri->rp->handler)
- ri->rp->handler(ri, regs);
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
+ regs->cr_iip = __kretprobe_trampoline_handler(regs, kretprobe_trampoline, NULL);
/*
* By returning a non-zero value, we are telling
* kprobe_handler() that we don't want the post_handler
@@ -485,6 +411,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *)regs->b0;
+ ri->fp = NULL;

/* Replace the return addr with trampoline addr */
regs->b0 = ((struct fnptr *)kretprobe_trampoline)->ip;

Subject: [tip: perf/kprobes] arm: kprobes: Use generic kretprobe trampoline handler

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: 94509582d1d173c4777b8832ecbac6dff2ccbfa7
Gitweb: https://git.kernel.org/tip/94509582d1d173c4777b8832ecbac6dff2ccbfa7
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:00:24 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 08 Sep 2020 11:52:32 +02:00

arm: kprobes: Use generic kretprobe trampoline handler

Use the generic kretprobe trampoline handler. Use regs->ARM_fp
for framepointer verification.

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/159870602406.1229682.10496730247473708592.stgit@devnote2
---
arch/arm/probes/kprobes/core.c | 78 +--------------------------------
1 file changed, 3 insertions(+), 75 deletions(-)

diff --git a/arch/arm/probes/kprobes/core.c b/arch/arm/probes/kprobes/core.c
index feefa20..a965311 100644
--- a/arch/arm/probes/kprobes/core.c
+++ b/arch/arm/probes/kprobes/core.c
@@ -413,87 +413,15 @@ void __naked __kprobes kretprobe_trampoline(void)
/* Called from kretprobe_trampoline */
static __used __kprobes void *trampoline_handler(struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address = (unsigned long)&kretprobe_trampoline;
- kprobe_opcode_t *correct_ret_addr = NULL;
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because multiple functions in the call path have
- * a return probe installed on them, and/or more than one return
- * probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
- correct_ret_addr = ri->ret_addr;
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- if (ri->rp && ri->rp->handler) {
- __this_cpu_write(current_kprobe, &ri->rp->kp);
- get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
- ri->ret_addr = correct_ret_addr;
- ri->rp->handler(ri, regs);
- __this_cpu_write(current_kprobe, NULL);
- }
-
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
-
- return (void *)orig_ret_address;
+ return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline,
+ (void *)regs->ARM_fp);
}

void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *)regs->ARM_lr;
+ ri->fp = (void *)regs->ARM_fp;

/* Replace the return addr with trampoline addr. */
regs->ARM_lr = (unsigned long)&kretprobe_trampoline;

Subject: [tip: perf/kprobes] powerpc: kprobes: Use generic kretprobe trampoline handler

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: b6c5a58dd89e683e66b52a57a6f022fda79a0aab
Gitweb: https://git.kernel.org/tip/b6c5a58dd89e683e66b52a57a6f022fda79a0aab
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:01:48 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 08 Sep 2020 11:52:34 +02:00

powerpc: kprobes: Use generic kretprobe trampoline handler

Use the generic kretprobe trampoline handler. Don't use
framepointer verification.

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/159870610825.1229682.2090635992093223399.stgit@devnote2
---
arch/powerpc/kernel/kprobes.c | 53 +---------------------------------
1 file changed, 3 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 6ab9b4d..01ab216 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -218,6 +218,7 @@ bool arch_kprobe_on_func_entry(unsigned long offset)
void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *)regs->link;
+ ri->fp = NULL;

/* Replace the return addr with trampoline addr */
regs->link = (unsigned long)kretprobe_trampoline;
@@ -396,50 +397,9 @@ asm(".global kretprobe_trampoline\n"
*/
static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address =(unsigned long)&kretprobe_trampoline;
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because an multiple functions in the call path
- * have a return probe installed on them, and/or more than one return
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- if (ri->rp && ri->rp->handler)
- ri->rp->handler(ri, regs);
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
+ unsigned long orig_ret_address;

+ orig_ret_address = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
/*
* We get here through one of two paths:
* 1. by taking a trap -> kprobe_handler() -> here
@@ -458,13 +418,6 @@ static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
regs->nip = orig_ret_address - 4;
regs->link = orig_ret_address;

- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
-
return 0;
}
NOKPROBE_SYMBOL(trampoline_probe_handler);

Subject: [tip: perf/kprobes] kprobes: Make local functions static

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: 319f0ce284fff8e4f95167cb144acc905d0584c7
Gitweb: https://git.kernel.org/tip/319f0ce284fff8e4f95167cb144acc905d0584c7
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:03:02 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 08 Sep 2020 11:52:42 +02:00

kprobes: Make local functions static

Since we unified the kretprobe trampoline handler from arch/* code,
some functions and objects do not need to be exported anymore.

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/159870618256.1229682.8692046612635810882.stgit@devnote2
---
include/linux/kprobes.h | 15 ---------------
kernel/kprobes.c | 9 ++++-----
2 files changed, 4 insertions(+), 20 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 3389067..5c8c271 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -190,7 +190,6 @@ static inline int kprobes_built_in(void)
return 1;
}

-extern struct kprobe kprobe_busy;
extern void kprobe_busy_begin(void);
extern void kprobe_busy_end(void);

@@ -235,16 +234,6 @@ static inline int arch_trampoline_kprobe(struct kprobe *p)

extern struct kretprobe_blackpoint kretprobe_blacklist[];

-static inline void kretprobe_assert(struct kretprobe_instance *ri,
- unsigned long orig_ret_address, unsigned long trampoline_address)
-{
- if (!orig_ret_address || (orig_ret_address == trampoline_address)) {
- printk("kretprobe BUG!: Processing kretprobe %p @ %p\n",
- ri->rp, ri->rp->kp.addr);
- BUG();
- }
-}
-
#ifdef CONFIG_KPROBES_SANITY_TEST
extern int init_test_probes(void);
#else
@@ -364,10 +353,6 @@ int arch_check_ftrace_location(struct kprobe *p);

/* Get the kprobe at this addr (if any) - called with preemption disabled */
struct kprobe *get_kprobe(void *addr);
-void kretprobe_hash_lock(struct task_struct *tsk,
- struct hlist_head **head, unsigned long *flags);
-void kretprobe_hash_unlock(struct task_struct *tsk, unsigned long *flags);
-struct hlist_head * kretprobe_inst_table_head(struct task_struct *tsk);

/* kprobe_running() will just return the current_kprobe on this CPU */
static inline struct kprobe *kprobe_running(void)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 0676868..732a701 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1239,7 +1239,7 @@ static void recycle_rp_inst(struct kretprobe_instance *ri)
}
NOKPROBE_SYMBOL(recycle_rp_inst);

-void kretprobe_hash_lock(struct task_struct *tsk,
+static void kretprobe_hash_lock(struct task_struct *tsk,
struct hlist_head **head, unsigned long *flags)
__acquires(hlist_lock)
{
@@ -1261,7 +1261,7 @@ __acquires(hlist_lock)
}
NOKPROBE_SYMBOL(kretprobe_table_lock);

-void kretprobe_hash_unlock(struct task_struct *tsk,
+static void kretprobe_hash_unlock(struct task_struct *tsk,
unsigned long *flags)
__releases(hlist_lock)
{
@@ -1282,7 +1282,7 @@ __releases(hlist_lock)
}
NOKPROBE_SYMBOL(kretprobe_table_unlock);

-struct kprobe kprobe_busy = {
+static struct kprobe kprobe_busy = {
.addr = (void *) get_kprobe,
};

@@ -1983,8 +1983,7 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
break;
}

- kretprobe_assert(ri, (unsigned long)correct_ret_addr,
- (unsigned long)trampoline_address);
+ BUG_ON(!correct_ret_addr || (correct_ret_addr == trampoline_address));
last = ri;

hlist_for_each_entry_safe(ri, tmp, head, hlist) {

Subject: [tip: perf/kprobes] kprobes: Add generic kretprobe trampoline handler

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: 66ada2ccae4ed4dd07ba91df3b5fdb4c11335bd1
Gitweb: https://git.kernel.org/tip/66ada2ccae4ed4dd07ba91df3b5fdb4c11335bd1
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:00:01 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 08 Sep 2020 11:52:31 +02:00

kprobes: Add generic kretprobe trampoline handler

Add a generic kretprobe trampoline handler for unifying
all the cloned arch/* kretprobe trampoline handlers.

The generic kretprobe trampoline handler is based on the
x86 implementation, because it is the latest implementation.
It has frame pointer checking, kprobe_busy_begin/end and
return address fixup for user handlers.

[ mingo: Minor edits. ]
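
A rough sketch of what an architecture port looks like after this change
(the return_reg field and frame_pointer() helper below are purely
illustrative placeholders, not any particular architecture):

/* Illustrative arch glue only -- see the real ports later in this
 * thread for each architecture's actual registers. */
void arch_prepare_kretprobe(struct kretprobe_instance *ri,
                            struct pt_regs *regs)
{
        ri->ret_addr = (kprobe_opcode_t *)regs->return_reg;
        ri->fp = (void *)frame_pointer(regs);   /* or NULL to skip fp checks */

        /* Divert the function return into the arch trampoline */
        regs->return_reg = (unsigned long)&kretprobe_trampoline;
}

static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
{
        unsigned long orig_ret;

        /* The generic handler runs the user handlers and hands back the
         * real return address; the arch code only has to restore it. */
        orig_ret = kretprobe_trampoline_handler(regs, &kretprobe_trampoline,
                                                (void *)frame_pointer(regs));
        instruction_pointer_set(regs, orig_ret);
        return 1;
}

The frame pointer passed to the handler must match what
arch_prepare_kretprobe() stored in ri->fp, which is how the generic code
detects and skips instances that were pushed in the wrong order.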

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/159870600138.1229682.3424065380448088833.stgit@devnote2
---
include/linux/kprobes.h | 32 +++++++++++--
kernel/kprobes.c | 98 ++++++++++++++++++++++++++++++++++++++++-
2 files changed, 126 insertions(+), 4 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 9be1bff..72142ae 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -187,10 +187,38 @@ static inline int kprobes_built_in(void)
return 1;
}

+extern struct kprobe kprobe_busy;
+extern void kprobe_busy_begin(void);
+extern void kprobe_busy_end(void);
+
#ifdef CONFIG_KRETPROBES
extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs);
extern int arch_trampoline_kprobe(struct kprobe *p);
+
+/* If the trampoline handler called from a kprobe, use this version */
+unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
+ void *trampoline_address,
+ void *frame_pointer);
+
+static nokprobe_inline
+unsigned long kretprobe_trampoline_handler(struct pt_regs *regs,
+ void *trampoline_address,
+ void *frame_pointer)
+{
+ unsigned long ret;
+ /*
+ * Set a dummy kprobe for avoiding kretprobe recursion.
+ * Since kretprobe never runs in kprobe handler, no kprobe must
+ * be running at this point.
+ */
+ kprobe_busy_begin();
+ ret = __kretprobe_trampoline_handler(regs, trampoline_address, frame_pointer);
+ kprobe_busy_end();
+
+ return ret;
+}
+
#else /* CONFIG_KRETPROBES */
static inline void arch_prepare_kretprobe(struct kretprobe *rp,
struct pt_regs *regs)
@@ -354,10 +382,6 @@ static inline struct kprobe_ctlblk *get_kprobe_ctlblk(void)
return this_cpu_ptr(&kprobe_ctlblk);
}

-extern struct kprobe kprobe_busy;
-void kprobe_busy_begin(void);
-void kprobe_busy_end(void);
-
kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
int register_kprobe(struct kprobe *p);
void unregister_kprobe(struct kprobe *p);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 287b263..a0afaa7 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1927,6 +1927,104 @@ unsigned long __weak arch_deref_entry_point(void *entry)
}

#ifdef CONFIG_KRETPROBES
+
+unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
+ void *trampoline_address,
+ void *frame_pointer)
+{
+ struct kretprobe_instance *ri = NULL, *last = NULL;
+ struct hlist_head *head, empty_rp;
+ struct hlist_node *tmp;
+ unsigned long flags;
+ kprobe_opcode_t *correct_ret_addr = NULL;
+ bool skipped = false;
+
+ INIT_HLIST_HEAD(&empty_rp);
+ kretprobe_hash_lock(current, &head, &flags);
+
+ /*
+ * It is possible to have multiple instances associated with a given
+ * task either because multiple functions in the call path have
+ * return probes installed on them, and/or more than one
+ * return probe was registered for a target function.
+ *
+ * We can handle this because:
+ * - instances are always pushed into the head of the list
+ * - when multiple return probes are registered for the same
+ * function, the (chronologically) first instance's ret_addr
+ * will be the real return address, and all the rest will
+ * point to kretprobe_trampoline.
+ */
+ hlist_for_each_entry(ri, head, hlist) {
+ if (ri->task != current)
+ /* another task is sharing our hash bucket */
+ continue;
+ /*
+ * Return probes must be pushed on this hash list in the correct
+ * order (same as the return order) so that they can be popped
+ * correctly. However, if we find an entry pushed in the wrong
+ * order, it means we found a function which should not be
+ * probed, because the out-of-order entry was pushed while
+ * another kretprobe was itself being processed.
+ */
+ if (ri->fp != frame_pointer) {
+ if (!skipped)
+ pr_warn("kretprobe is stacked incorrectly. Trying to fixup.\n");
+ skipped = true;
+ continue;
+ }
+
+ correct_ret_addr = ri->ret_addr;
+ if (skipped)
+ pr_warn("%ps must be blacklisted because of incorrect kretprobe order\n",
+ ri->rp->kp.addr);
+
+ if (correct_ret_addr != trampoline_address)
+ /*
+ * This is the real return address. Any other
+ * instances associated with this task are for
+ * other calls deeper on the call stack
+ */
+ break;
+ }
+
+ kretprobe_assert(ri, (unsigned long)correct_ret_addr,
+ (unsigned long)trampoline_address);
+ last = ri;
+
+ hlist_for_each_entry_safe(ri, tmp, head, hlist) {
+ if (ri->task != current)
+ /* another task is sharing our hash bucket */
+ continue;
+ if (ri->fp != frame_pointer)
+ continue;
+
+ if (ri->rp && ri->rp->handler) {
+ struct kprobe *prev = kprobe_running();
+
+ __this_cpu_write(current_kprobe, &ri->rp->kp);
+ ri->ret_addr = correct_ret_addr;
+ ri->rp->handler(ri, regs);
+ __this_cpu_write(current_kprobe, prev);
+ }
+
+ recycle_rp_inst(ri, &empty_rp);
+
+ if (ri == last)
+ break;
+ }
+
+ kretprobe_hash_unlock(current, &flags);
+
+ hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
+ hlist_del(&ri->hlist);
+ kfree(ri);
+ }
+
+ return (unsigned long)correct_ret_addr;
+}
+NOKPROBE_SYMBOL(__kretprobe_trampoline_handler)
+
/*
* This kprobe pre_handler is registered with every kretprobe. When probe
* hits it will set up the return probe.

Subject: [tip: perf/kprobes] arm64: kprobes: Use generic kretprobe trampoline handler

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: 95a4b7a24f55efd4fbc77c020fe0aab3e1f37f8a
Gitweb: https://git.kernel.org/tip/95a4b7a24f55efd4fbc77c020fe0aab3e1f37f8a
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:00:35 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 08 Sep 2020 11:52:32 +02:00

arm64: kprobes: Use generic kretprobe trampoline handler

Use the generic kretprobe trampoline handler, and use
kernel_stack_pointer(regs) for framepointer verification.

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/159870603544.1229682.10309733593594205725.stgit@devnote2
---
arch/arm64/kernel/probes/kprobes.c | 78 +----------------------------
1 file changed, 3 insertions(+), 75 deletions(-)

diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index 5290f17..deba738 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -464,87 +464,15 @@ int __init arch_populate_kprobe_blacklist(void)

void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address =
- (unsigned long)&kretprobe_trampoline;
- kprobe_opcode_t *correct_ret_addr = NULL;
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because multiple functions in the call path have
- * return probes installed on them, and/or more than one
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always pushed into the head of the list
- * - when multiple return probes are registered for the same
- * function, the (chronologically) first instance's ret_addr
- * will be the real return address, and all the rest will
- * point to kretprobe_trampoline.
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
- correct_ret_addr = ri->ret_addr;
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- if (ri->rp && ri->rp->handler) {
- __this_cpu_write(current_kprobe, &ri->rp->kp);
- get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
- ri->ret_addr = correct_ret_addr;
- ri->rp->handler(ri, regs);
- __this_cpu_write(current_kprobe, NULL);
- }
-
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
- return (void *)orig_ret_address;
+ return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline,
+ (void *)kernel_stack_pointer(regs));
}

void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *)regs->regs[30];
+ ri->fp = (void *)kernel_stack_pointer(regs);

/* replace return addr (x30) with trampoline */
regs->regs[30] = (long)&kretprobe_trampoline;

Subject: [tip: perf/kprobes] s390: kprobes: Use generic kretprobe trampoline handler

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: 26a24a6b43d51c6fc70bef9b9016d81e4cf75e6e
Gitweb: https://git.kernel.org/tip/26a24a6b43d51c6fc70bef9b9016d81e4cf75e6e
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:02:04 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 08 Sep 2020 11:52:34 +02:00

s390: kprobes: Use generic kretprobe trampoline handler

Use the generic kretprobe trampoline handler. Don't use
framepointer verification.

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/159870612453.1229682.15950927742606892302.stgit@devnote2
---
arch/s390/kernel/kprobes.c | 79 +-------------------------------------
1 file changed, 2 insertions(+), 77 deletions(-)

diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c
index d2a71d8..fc30e79 100644
--- a/arch/s390/kernel/kprobes.c
+++ b/arch/s390/kernel/kprobes.c
@@ -228,6 +228,7 @@ NOKPROBE_SYMBOL(pop_kprobe);
void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *) regs->gprs[14];
+ ri->fp = NULL;

/* Replace the return addr with trampoline addr */
regs->gprs[14] = (unsigned long) &kretprobe_trampoline;
@@ -331,83 +332,7 @@ static void __used kretprobe_trampoline_holder(void)
*/
static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
{
- struct kretprobe_instance *ri;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address;
- unsigned long trampoline_address;
- kprobe_opcode_t *correct_ret_addr;
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because an multiple functions in the call path
- * have a return probe installed on them, and/or more than one return
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- ri = NULL;
- orig_ret_address = 0;
- correct_ret_addr = NULL;
- trampoline_address = (unsigned long) &kretprobe_trampoline;
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long) ri->ret_addr;
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
- correct_ret_addr = ri->ret_addr;
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long) ri->ret_addr;
-
- if (ri->rp && ri->rp->handler) {
- ri->ret_addr = correct_ret_addr;
- ri->rp->handler(ri, regs);
- }
-
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- regs->psw.addr = orig_ret_address;
-
- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
+ regs->psw.addr = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
/*
* By returning a non-zero value, we are telling
* kprobe_handler() that we don't want the post_handler

Subject: [tip: perf/kprobes] x86/kprobes: Use generic kretprobe trampoline handler

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: d7641289dad95df3531f573112778c548331ab83
Gitweb: https://git.kernel.org/tip/d7641289dad95df3531f573112778c548331ab83
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:00:12 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 08 Sep 2020 11:52:32 +02:00

x86/kprobes: Use generic kretprobe trampoline handler

Use the generic kretprobe trampoline handler. Use regs->sp
for framepointer verification.

[ mingo: Minor edits. ]

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/159870601250.1229682.14598707734683575237.stgit@devnote2
---
arch/x86/kernel/kprobes/core.c | 108 +--------------------------------
1 file changed, 3 insertions(+), 105 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index fdadc37..882b953 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -767,124 +767,22 @@ asm(
NOKPROBE_SYMBOL(kretprobe_trampoline);
STACK_FRAME_NON_STANDARD(kretprobe_trampoline);

+
/*
* Called from kretprobe_trampoline
*/
__used __visible void *trampoline_handler(struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address = (unsigned long)&kretprobe_trampoline;
- kprobe_opcode_t *correct_ret_addr = NULL;
- void *frame_pointer;
- bool skipped = false;
-
- /*
- * Set a dummy kprobe for avoiding kretprobe recursion.
- * Since kretprobe never run in kprobe handler, kprobe must not
- * be running at this point.
- */
- kprobe_busy_begin();
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
/* fixup registers */
regs->cs = __KERNEL_CS;
#ifdef CONFIG_X86_32
regs->cs |= get_kernel_rpl();
regs->gs = 0;
#endif
- /* We use pt_regs->sp for return address holder. */
- frame_pointer = &regs->sp;
- regs->ip = trampoline_address;
+ regs->ip = (unsigned long)&kretprobe_trampoline;
regs->orig_ax = ~0UL;

- /*
- * It is possible to have multiple instances associated with a given
- * task either because multiple functions in the call path have
- * return probes installed on them, and/or more than one
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always pushed into the head of the list
- * - when multiple return probes are registered for the same
- * function, the (chronologically) first instance's ret_addr
- * will be the real return address, and all the rest will
- * point to kretprobe_trampoline.
- */
- hlist_for_each_entry(ri, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
- /*
- * Return probes must be pushed on this hash list correct
- * order (same as return order) so that it can be popped
- * correctly. However, if we find it is pushed it incorrect
- * order, this means we find a function which should not be
- * probed, because the wrong order entry is pushed on the
- * path of processing other kretprobe itself.
- */
- if (ri->fp != frame_pointer) {
- if (!skipped)
- pr_warn("kretprobe is stacked incorrectly. Trying to fixup.\n");
- skipped = true;
- continue;
- }
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- if (skipped)
- pr_warn("%ps must be blacklisted because of incorrect kretprobe order\n",
- ri->rp->kp.addr);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
- correct_ret_addr = ri->ret_addr;
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
- if (ri->fp != frame_pointer)
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- if (ri->rp && ri->rp->handler) {
- __this_cpu_write(current_kprobe, &ri->rp->kp);
- ri->ret_addr = correct_ret_addr;
- ri->rp->handler(ri, regs);
- __this_cpu_write(current_kprobe, &kprobe_busy);
- }
-
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_hash_unlock(current, &flags);
-
- kprobe_busy_end();
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
- return (void *)orig_ret_address;
+ return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, &regs->sp);
}
NOKPROBE_SYMBOL(trampoline_handler);

Subject: [tip: perf/kprobes] mips: kprobes: Use generic kretprobe trampoline handler

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: 2ef12450856121f092621d4305885ec20283260e
Gitweb: https://git.kernel.org/tip/2ef12450856121f092621d4305885ec20283260e
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:01:19 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 08 Sep 2020 11:52:33 +02:00

mips: kprobes: Use generic kretprobe trampoline handler

Use the generic kretprobe trampoline handler. Don't use
framepointer verification.

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/159870607968.1229682.12100697467108845587.stgit@devnote2
---
arch/mips/kernel/kprobes.c | 54 ++-----------------------------------
1 file changed, 3 insertions(+), 51 deletions(-)

diff --git a/arch/mips/kernel/kprobes.c b/arch/mips/kernel/kprobes.c
index d043c2f..54dfba8 100644
--- a/arch/mips/kernel/kprobes.c
+++ b/arch/mips/kernel/kprobes.c
@@ -477,6 +477,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *) regs->regs[31];
+ ri->fp = NULL;

/* Replace the return addr with trampoline addr */
regs->regs[31] = (unsigned long)kretprobe_trampoline;
@@ -488,57 +489,8 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
static int __kprobes trampoline_probe_handler(struct kprobe *p,
struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address = (unsigned long)kretprobe_trampoline;
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because an multiple functions in the call path
- * have a return probe installed on them, and/or more than one return
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- if (ri->rp && ri->rp->handler)
- ri->rp->handler(ri, regs);
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
- instruction_pointer(regs) = orig_ret_address;
-
- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
+ instruction_pointer(regs) = __kretprobe_trampoline_handler(regs,
+ kretprobe_trampoline, NULL);
/*
* By returning a non-zero value, we are telling
* kprobe_handler() that we don't want the post_handler

Subject: [tip: perf/kprobes] arc: kprobes: Use generic kretprobe trampoline handler

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: f75dd136b65cccfaf3869be5380f401538fa9f72
Gitweb: https://git.kernel.org/tip/f75dd136b65cccfaf3869be5380f401538fa9f72
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:00:46 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 08 Sep 2020 11:52:32 +02:00

arc: kprobes: Use generic kretprobe trampoline handler

Use the generic kretprobe trampoline handler. Don't use
framepointer verification.

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/159870604671.1229682.13639386820521358456.stgit@devnote2
---
arch/arc/kernel/kprobes.c | 54 +-------------------------------------
1 file changed, 2 insertions(+), 52 deletions(-)

diff --git a/arch/arc/kernel/kprobes.c b/arch/arc/kernel/kprobes.c
index 7d3efe8..cabef45 100644
--- a/arch/arc/kernel/kprobes.c
+++ b/arch/arc/kernel/kprobes.c
@@ -388,6 +388,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
{

ri->ret_addr = (kprobe_opcode_t *) regs->blink;
+ ri->fp = NULL;

/* Replace the return addr with trampoline addr */
regs->blink = (unsigned long)&kretprobe_trampoline;
@@ -396,58 +397,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
static int __kprobes trampoline_probe_handler(struct kprobe *p,
struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address = (unsigned long)&kretprobe_trampoline;
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because an multiple functions in the call path
- * have a return probe installed on them, and/or more than one return
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- if (ri->rp && ri->rp->handler)
- ri->rp->handler(ri, regs);
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address) {
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
- regs->ret = orig_ret_address;
-
- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
+ regs->ret = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);

/* By returning a non zero value, we are telling the kprobe handler
* that we don't want the post_handler to run

Subject: [tip: perf/kprobes] parisc: kprobes: Use generic kretprobe trampoline handler

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: 16ff6f7ac92e506a05d3720e4d9fa0647bb0b5c4
Gitweb: https://git.kernel.org/tip/16ff6f7ac92e506a05d3720e4d9fa0647bb0b5c4
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:01:37 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 08 Sep 2020 11:52:33 +02:00

parisc: kprobes: Use generic kretprobe trampoline handler

Use the generic kretprobe trampoline handler. Don't use
framepointer verification.

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/159870609708.1229682.1861714117180719169.stgit@devnote2
---
arch/parisc/kernel/kprobes.c | 76 +----------------------------------
1 file changed, 4 insertions(+), 72 deletions(-)

diff --git a/arch/parisc/kernel/kprobes.c b/arch/parisc/kernel/kprobes.c
index 77ec518..6d21a51 100644
--- a/arch/parisc/kernel/kprobes.c
+++ b/arch/parisc/kernel/kprobes.c
@@ -191,80 +191,11 @@ static struct kprobe trampoline_p = {
static int __kprobes trampoline_probe_handler(struct kprobe *p,
struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address = (unsigned long)trampoline_p.addr;
- kprobe_opcode_t *correct_ret_addr = NULL;
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because multiple functions in the call path have
- * a return probe installed on them, and/or more than one return
- * probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always inserted at the head of the list
- * - when multiple return probes are registered for the same
- * function, the first instance's ret_addr will point to the
- * real return address, and all the rest will point to
- * kretprobe_trampoline
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
- correct_ret_addr = ri->ret_addr;
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- if (ri->rp && ri->rp->handler) {
- __this_cpu_write(current_kprobe, &ri->rp->kp);
- get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
- ri->ret_addr = correct_ret_addr;
- ri->rp->handler(ri, regs);
- __this_cpu_write(current_kprobe, NULL);
- }
-
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
+ unsigned long orig_ret_address;

- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
+ orig_ret_address = __kretprobe_trampoline_handler(regs, trampoline_p.addr, NULL);
instruction_pointer_set(regs, orig_ret_address);
+
return 1;
}

@@ -272,6 +203,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *)regs->gr[2];
+ ri->fp = NULL;

/* Replace the return addr with trampoline addr. */
regs->gr[2] = (unsigned long)trampoline_p.addr;

Subject: [tip: perf/kprobes] csky: kprobes: Use generic kretprobe trampoline handler

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: 03c8a4a447008c1a755cd32bc89af24119bd08f1
Gitweb: https://git.kernel.org/tip/03c8a4a447008c1a755cd32bc89af24119bd08f1
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:00:58 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 08 Sep 2020 11:52:33 +02:00

csky: kprobes: Use generic kretprobe trampoline handler

Use the generic kretprobe trampoline handler. Don't use
framepointer verification.

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Guo Ren <[email protected]>
Link: https://lore.kernel.org/r/159870605775.1229682.2627276871589951304.stgit@devnote2
---
arch/csky/kernel/probes/kprobes.c | 77 +------------------------------
1 file changed, 2 insertions(+), 75 deletions(-)

diff --git a/arch/csky/kernel/probes/kprobes.c b/arch/csky/kernel/probes/kprobes.c
index f0f733b..589f090 100644
--- a/arch/csky/kernel/probes/kprobes.c
+++ b/arch/csky/kernel/probes/kprobes.c
@@ -404,87 +404,14 @@ int __init arch_populate_kprobe_blacklist(void)

void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head, empty_rp;
- struct hlist_node *tmp;
- unsigned long flags, orig_ret_address = 0;
- unsigned long trampoline_address =
- (unsigned long)&kretprobe_trampoline;
- kprobe_opcode_t *correct_ret_addr = NULL;
-
- INIT_HLIST_HEAD(&empty_rp);
- kretprobe_hash_lock(current, &head, &flags);
-
- /*
- * It is possible to have multiple instances associated with a given
- * task either because multiple functions in the call path have
- * return probes installed on them, and/or more than one
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always pushed into the head of the list
- * - when multiple return probes are registered for the same
- * function, the (chronologically) first instance's ret_addr
- * will be the real return address, and all the rest will
- * point to kretprobe_trampoline.
- */
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_assert(ri, orig_ret_address, trampoline_address);
-
- correct_ret_addr = ri->ret_addr;
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
-
- orig_ret_address = (unsigned long)ri->ret_addr;
- if (ri->rp && ri->rp->handler) {
- __this_cpu_write(current_kprobe, &ri->rp->kp);
- get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
- ri->ret_addr = correct_ret_addr;
- ri->rp->handler(ri, regs);
- __this_cpu_write(current_kprobe, NULL);
- }
-
- recycle_rp_inst(ri, &empty_rp);
-
- if (orig_ret_address != trampoline_address)
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- break;
- }
-
- kretprobe_hash_unlock(current, &flags);
-
- hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
- hlist_del(&ri->hlist);
- kfree(ri);
- }
- return (void *)orig_ret_address;
+ return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
}

void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs)
{
ri->ret_addr = (kprobe_opcode_t *)regs->lr;
+ ri->fp = NULL;
regs->lr = (unsigned long) &kretprobe_trampoline;
}

Subject: [tip: perf/kprobes] kprobes: Remove NMI context check

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: e03b4a084ea6b0a18b0e874baec439e69090c168
Gitweb: https://git.kernel.org/tip/e03b4a084ea6b0a18b0e874baec439e69090c168
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:02:36 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 08 Sep 2020 11:52:35 +02:00

kprobes: Remove NMI context check

The in_nmi() check in pre_handler_kretprobe() is meant to avoid
recursion, and blindly assumes that anything NMI is recursive.

However, since commit:

9b38cc704e84 ("kretprobe: Prevent triggering kretprobe from within kprobe_flush_task")

there is a better way to detect and avoid actual recursion.

By setting a dummy kprobe, any actual exceptions will terminate early
(by trying to handle the dummy kprobe), and recursion will not happen.

Employ this to avoid the kretprobe_table_lock() recursion, replacing
the over-eager in_nmi() check.
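
For reference, the dummy-kprobe trick boils down to roughly this
(a simplified sketch of kprobe_busy_begin()/kprobe_busy_end(), not the
verbatim kernel code):

void kprobe_busy_begin(void)
{
        struct kprobe_ctlblk *kcb;

        preempt_disable();
        /*
         * Pretend a kprobe is already being handled on this CPU, so any
         * nested kprobe/kretprobe hit takes the reentrancy path and
         * returns early instead of recursing into the handlers.
         */
        __this_cpu_write(current_kprobe, &kprobe_busy);
        kcb = get_kprobe_ctlblk();
        kcb->kprobe_status = KPROBE_HIT_ACTIVE;
}

void kprobe_busy_end(void)
{
        __this_cpu_write(current_kprobe, NULL);
        preempt_enable();
}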

Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lkml.kernel.org/r/159870615628.1229682.6087311596892125907.stgit@devnote2
---
kernel/kprobes.c | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index a0afaa7..2111382 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1359,7 +1359,8 @@ static void cleanup_rp_inst(struct kretprobe *rp)
struct hlist_node *next;
struct hlist_head *head;

- /* No race here */
+ /* To avoid recursive kretprobe by NMI, set kprobe busy here */
+ kprobe_busy_begin();
for (hash = 0; hash < KPROBE_TABLE_SIZE; hash++) {
kretprobe_table_lock(hash, &flags);
head = &kretprobe_inst_table[hash];
@@ -1369,6 +1370,8 @@ static void cleanup_rp_inst(struct kretprobe *rp)
}
kretprobe_table_unlock(hash, &flags);
}
+ kprobe_busy_end();
+
free_rp_inst(rp);
}
NOKPROBE_SYMBOL(cleanup_rp_inst);
@@ -2035,17 +2038,6 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
unsigned long hash, flags = 0;
struct kretprobe_instance *ri;

- /*
- * To avoid deadlocks, prohibit return probing in NMI contexts,
- * just skip the probe and increase the (inexact) 'nmissed'
- * statistical counter, so that the user is informed that
- * something happened:
- */
- if (unlikely(in_nmi())) {
- rp->nmissed++;
- return 0;
- }
-
/* TODO: consider to only swap the RA after the last pre_handler fired */
hash = hash_ptr(current, KPROBE_HASH_BITS);
raw_spin_lock_irqsave(&rp->lock, flags);

2020-10-12 16:26:46

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v5 17/21] llist: Add nonatomic __llist_add() and __llist_del_all()


* Masami Hiramatsu <[email protected]> wrote:

> From: Peter Zijlstra <[email protected]>
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>

Because you are forwarding this patch here, I've added your SOB:

Signed-off-by: Masami Hiramatsu <[email protected]>

(Let me know if that's not OK.)

Thanks,

Ingo

2020-10-12 16:27:44

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v5 19/21] asm-generic/atomic: Add try_cmpxchg() fallbacks


* Masami Hiramatsu <[email protected]> wrote:

> From: Peter Zijlstra <[email protected]>
>
> Only x86 provides try_cmpxchg() outside of the atomic_t interfaces,
> provide generic fallbacks to create this interface from the widely
> available cmpxchg() function.
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Acked-by: Will Deacon <[email protected]>

Your SOB was missing here too.

Thanks,

Ingo

Subject: [tip: perf/kprobes] kprobes: Replace rp->free_instance with freelist

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: 6e426e0fcd20ce144bb93e00b70df51e9f2e08c3
Gitweb: https://git.kernel.org/tip/6e426e0fcd20ce144bb93e00b70df51e9f2e08c3
Author: Peter Zijlstra <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:03:56 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Mon, 12 Oct 2020 18:27:28 +02:00

kprobes: Replace rp->free_instance with freelist

Gets rid of rp->lock, and as a result kretprobes are now fully
lockless.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/159870623583.1229682.17472357584134058687.stgit@devnote2
---
include/linux/kprobes.h | 8 +++---
kernel/kprobes.c | 56 +++++++++++++++++-----------------------
2 files changed, 28 insertions(+), 36 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 00cf442..b7824e3 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -28,6 +28,7 @@
#include <linux/mutex.h>
#include <linux/ftrace.h>
#include <linux/refcount.h>
+#include <linux/freelist.h>
#include <asm/kprobes.h>

#ifdef CONFIG_KPROBES
@@ -157,17 +158,16 @@ struct kretprobe {
int maxactive;
int nmissed;
size_t data_size;
- struct hlist_head free_instances;
+ struct freelist_head freelist;
struct kretprobe_holder *rph;
- raw_spinlock_t lock;
};

struct kretprobe_instance {
union {
- struct llist_node llist;
- struct hlist_node hlist;
+ struct freelist_node freelist;
struct rcu_head rcu;
};
+ struct llist_node llist;
struct kretprobe_holder *rph;
kprobe_opcode_t *ret_addr;
void *fp;
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 850ee36..30b8fe7 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1228,11 +1228,8 @@ static void recycle_rp_inst(struct kretprobe_instance *ri)
{
struct kretprobe *rp = get_kretprobe(ri);

- INIT_HLIST_NODE(&ri->hlist);
if (likely(rp)) {
- raw_spin_lock(&rp->lock);
- hlist_add_head(&ri->hlist, &rp->free_instances);
- raw_spin_unlock(&rp->lock);
+ freelist_add(&ri->freelist, &rp->freelist);
} else
call_rcu(&ri->rcu, free_rp_inst_rcu);
}
@@ -1290,11 +1287,14 @@ NOKPROBE_SYMBOL(kprobe_flush_task);
static inline void free_rp_inst(struct kretprobe *rp)
{
struct kretprobe_instance *ri;
- struct hlist_node *next;
+ struct freelist_node *node;
int count = 0;

- hlist_for_each_entry_safe(ri, next, &rp->free_instances, hlist) {
- hlist_del(&ri->hlist);
+ node = rp->freelist.head;
+ while (node) {
+ ri = container_of(node, struct kretprobe_instance, freelist);
+ node = node->next;
+
kfree(ri);
count++;
}
@@ -1925,32 +1925,26 @@ NOKPROBE_SYMBOL(__kretprobe_trampoline_handler)
static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
{
struct kretprobe *rp = container_of(p, struct kretprobe, kp);
- unsigned long flags = 0;
struct kretprobe_instance *ri;
+ struct freelist_node *fn;

- /* TODO: consider to only swap the RA after the last pre_handler fired */
- raw_spin_lock_irqsave(&rp->lock, flags);
- if (!hlist_empty(&rp->free_instances)) {
- ri = hlist_entry(rp->free_instances.first,
- struct kretprobe_instance, hlist);
- hlist_del(&ri->hlist);
- raw_spin_unlock_irqrestore(&rp->lock, flags);
-
- if (rp->entry_handler && rp->entry_handler(ri, regs)) {
- raw_spin_lock_irqsave(&rp->lock, flags);
- hlist_add_head(&ri->hlist, &rp->free_instances);
- raw_spin_unlock_irqrestore(&rp->lock, flags);
- return 0;
- }
-
- arch_prepare_kretprobe(ri, regs);
+ fn = freelist_try_get(&rp->freelist);
+ if (!fn) {
+ rp->nmissed++;
+ return 0;
+ }

- __llist_add(&ri->llist, &current->kretprobe_instances);
+ ri = container_of(fn, struct kretprobe_instance, freelist);

- } else {
- rp->nmissed++;
- raw_spin_unlock_irqrestore(&rp->lock, flags);
+ if (rp->entry_handler && rp->entry_handler(ri, regs)) {
+ freelist_add(&ri->freelist, &rp->freelist);
+ return 0;
}
+
+ arch_prepare_kretprobe(ri, regs);
+
+ __llist_add(&ri->llist, &current->kretprobe_instances);
+
return 0;
}
NOKPROBE_SYMBOL(pre_handler_kretprobe);
@@ -2007,8 +2001,7 @@ int register_kretprobe(struct kretprobe *rp)
rp->maxactive = num_possible_cpus();
#endif
}
- raw_spin_lock_init(&rp->lock);
- INIT_HLIST_HEAD(&rp->free_instances);
+ rp->freelist.head = NULL;
rp->rph = kzalloc(sizeof(struct kretprobe_holder), GFP_KERNEL);
if (!rp->rph)
return -ENOMEM;
@@ -2023,8 +2016,7 @@ int register_kretprobe(struct kretprobe *rp)
return -ENOMEM;
}
inst->rph = rp->rph;
- INIT_HLIST_NODE(&inst->hlist);
- hlist_add_head(&inst->hlist, &rp->free_instances);
+ freelist_add(&inst->freelist, &rp->freelist);
}
refcount_set(&rp->rph->ref, i);

Subject: [tip: perf/kprobes] freelist: Implement lockless freelist

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: e563604a5f5a891283b6a8db4001cee833a7c6b8
Gitweb: https://git.kernel.org/tip/e563604a5f5a891283b6a8db4001cee833a7c6b8
Author: Peter Zijlstra <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:03:46 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Mon, 12 Oct 2020 18:27:28 +02:00

freelist: Implement lockless freelist

A simple CAS-based lock-free free list. Not the fastest thing in the world
under heavy contention, but simple and correct (assuming nodes are never
freed until after the free list is destroyed), and fairly speedy under low
contention.
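
A minimal usage sketch (the 'item' type and 'pool' below are
hypothetical, not part of this series; nodes are assumed to start
zero-initialized, e.g. from kzalloc(), and must outlive the list):

struct item {
        struct freelist_node fnode;
        int data;
};

static struct freelist_head pool;       /* .head starts out NULL */

static void pool_put(struct item *it)
{
        /* Lock-free push, safe against concurrent freelist_try_get() */
        freelist_add(&it->fnode, &pool);
}

static struct item *pool_get(void)
{
        struct freelist_node *n = freelist_try_get(&pool);

        return n ? container_of(n, struct item, fnode) : NULL;
}

The per-node refcount is what lets a concurrent try_get read head->next
safely, avoiding the classic ABA problem, given the stated assumption
that nodes are never freed while the list is live.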

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/159870622579.1229682.16729440870040944993.stgit@devnote2
---
include/linux/freelist.h | 129 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 129 insertions(+)
create mode 100644 include/linux/freelist.h

diff --git a/include/linux/freelist.h b/include/linux/freelist.h
new file mode 100644
index 0000000..fc1842b
--- /dev/null
+++ b/include/linux/freelist.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause */
+#ifndef FREELIST_H
+#define FREELIST_H
+
+#include <linux/atomic.h>
+
+/*
+ * Copyright: [email protected]
+ *
+ * A simple CAS-based lock-free free list. Not the fastest thing in the world
+ * under heavy contention, but simple and correct (assuming nodes are never
+ * freed until after the free list is destroyed), and fairly speedy under low
+ * contention.
+ *
+ * Adapted from: https://moodycamel.com/blog/2014/solving-the-aba-problem-for-lock-free-free-lists
+ */
+
+struct freelist_node {
+ atomic_t refs;
+ struct freelist_node *next;
+};
+
+struct freelist_head {
+ struct freelist_node *head;
+};
+
+#define REFS_ON_FREELIST 0x80000000
+#define REFS_MASK 0x7FFFFFFF
+
+static inline void __freelist_add(struct freelist_node *node, struct freelist_head *list)
+{
+ /*
+ * Since the refcount is zero, and nobody can increase it once it's
+ * zero (except us, and we run only one copy of this method per node at
+ * a time, i.e. the single thread case), then we know we can safely
+ * change the next pointer of the node; however, once the refcount is
+ * back above zero, then other threads could increase it (happens under
+ * heavy contention, when the refcount goes to zero in between a load
+ * and a refcount increment of a node in try_get, then back up to
+ * something non-zero, then the refcount increment is done by the other
+ * thread) -- so if the CAS to add the node to the actual list fails,
+ * decrease the refcount and leave the add operation to the next thread
+ * who puts the refcount back to zero (which could be us, hence the
+ * loop).
+ */
+ struct freelist_node *head = READ_ONCE(list->head);
+
+ for (;;) {
+ WRITE_ONCE(node->next, head);
+ atomic_set_release(&node->refs, 1);
+
+ if (!try_cmpxchg_release(&list->head, &head, node)) {
+ /*
+ * Hmm, the add failed, but we can only try again when
+ * the refcount goes back to zero.
+ */
+ if (atomic_fetch_add_release(REFS_ON_FREELIST - 1, &node->refs) == 1)
+ continue;
+ }
+ return;
+ }
+}
+
+static inline void freelist_add(struct freelist_node *node, struct freelist_head *list)
+{
+ /*
+ * We know that the should-be-on-freelist bit is 0 at this point, so
+ * it's safe to set it using a fetch_add.
+ */
+ if (!atomic_fetch_add_release(REFS_ON_FREELIST, &node->refs)) {
+ /*
+ * Oh look! We were the last ones referencing this node, and we
+ * know we want to add it to the free list, so let's do it!
+ */
+ __freelist_add(node, list);
+ }
+}
+
+static inline struct freelist_node *freelist_try_get(struct freelist_head *list)
+{
+ struct freelist_node *prev, *next, *head = smp_load_acquire(&list->head);
+ unsigned int refs;
+
+ while (head) {
+ prev = head;
+ refs = atomic_read(&head->refs);
+ if ((refs & REFS_MASK) == 0 ||
+ !atomic_try_cmpxchg_acquire(&head->refs, &refs, refs+1)) {
+ head = smp_load_acquire(&list->head);
+ continue;
+ }
+
+ /*
+ * Good, reference count has been incremented (it wasn't at
+ * zero), which means we can read the next and not worry about
+ * it changing between now and the time we do the CAS.
+ */
+ next = READ_ONCE(head->next);
+ if (try_cmpxchg_acquire(&list->head, &head, next)) {
+ /*
+ * Yay, got the node. This means it was on the list,
+ * which means should-be-on-freelist must be false no
+ * matter the refcount (because nobody else knows it's
+ * been taken off yet, it can't have been put back on).
+ */
+ WARN_ON_ONCE(atomic_read(&head->refs) & REFS_ON_FREELIST);
+
+ /*
+ * Decrease refcount twice, once for our ref, and once
+ * for the list's ref.
+ */
+ atomic_fetch_add(-2, &head->refs);
+
+ return head;
+ }
+
+ /*
+ * OK, the head must have changed on us, but we still need to decrement
+ * the refcount we increased.
+ */
+ refs = atomic_fetch_add(-1, &prev->refs);
+ if (refs == REFS_ON_FREELIST + 1)
+ __freelist_add(prev, list);
+ }
+
+ return NULL;
+}
+
+#endif /* FREELIST_H */

Subject: [tip: perf/kprobes] llist: Add nonatomic __llist_add() and __llist_del_all()

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: 476c5818c37a7828d558f34ae01f0c32f8bfadde
Gitweb: https://git.kernel.org/tip/476c5818c37a7828d558f34ae01f0c32f8bfadde
Author: Peter Zijlstra <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:03:13 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Mon, 12 Oct 2020 18:27:27 +02:00

llist: Add nonatomic __llist_add() and __llist_del_all()

We'll use these in the new, lockless kretprobes code.
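
The point (a sketch, not code from this patch) is that a list which is
only ever modified from its owning context can skip the cmpxchg/xchg
that the regular llist helpers pay for, while remote observers keep
using the atomic variants:

static LLIST_HEAD(pending);

static void owner_only_example(void)
{
        static struct llist_node a, b;
        struct llist_node *n, *next;

        /* Owner-only fast path: plain stores, no atomics */
        __llist_add(&a, &pending);
        __llist_add(&b, &pending);

        /* Owner-only teardown: detach the whole list in one go */
        n = __llist_del_all(&pending);
        while (n) {
                next = n->next;
                /* ... dispose of the entry ... */
                n = next;
        }
}

In this series, current->kretprobe_instances is such a list: only the
owning task ever pushes onto it, so __llist_add() is enough on the
kretprobe entry path.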

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/159870619318.1229682.13027387548510028721.stgit@devnote2
---
drivers/gpu/drm/i915/i915_request.c | 6 ------
include/linux/llist.h | 23 +++++++++++++++++++++++
2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 0b2fe55..0e851b9 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -357,12 +357,6 @@ void i915_request_retire_upto(struct i915_request *rq)
} while (i915_request_retire(tmp) && tmp != rq);
}

-static void __llist_add(struct llist_node *node, struct llist_head *head)
-{
- node->next = head->first;
- head->first = node;
-}
-
static struct i915_request * const *
__engine_active(struct intel_engine_cs *engine)
{
diff --git a/include/linux/llist.h b/include/linux/llist.h
index 2e9c721..24f207b 100644
--- a/include/linux/llist.h
+++ b/include/linux/llist.h
@@ -197,6 +197,16 @@ static inline struct llist_node *llist_next(struct llist_node *node)
extern bool llist_add_batch(struct llist_node *new_first,
struct llist_node *new_last,
struct llist_head *head);
+
+static inline bool __llist_add_batch(struct llist_node *new_first,
+ struct llist_node *new_last,
+ struct llist_head *head)
+{
+ new_last->next = head->first;
+ head->first = new_first;
+ return new_last->next == NULL;
+}
+
/**
* llist_add - add a new entry
* @new: new entry to be added
@@ -209,6 +219,11 @@ static inline bool llist_add(struct llist_node *new, struct llist_head *head)
return llist_add_batch(new, new, head);
}

+static inline bool __llist_add(struct llist_node *new, struct llist_head *head)
+{
+ return __llist_add_batch(new, new, head);
+}
+
/**
* llist_del_all - delete all entries from lock-less list
* @head: the head of lock-less list to delete all entries
@@ -222,6 +237,14 @@ static inline struct llist_node *llist_del_all(struct llist_head *head)
return xchg(&head->first, NULL);
}

+static inline struct llist_node *__llist_del_all(struct llist_head *head)
+{
+ struct llist_node *first = head->first;
+
+ head->first = NULL;
+ return first;
+}
+
extern struct llist_node *llist_del_first(struct llist_head *head);

struct llist_node *llist_reverse_order(struct llist_node *head);

Subject: [tip: perf/kprobes] asm-generic/atomic: Add try_cmpxchg() fallbacks

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: 29f006fdefe6f88abde973a0b0f20d2704e93fd4
Gitweb: https://git.kernel.org/tip/29f006fdefe6f88abde973a0b0f20d2704e93fd4
Author: Peter Zijlstra <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:03:35 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Mon, 12 Oct 2020 18:27:27 +02:00

asm-generic/atomic: Add try_cmpxchg() fallbacks

Only x86 provides try_cmpxchg() outside of the atomic_t interfaces,
provide generic fallbacks to create this interface from the widely
available cmpxchg() function.
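
try_cmpxchg(ptr, &old, new) returns a bool and, on failure, writes the
value it actually observed back into *old, which turns the usual
cmpxchg() retry loop into a form with a single initial load. The
generated fallback is roughly this (a simplified sketch, not the exact
script output):

#ifndef arch_try_cmpxchg
#define arch_try_cmpxchg(_ptr, _oldp, _new)                             \
({                                                                      \
        typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r;          \
        ___r = arch_cmpxchg((_ptr), ___o, (_new));                      \
        if (unlikely(___r != ___o))                                     \
                *___op = ___r;                                          \
        likely(___r == ___o);                                           \
})
#endif

so callers (such as the new freelist code) can be written as:

        unsigned long old = READ_ONCE(*ptr), new;

        do {
                new = old | SOME_FLAG;          /* SOME_FLAG: illustrative */
        } while (!try_cmpxchg(ptr, &old, new));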

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/159870621515.1229682.15506193091065001742.stgit@devnote2
---
arch/x86/include/asm/atomic.h | 2 +-
arch/x86/include/asm/atomic64_64.h | 2 +-
arch/x86/include/asm/cmpxchg.h | 2 +-
include/asm-generic/atomic-instrumented.h | 216 ++++++++++++---------
include/linux/atomic-arch-fallback.h | 90 ++++++++-
include/linux/atomic-fallback.h | 90 ++++++++-
scripts/atomic/gen-atomic-fallback.sh | 63 +++++-
scripts/atomic/gen-atomic-instrumented.sh | 29 ++-
8 files changed, 372 insertions(+), 122 deletions(-)

diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index b6cac6e..f732741 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -199,7 +199,7 @@ static __always_inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new)

static __always_inline bool arch_atomic_try_cmpxchg(atomic_t *v, int *old, int new)
{
- return try_cmpxchg(&v->counter, old, new);
+ return arch_try_cmpxchg(&v->counter, old, new);
}
#define arch_atomic_try_cmpxchg arch_atomic_try_cmpxchg

diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index 809bd01..7886d05 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -187,7 +187,7 @@ static inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)

static __always_inline bool arch_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
{
- return try_cmpxchg(&v->counter, old, new);
+ return arch_try_cmpxchg(&v->counter, old, new);
}
#define arch_atomic64_try_cmpxchg arch_atomic64_try_cmpxchg

diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index a8bfac1..4d4ec5c 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -221,7 +221,7 @@ extern void __add_wrong_size(void)
#define __try_cmpxchg(ptr, pold, new, size) \
__raw_try_cmpxchg((ptr), (pold), (new), (size), LOCK_PREFIX)

-#define try_cmpxchg(ptr, pold, new) \
+#define arch_try_cmpxchg(ptr, pold, new) \
__try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)))

/*
diff --git a/include/asm-generic/atomic-instrumented.h b/include/asm-generic/atomic-instrumented.h
index 379986e..2cab3fd 100644
--- a/include/asm-generic/atomic-instrumented.h
+++ b/include/asm-generic/atomic-instrumented.h
@@ -1642,148 +1642,192 @@ atomic64_dec_if_positive(atomic64_t *v)
#endif

#if !defined(arch_xchg_relaxed) || defined(arch_xchg)
-#define xchg(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_xchg(__ai_ptr, __VA_ARGS__); \
+#define xchg(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_xchg(__ai_ptr, __VA_ARGS__); \
})
#endif

#if defined(arch_xchg_acquire)
-#define xchg_acquire(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_xchg_acquire(__ai_ptr, __VA_ARGS__); \
+#define xchg_acquire(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_xchg_acquire(__ai_ptr, __VA_ARGS__); \
})
#endif

#if defined(arch_xchg_release)
-#define xchg_release(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_xchg_release(__ai_ptr, __VA_ARGS__); \
+#define xchg_release(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_xchg_release(__ai_ptr, __VA_ARGS__); \
})
#endif

#if defined(arch_xchg_relaxed)
-#define xchg_relaxed(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_xchg_relaxed(__ai_ptr, __VA_ARGS__); \
+#define xchg_relaxed(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_xchg_relaxed(__ai_ptr, __VA_ARGS__); \
})
#endif

#if !defined(arch_cmpxchg_relaxed) || defined(arch_cmpxchg)
-#define cmpxchg(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg(__ai_ptr, __VA_ARGS__); \
})
#endif

#if defined(arch_cmpxchg_acquire)
-#define cmpxchg_acquire(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg_acquire(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg_acquire(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg_acquire(__ai_ptr, __VA_ARGS__); \
})
#endif

#if defined(arch_cmpxchg_release)
-#define cmpxchg_release(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg_release(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg_release(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg_release(__ai_ptr, __VA_ARGS__); \
})
#endif

#if defined(arch_cmpxchg_relaxed)
-#define cmpxchg_relaxed(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg_relaxed(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg_relaxed(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg_relaxed(__ai_ptr, __VA_ARGS__); \
})
#endif

#if !defined(arch_cmpxchg64_relaxed) || defined(arch_cmpxchg64)
-#define cmpxchg64(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg64(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg64(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg64(__ai_ptr, __VA_ARGS__); \
})
#endif

#if defined(arch_cmpxchg64_acquire)
-#define cmpxchg64_acquire(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg64_acquire(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg64_acquire(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg64_acquire(__ai_ptr, __VA_ARGS__); \
})
#endif

#if defined(arch_cmpxchg64_release)
-#define cmpxchg64_release(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg64_release(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg64_release(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg64_release(__ai_ptr, __VA_ARGS__); \
})
#endif

#if defined(arch_cmpxchg64_relaxed)
-#define cmpxchg64_relaxed(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg64_relaxed(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg64_relaxed(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg64_relaxed(__ai_ptr, __VA_ARGS__); \
})
#endif

-#define cmpxchg_local(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg_local(__ai_ptr, __VA_ARGS__); \
+#if !defined(arch_try_cmpxchg_relaxed) || defined(arch_try_cmpxchg)
+#define try_cmpxchg(ptr, oldp, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ typeof(oldp) __ai_oldp = (oldp); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+ arch_try_cmpxchg(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+#endif
+
+#if defined(arch_try_cmpxchg_acquire)
+#define try_cmpxchg_acquire(ptr, oldp, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ typeof(oldp) __ai_oldp = (oldp); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+ arch_try_cmpxchg_acquire(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+#endif
+
+#if defined(arch_try_cmpxchg_release)
+#define try_cmpxchg_release(ptr, oldp, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ typeof(oldp) __ai_oldp = (oldp); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+ arch_try_cmpxchg_release(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+#endif
+
+#if defined(arch_try_cmpxchg_relaxed)
+#define try_cmpxchg_relaxed(ptr, oldp, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ typeof(oldp) __ai_oldp = (oldp); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \
+ arch_try_cmpxchg_relaxed(__ai_ptr, __ai_oldp, __VA_ARGS__); \
+})
+#endif
+
+#define cmpxchg_local(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg_local(__ai_ptr, __VA_ARGS__); \
})

-#define cmpxchg64_local(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_cmpxchg64_local(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg64_local(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_cmpxchg64_local(__ai_ptr, __VA_ARGS__); \
})

-#define sync_cmpxchg(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
- arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
+#define sync_cmpxchg(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \
+ arch_sync_cmpxchg(__ai_ptr, __VA_ARGS__); \
})

-#define cmpxchg_double(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, 2 * sizeof(*__ai_ptr)); \
- arch_cmpxchg_double(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg_double(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, 2 * sizeof(*__ai_ptr)); \
+ arch_cmpxchg_double(__ai_ptr, __VA_ARGS__); \
})


-#define cmpxchg_double_local(ptr, ...) \
-({ \
- typeof(ptr) __ai_ptr = (ptr); \
- instrument_atomic_write(__ai_ptr, 2 * sizeof(*__ai_ptr)); \
- arch_cmpxchg_double_local(__ai_ptr, __VA_ARGS__); \
+#define cmpxchg_double_local(ptr, ...) \
+({ \
+ typeof(ptr) __ai_ptr = (ptr); \
+ instrument_atomic_write(__ai_ptr, 2 * sizeof(*__ai_ptr)); \
+ arch_cmpxchg_double_local(__ai_ptr, __VA_ARGS__); \
})

#endif /* _ASM_GENERIC_ATOMIC_INSTRUMENTED_H */
-// 89bf97f3a7509b740845e51ddf31055b48a81f40
+// ff0fe7f81ee97f01f13bb78b0e3ce800bc56d9dd
diff --git a/include/linux/atomic-arch-fallback.h b/include/linux/atomic-arch-fallback.h
index bcb6aa2..a3dba31 100644
--- a/include/linux/atomic-arch-fallback.h
+++ b/include/linux/atomic-arch-fallback.h
@@ -9,9 +9,9 @@
#include <linux/compiler.h>

#ifndef arch_xchg_relaxed
-#define arch_xchg_relaxed arch_xchg
-#define arch_xchg_acquire arch_xchg
-#define arch_xchg_release arch_xchg
+#define arch_xchg_acquire arch_xchg
+#define arch_xchg_release arch_xchg
+#define arch_xchg_relaxed arch_xchg
#else /* arch_xchg_relaxed */

#ifndef arch_xchg_acquire
@@ -32,9 +32,9 @@
#endif /* arch_xchg_relaxed */

#ifndef arch_cmpxchg_relaxed
-#define arch_cmpxchg_relaxed arch_cmpxchg
-#define arch_cmpxchg_acquire arch_cmpxchg
-#define arch_cmpxchg_release arch_cmpxchg
+#define arch_cmpxchg_acquire arch_cmpxchg
+#define arch_cmpxchg_release arch_cmpxchg
+#define arch_cmpxchg_relaxed arch_cmpxchg
#else /* arch_cmpxchg_relaxed */

#ifndef arch_cmpxchg_acquire
@@ -55,9 +55,9 @@
#endif /* arch_cmpxchg_relaxed */

#ifndef arch_cmpxchg64_relaxed
-#define arch_cmpxchg64_relaxed arch_cmpxchg64
-#define arch_cmpxchg64_acquire arch_cmpxchg64
-#define arch_cmpxchg64_release arch_cmpxchg64
+#define arch_cmpxchg64_acquire arch_cmpxchg64
+#define arch_cmpxchg64_release arch_cmpxchg64
+#define arch_cmpxchg64_relaxed arch_cmpxchg64
#else /* arch_cmpxchg64_relaxed */

#ifndef arch_cmpxchg64_acquire
@@ -77,6 +77,76 @@

#endif /* arch_cmpxchg64_relaxed */

+#ifndef arch_try_cmpxchg_relaxed
+#ifdef arch_try_cmpxchg
+#define arch_try_cmpxchg_acquire arch_try_cmpxchg
+#define arch_try_cmpxchg_release arch_try_cmpxchg
+#define arch_try_cmpxchg_relaxed arch_try_cmpxchg
+#endif /* arch_try_cmpxchg */
+
+#ifndef arch_try_cmpxchg
+#define arch_try_cmpxchg(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = arch_cmpxchg((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg */
+
+#ifndef arch_try_cmpxchg_acquire
+#define arch_try_cmpxchg_acquire(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = arch_cmpxchg_acquire((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg_acquire */
+
+#ifndef arch_try_cmpxchg_release
+#define arch_try_cmpxchg_release(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = arch_cmpxchg_release((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg_release */
+
+#ifndef arch_try_cmpxchg_relaxed
+#define arch_try_cmpxchg_relaxed(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = arch_cmpxchg_relaxed((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* arch_try_cmpxchg_relaxed */
+
+#else /* arch_try_cmpxchg_relaxed */
+
+#ifndef arch_try_cmpxchg_acquire
+#define arch_try_cmpxchg_acquire(...) \
+ __atomic_op_acquire(arch_try_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef arch_try_cmpxchg_release
+#define arch_try_cmpxchg_release(...) \
+ __atomic_op_release(arch_try_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef arch_try_cmpxchg
+#define arch_try_cmpxchg(...) \
+ __atomic_op_fence(arch_try_cmpxchg, __VA_ARGS__)
+#endif
+
+#endif /* arch_try_cmpxchg_relaxed */
+
#ifndef arch_atomic_read_acquire
static __always_inline int
arch_atomic_read_acquire(const atomic_t *v)
@@ -2288,4 +2358,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
#endif

#endif /* _LINUX_ATOMIC_FALLBACK_H */
-// 90cd26cfd69d2250303d654955a0cc12620fb91b
+// cca554917d7ea73d5e3e7397dd70c484cad9b2c4
diff --git a/include/linux/atomic-fallback.h b/include/linux/atomic-fallback.h
index fd525c7..2a3f55d 100644
--- a/include/linux/atomic-fallback.h
+++ b/include/linux/atomic-fallback.h
@@ -9,9 +9,9 @@
#include <linux/compiler.h>

#ifndef xchg_relaxed
-#define xchg_relaxed xchg
-#define xchg_acquire xchg
-#define xchg_release xchg
+#define xchg_acquire xchg
+#define xchg_release xchg
+#define xchg_relaxed xchg
#else /* xchg_relaxed */

#ifndef xchg_acquire
@@ -32,9 +32,9 @@
#endif /* xchg_relaxed */

#ifndef cmpxchg_relaxed
-#define cmpxchg_relaxed cmpxchg
-#define cmpxchg_acquire cmpxchg
-#define cmpxchg_release cmpxchg
+#define cmpxchg_acquire cmpxchg
+#define cmpxchg_release cmpxchg
+#define cmpxchg_relaxed cmpxchg
#else /* cmpxchg_relaxed */

#ifndef cmpxchg_acquire
@@ -55,9 +55,9 @@
#endif /* cmpxchg_relaxed */

#ifndef cmpxchg64_relaxed
-#define cmpxchg64_relaxed cmpxchg64
-#define cmpxchg64_acquire cmpxchg64
-#define cmpxchg64_release cmpxchg64
+#define cmpxchg64_acquire cmpxchg64
+#define cmpxchg64_release cmpxchg64
+#define cmpxchg64_relaxed cmpxchg64
#else /* cmpxchg64_relaxed */

#ifndef cmpxchg64_acquire
@@ -77,6 +77,76 @@

#endif /* cmpxchg64_relaxed */

+#ifndef try_cmpxchg_relaxed
+#ifdef try_cmpxchg
+#define try_cmpxchg_acquire try_cmpxchg
+#define try_cmpxchg_release try_cmpxchg
+#define try_cmpxchg_relaxed try_cmpxchg
+#endif /* try_cmpxchg */
+
+#ifndef try_cmpxchg
+#define try_cmpxchg(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = cmpxchg((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* try_cmpxchg */
+
+#ifndef try_cmpxchg_acquire
+#define try_cmpxchg_acquire(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = cmpxchg_acquire((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* try_cmpxchg_acquire */
+
+#ifndef try_cmpxchg_release
+#define try_cmpxchg_release(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = cmpxchg_release((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* try_cmpxchg_release */
+
+#ifndef try_cmpxchg_relaxed
+#define try_cmpxchg_relaxed(_ptr, _oldp, _new) \
+({ \
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \
+ ___r = cmpxchg_relaxed((_ptr), ___o, (_new)); \
+ if (unlikely(___r != ___o)) \
+ *___op = ___r; \
+ likely(___r == ___o); \
+})
+#endif /* try_cmpxchg_relaxed */
+
+#else /* try_cmpxchg_relaxed */
+
+#ifndef try_cmpxchg_acquire
+#define try_cmpxchg_acquire(...) \
+ __atomic_op_acquire(try_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef try_cmpxchg_release
+#define try_cmpxchg_release(...) \
+ __atomic_op_release(try_cmpxchg, __VA_ARGS__)
+#endif
+
+#ifndef try_cmpxchg
+#define try_cmpxchg(...) \
+ __atomic_op_fence(try_cmpxchg, __VA_ARGS__)
+#endif
+
+#endif /* try_cmpxchg_relaxed */
+
#define arch_atomic_read atomic_read
#define arch_atomic_read_acquire atomic_read_acquire

@@ -2522,4 +2592,4 @@ atomic64_dec_if_positive(atomic64_t *v)
#endif

#endif /* _LINUX_ATOMIC_FALLBACK_H */
-// 9d95b56f98d82a2a26c7b79ccdd0c47572d50a6f
+// d78e6c293c661c15188f0ec05bce45188c8d5892
diff --git a/scripts/atomic/gen-atomic-fallback.sh b/scripts/atomic/gen-atomic-fallback.sh
index 693dfa1..317a6ce 100755
--- a/scripts/atomic/gen-atomic-fallback.sh
+++ b/scripts/atomic/gen-atomic-fallback.sh
@@ -144,15 +144,11 @@ gen_proto_order_variants()
printf "#endif /* ${basename}_relaxed */\n\n"
}

-gen_xchg_fallbacks()
+gen_order_fallbacks()
{
local xchg="$1"; shift
+
cat <<EOF
-#ifndef ${xchg}_relaxed
-#define ${xchg}_relaxed ${xchg}
-#define ${xchg}_acquire ${xchg}
-#define ${xchg}_release ${xchg}
-#else /* ${xchg}_relaxed */

#ifndef ${xchg}_acquire
#define ${xchg}_acquire(...) \\
@@ -169,11 +165,62 @@ cat <<EOF
__atomic_op_fence(${xchg}, __VA_ARGS__)
#endif

-#endif /* ${xchg}_relaxed */
+EOF
+}
+
+gen_xchg_fallbacks()
+{
+ local xchg="$1"; shift
+ printf "#ifndef ${xchg}_relaxed\n"
+
+ gen_basic_fallbacks ${xchg}
+
+ printf "#else /* ${xchg}_relaxed */\n"
+
+ gen_order_fallbacks ${xchg}
+
+ printf "#endif /* ${xchg}_relaxed */\n\n"
+}
+
+gen_try_cmpxchg_fallback()
+{
+ local order="$1"; shift;
+
+cat <<EOF
+#ifndef ${ARCH}try_cmpxchg${order}
+#define ${ARCH}try_cmpxchg${order}(_ptr, _oldp, _new) \\
+({ \\
+ typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \\
+ ___r = ${ARCH}cmpxchg${order}((_ptr), ___o, (_new)); \\
+ if (unlikely(___r != ___o)) \\
+ *___op = ___r; \\
+ likely(___r == ___o); \\
+})
+#endif /* ${ARCH}try_cmpxchg${order} */

EOF
}

+gen_try_cmpxchg_fallbacks()
+{
+ printf "#ifndef ${ARCH}try_cmpxchg_relaxed\n"
+ printf "#ifdef ${ARCH}try_cmpxchg\n"
+
+ gen_basic_fallbacks "${ARCH}try_cmpxchg"
+
+ printf "#endif /* ${ARCH}try_cmpxchg */\n\n"
+
+ for order in "" "_acquire" "_release" "_relaxed"; do
+ gen_try_cmpxchg_fallback "${order}"
+ done
+
+ printf "#else /* ${ARCH}try_cmpxchg_relaxed */\n"
+
+ gen_order_fallbacks "${ARCH}try_cmpxchg"
+
+ printf "#endif /* ${ARCH}try_cmpxchg_relaxed */\n\n"
+}
+
cat << EOF
// SPDX-License-Identifier: GPL-2.0

@@ -191,6 +238,8 @@ for xchg in "${ARCH}xchg" "${ARCH}cmpxchg" "${ARCH}cmpxchg64"; do
gen_xchg_fallbacks "${xchg}"
done

+gen_try_cmpxchg_fallbacks
+
grep '^[a-z]' "$1" | while read name meta args; do
gen_proto "${meta}" "${name}" "${ARCH}" "atomic" "int" ${args}
done
diff --git a/scripts/atomic/gen-atomic-instrumented.sh b/scripts/atomic/gen-atomic-instrumented.sh
index 6afadf7..85dc256 100755
--- a/scripts/atomic/gen-atomic-instrumented.sh
+++ b/scripts/atomic/gen-atomic-instrumented.sh
@@ -103,14 +103,31 @@ gen_xchg()
local xchg="$1"; shift
local mult="$1"; shift

+ if [ "${xchg%${xchg#try_cmpxchg}}" = "try_cmpxchg" ] ; then
+
+cat <<EOF
+#define ${xchg}(ptr, oldp, ...) \\
+({ \\
+ typeof(ptr) __ai_ptr = (ptr); \\
+ typeof(oldp) __ai_oldp = (oldp); \\
+ instrument_atomic_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\
+ instrument_atomic_write(__ai_oldp, ${mult}sizeof(*__ai_oldp)); \\
+ arch_${xchg}(__ai_ptr, __ai_oldp, __VA_ARGS__); \\
+})
+EOF
+
+ else
+
cat <<EOF
-#define ${xchg}(ptr, ...) \\
-({ \\
- typeof(ptr) __ai_ptr = (ptr); \\
- instrument_atomic_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\
- arch_${xchg}(__ai_ptr, __VA_ARGS__); \\
+#define ${xchg}(ptr, ...) \\
+({ \\
+ typeof(ptr) __ai_ptr = (ptr); \\
+ instrument_atomic_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\
+ arch_${xchg}(__ai_ptr, __VA_ARGS__); \\
})
EOF
+
+ fi
}

gen_optional_xchg()
@@ -160,7 +177,7 @@ grep '^[a-z]' "$1" | while read name meta args; do
gen_proto "${meta}" "${name}" "atomic64" "s64" ${args}
done

-for xchg in "xchg" "cmpxchg" "cmpxchg64"; do
+for xchg in "xchg" "cmpxchg" "cmpxchg64" "try_cmpxchg"; do
for order in "" "_acquire" "_release" "_relaxed"; do
gen_optional_xchg "${xchg}" "${order}"
done

Subject: [tip: perf/kprobes] kprobes: Remove kretprobe hash

The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: d741bf41d7c7db4898bacfcb020353cddc032fd8
Gitweb: https://git.kernel.org/tip/d741bf41d7c7db4898bacfcb020353cddc032fd8
Author: Peter Zijlstra <[email protected]>
AuthorDate: Sat, 29 Aug 2020 22:03:24 +09:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Mon, 12 Oct 2020 18:27:27 +02:00

kprobes: Remove kretprobe hash

The kretprobe hash is mostly superfluous; replace it with a per-task
variable.

This gets rid of the task hash and its related locking.

Note that this may change the kprobes module-exported API for kretprobe
handlers. If any out-of-tree kretprobe user uses ri->rp, use
get_kretprobe(ri) instead.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/159870620431.1229682.16325792502413731312.stgit@devnote2
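
As a sketch of the API change for module users (illustrative only; the
handler name below is made up), a return handler that used to read
ri->rp directly would now resolve the kretprobe from the instance:

static int my_ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
        struct kretprobe *rp = get_kretprobe(ri); /* may be NULL while unregistering */

        if (rp)
                pr_info("return from %ps\n", rp->kp.addr);
        return 0;
}

get_kretprobe() expects RCU protection; the handler runs with preemption
disabled, which should be sufficient here.
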
---
include/linux/kprobes.h | 19 ++-
include/linux/sched.h | 4 +-
kernel/fork.c | 4 +-
kernel/kprobes.c | 236 ++++++++++-------------------------
kernel/trace/trace_kprobe.c | 3 +-
5 files changed, 97 insertions(+), 169 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 5c8c271..00cf442 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -27,6 +27,7 @@
#include <linux/rcupdate.h>
#include <linux/mutex.h>
#include <linux/ftrace.h>
+#include <linux/refcount.h>
#include <asm/kprobes.h>

#ifdef CONFIG_KPROBES
@@ -144,6 +145,11 @@ static inline int kprobe_ftrace(struct kprobe *p)
* ignored, due to maxactive being too low.
*
*/
+struct kretprobe_holder {
+ struct kretprobe *rp;
+ refcount_t ref;
+};
+
struct kretprobe {
struct kprobe kp;
kretprobe_handler_t handler;
@@ -152,17 +158,18 @@ struct kretprobe {
int nmissed;
size_t data_size;
struct hlist_head free_instances;
+ struct kretprobe_holder *rph;
raw_spinlock_t lock;
};

struct kretprobe_instance {
union {
+ struct llist_node llist;
struct hlist_node hlist;
struct rcu_head rcu;
};
- struct kretprobe *rp;
+ struct kretprobe_holder *rph;
kprobe_opcode_t *ret_addr;
- struct task_struct *task;
void *fp;
char data[];
};
@@ -221,6 +228,14 @@ unsigned long kretprobe_trampoline_handler(struct pt_regs *regs,
return ret;
}

+static nokprobe_inline struct kretprobe *get_kretprobe(struct kretprobe_instance *ri)
+{
+ RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
+ "Kretprobe is accessed from instance under preemptive context");
+
+ return READ_ONCE(ri->rph->rp);
+}
+
#else /* CONFIG_KRETPROBES */
static inline void arch_prepare_kretprobe(struct kretprobe *rp,
struct pt_regs *regs)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index afe01e2..5911805 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1315,6 +1315,10 @@ struct task_struct {
struct callback_head mce_kill_me;
#endif

+#ifdef CONFIG_KRETPROBES
+ struct llist_head kretprobe_instances;
+#endif
+
/*
* New fields for task_struct should be added above here, so that
* they are included in the randomized portion of task_struct.
diff --git a/kernel/fork.c b/kernel/fork.c
index 49677d6..53a1f50 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2161,6 +2161,10 @@ static __latent_entropy struct task_struct *copy_process(
INIT_LIST_HEAD(&p->thread_group);
p->task_works = NULL;

+#ifdef CONFIG_KRETPROBES
+ p->kretprobe_instances.first = NULL;
+#endif
+
/*
* Ensure that the cgroup subsystem policies allow the new process to be
* forked. It should be noted the the new process's css_set can be changed
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 3b61ae8..850ee36 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -53,7 +53,6 @@ static int kprobes_initialized;
* - RCU hlist traversal under disabling preempt (breakpoint handlers)
*/
static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
-static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];

/* NOTE: change this value only with kprobe_mutex held */
static bool kprobes_all_disarmed;
@@ -61,9 +60,6 @@ static bool kprobes_all_disarmed;
/* This protects kprobe_table and optimizing_list */
static DEFINE_MUTEX(kprobe_mutex);
static DEFINE_PER_CPU(struct kprobe *, kprobe_instance) = NULL;
-static struct {
- raw_spinlock_t lock ____cacheline_aligned_in_smp;
-} kretprobe_table_locks[KPROBE_TABLE_SIZE];

kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
unsigned int __unused)
@@ -71,11 +67,6 @@ kprobe_opcode_t * __weak kprobe_lookup_name(const char *name,
return ((kprobe_opcode_t *)(kallsyms_lookup_name(name)));
}

-static raw_spinlock_t *kretprobe_table_lock_ptr(unsigned long hash)
-{
- return &(kretprobe_table_locks[hash].lock);
-}
-
/* Blacklist -- list of struct kprobe_blacklist_entry */
static LIST_HEAD(kprobe_blacklist);

@@ -1223,65 +1214,30 @@ void kprobes_inc_nmissed_count(struct kprobe *p)
}
NOKPROBE_SYMBOL(kprobes_inc_nmissed_count);

+static void free_rp_inst_rcu(struct rcu_head *head)
+{
+ struct kretprobe_instance *ri = container_of(head, struct kretprobe_instance, rcu);
+
+ if (refcount_dec_and_test(&ri->rph->ref))
+ kfree(ri->rph);
+ kfree(ri);
+}
+NOKPROBE_SYMBOL(free_rp_inst_rcu);
+
static void recycle_rp_inst(struct kretprobe_instance *ri)
{
- struct kretprobe *rp = ri->rp;
+ struct kretprobe *rp = get_kretprobe(ri);

- /* remove rp inst off the rprobe_inst_table */
- hlist_del(&ri->hlist);
INIT_HLIST_NODE(&ri->hlist);
if (likely(rp)) {
raw_spin_lock(&rp->lock);
hlist_add_head(&ri->hlist, &rp->free_instances);
raw_spin_unlock(&rp->lock);
} else
- kfree_rcu(ri, rcu);
+ call_rcu(&ri->rcu, free_rp_inst_rcu);
}
NOKPROBE_SYMBOL(recycle_rp_inst);

-static void kretprobe_hash_lock(struct task_struct *tsk,
- struct hlist_head **head, unsigned long *flags)
-__acquires(hlist_lock)
-{
- unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
- raw_spinlock_t *hlist_lock;
-
- *head = &kretprobe_inst_table[hash];
- hlist_lock = kretprobe_table_lock_ptr(hash);
- raw_spin_lock_irqsave(hlist_lock, *flags);
-}
-NOKPROBE_SYMBOL(kretprobe_hash_lock);
-
-static void kretprobe_table_lock(unsigned long hash,
- unsigned long *flags)
-__acquires(hlist_lock)
-{
- raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
- raw_spin_lock_irqsave(hlist_lock, *flags);
-}
-NOKPROBE_SYMBOL(kretprobe_table_lock);
-
-static void kretprobe_hash_unlock(struct task_struct *tsk,
- unsigned long *flags)
-__releases(hlist_lock)
-{
- unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
- raw_spinlock_t *hlist_lock;
-
- hlist_lock = kretprobe_table_lock_ptr(hash);
- raw_spin_unlock_irqrestore(hlist_lock, *flags);
-}
-NOKPROBE_SYMBOL(kretprobe_hash_unlock);
-
-static void kretprobe_table_unlock(unsigned long hash,
- unsigned long *flags)
-__releases(hlist_lock)
-{
- raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
- raw_spin_unlock_irqrestore(hlist_lock, *flags);
-}
-NOKPROBE_SYMBOL(kretprobe_table_unlock);
-
static struct kprobe kprobe_busy = {
.addr = (void *) get_kprobe,
};
@@ -1311,24 +1267,21 @@ void kprobe_busy_end(void)
void kprobe_flush_task(struct task_struct *tk)
{
struct kretprobe_instance *ri;
- struct hlist_head *head;
- struct hlist_node *tmp;
- unsigned long hash, flags = 0;
+ struct llist_node *node;

+ /* Early boot, not yet initialized. */
if (unlikely(!kprobes_initialized))
- /* Early boot. kretprobe_table_locks not yet initialized. */
return;

kprobe_busy_begin();

- hash = hash_ptr(tk, KPROBE_HASH_BITS);
- head = &kretprobe_inst_table[hash];
- kretprobe_table_lock(hash, &flags);
- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task == tk)
- recycle_rp_inst(ri);
+ node = __llist_del_all(&tk->kretprobe_instances);
+ while (node) {
+ ri = container_of(node, struct kretprobe_instance, llist);
+ node = node->next;
+
+ recycle_rp_inst(ri);
}
- kretprobe_table_unlock(hash, &flags);

kprobe_busy_end();
}
@@ -1338,36 +1291,19 @@ static inline void free_rp_inst(struct kretprobe *rp)
{
struct kretprobe_instance *ri;
struct hlist_node *next;
+ int count = 0;

hlist_for_each_entry_safe(ri, next, &rp->free_instances, hlist) {
hlist_del(&ri->hlist);
kfree(ri);
+ count++;
}
-}
-
-static void cleanup_rp_inst(struct kretprobe *rp)
-{
- unsigned long flags, hash;
- struct kretprobe_instance *ri;
- struct hlist_node *next;
- struct hlist_head *head;

- /* To avoid recursive kretprobe by NMI, set kprobe busy here */
- kprobe_busy_begin();
- for (hash = 0; hash < KPROBE_TABLE_SIZE; hash++) {
- kretprobe_table_lock(hash, &flags);
- head = &kretprobe_inst_table[hash];
- hlist_for_each_entry_safe(ri, next, head, hlist) {
- if (ri->rp == rp)
- ri->rp = NULL;
- }
- kretprobe_table_unlock(hash, &flags);
+ if (refcount_sub_and_test(count, &rp->rph->ref)) {
+ kfree(rp->rph);
+ rp->rph = NULL;
}
- kprobe_busy_end();
-
- free_rp_inst(rp);
}
-NOKPROBE_SYMBOL(cleanup_rp_inst);

/* Add the new probe to ap->list */
static int add_new_kprobe(struct kprobe *ap, struct kprobe *p)
@@ -1928,88 +1864,56 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
void *trampoline_address,
void *frame_pointer)
{
- struct kretprobe_instance *ri = NULL, *last = NULL;
- struct hlist_head *head;
- struct hlist_node *tmp;
- unsigned long flags;
kprobe_opcode_t *correct_ret_addr = NULL;
- bool skipped = false;
+ struct kretprobe_instance *ri = NULL;
+ struct llist_node *first, *node;
+ struct kretprobe *rp;

- kretprobe_hash_lock(current, &head, &flags);
+ /* Find all nodes for this frame. */
+ first = node = current->kretprobe_instances.first;
+ while (node) {
+ ri = container_of(node, struct kretprobe_instance, llist);

- /*
- * It is possible to have multiple instances associated with a given
- * task either because multiple functions in the call path have
- * return probes installed on them, and/or more than one
- * return probe was registered for a target function.
- *
- * We can handle this because:
- * - instances are always pushed into the head of the list
- * - when multiple return probes are registered for the same
- * function, the (chronologically) first instance's ret_addr
- * will be the real return address, and all the rest will
- * point to kretprobe_trampoline.
- */
- hlist_for_each_entry(ri, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
- /*
- * Return probes must be pushed on this hash list correct
- * order (same as return order) so that it can be popped
- * correctly. However, if we find it is pushed it incorrect
- * order, this means we find a function which should not be
- * probed, because the wrong order entry is pushed on the
- * path of processing other kretprobe itself.
- */
- if (ri->fp != frame_pointer) {
- if (!skipped)
- pr_warn("kretprobe is stacked incorrectly. Trying to fixup.\n");
- skipped = true;
- continue;
- }
+ BUG_ON(ri->fp != frame_pointer);

- correct_ret_addr = ri->ret_addr;
- if (skipped)
- pr_warn("%ps must be blacklisted because of incorrect kretprobe order\n",
- ri->rp->kp.addr);
-
- if (correct_ret_addr != trampoline_address)
+ if (ri->ret_addr != trampoline_address) {
+ correct_ret_addr = ri->ret_addr;
/*
* This is the real return address. Any other
* instances associated with this task are for
* other calls deeper on the call stack
*/
- break;
+ goto found;
+ }
+
+ node = node->next;
}
+ pr_err("Oops! Kretprobe fails to find correct return address.\n");
+ BUG_ON(1);

- BUG_ON(!correct_ret_addr || (correct_ret_addr == trampoline_address));
- last = ri;
+found:
+ /* Unlink all nodes for this frame. */
+ current->kretprobe_instances.first = node->next;
+ node->next = NULL;

- hlist_for_each_entry_safe(ri, tmp, head, hlist) {
- if (ri->task != current)
- /* another task is sharing our hash bucket */
- continue;
- if (ri->fp != frame_pointer)
- continue;
+ /* Run them.. */
+ while (first) {
+ ri = container_of(first, struct kretprobe_instance, llist);
+ first = first->next;

- if (ri->rp && ri->rp->handler) {
+ rp = get_kretprobe(ri);
+ if (rp && rp->handler) {
struct kprobe *prev = kprobe_running();

- __this_cpu_write(current_kprobe, &ri->rp->kp);
+ __this_cpu_write(current_kprobe, &rp->kp);
ri->ret_addr = correct_ret_addr;
- ri->rp->handler(ri, regs);
+ rp->handler(ri, regs);
__this_cpu_write(current_kprobe, prev);
}

recycle_rp_inst(ri);
-
- if (ri == last)
- break;
}

- kretprobe_hash_unlock(current, &flags);
-
return (unsigned long)correct_ret_addr;
}
NOKPROBE_SYMBOL(__kretprobe_trampoline_handler)
@@ -2021,11 +1925,10 @@ NOKPROBE_SYMBOL(__kretprobe_trampoline_handler)
static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
{
struct kretprobe *rp = container_of(p, struct kretprobe, kp);
- unsigned long hash, flags = 0;
+ unsigned long flags = 0;
struct kretprobe_instance *ri;

/* TODO: consider to only swap the RA after the last pre_handler fired */
- hash = hash_ptr(current, KPROBE_HASH_BITS);
raw_spin_lock_irqsave(&rp->lock, flags);
if (!hlist_empty(&rp->free_instances)) {
ri = hlist_entry(rp->free_instances.first,
@@ -2033,9 +1936,6 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
hlist_del(&ri->hlist);
raw_spin_unlock_irqrestore(&rp->lock, flags);

- ri->rp = rp;
- ri->task = current;
-
if (rp->entry_handler && rp->entry_handler(ri, regs)) {
raw_spin_lock_irqsave(&rp->lock, flags);
hlist_add_head(&ri->hlist, &rp->free_instances);
@@ -2045,11 +1945,8 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)

arch_prepare_kretprobe(ri, regs);

- /* XXX(hch): why is there no hlist_move_head? */
- INIT_HLIST_NODE(&ri->hlist);
- kretprobe_table_lock(hash, &flags);
- hlist_add_head(&ri->hlist, &kretprobe_inst_table[hash]);
- kretprobe_table_unlock(hash, &flags);
+ __llist_add(&ri->llist, &current->kretprobe_instances);
+
} else {
rp->nmissed++;
raw_spin_unlock_irqrestore(&rp->lock, flags);
@@ -2112,16 +2009,24 @@ int register_kretprobe(struct kretprobe *rp)
}
raw_spin_lock_init(&rp->lock);
INIT_HLIST_HEAD(&rp->free_instances);
+ rp->rph = kzalloc(sizeof(struct kretprobe_holder), GFP_KERNEL);
+ if (!rp->rph)
+ return -ENOMEM;
+
+ rp->rph->rp = rp;
for (i = 0; i < rp->maxactive; i++) {
- inst = kmalloc(sizeof(struct kretprobe_instance) +
+ inst = kzalloc(sizeof(struct kretprobe_instance) +
rp->data_size, GFP_KERNEL);
if (inst == NULL) {
+ refcount_set(&rp->rph->ref, i);
free_rp_inst(rp);
return -ENOMEM;
}
+ inst->rph = rp->rph;
INIT_HLIST_NODE(&inst->hlist);
hlist_add_head(&inst->hlist, &rp->free_instances);
}
+ refcount_set(&rp->rph->ref, i);

rp->nmissed = 0;
/* Establish function entry probe point */
@@ -2163,16 +2068,18 @@ void unregister_kretprobes(struct kretprobe **rps, int num)
if (num <= 0)
return;
mutex_lock(&kprobe_mutex);
- for (i = 0; i < num; i++)
+ for (i = 0; i < num; i++) {
if (__unregister_kprobe_top(&rps[i]->kp) < 0)
rps[i]->kp.addr = NULL;
+ rps[i]->rph->rp = NULL;
+ }
mutex_unlock(&kprobe_mutex);

synchronize_rcu();
for (i = 0; i < num; i++) {
if (rps[i]->kp.addr) {
__unregister_kprobe_bottom(&rps[i]->kp);
- cleanup_rp_inst(rps[i]);
+ free_rp_inst(rps[i]);
}
}
}
@@ -2535,11 +2442,8 @@ static int __init init_kprobes(void)

/* FIXME allocate the probe table, currently defined statically */
/* initialize all list heads */
- for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
+ for (i = 0; i < KPROBE_TABLE_SIZE; i++)
INIT_HLIST_HEAD(&kprobe_table[i]);
- INIT_HLIST_HEAD(&kretprobe_inst_table[i]);
- raw_spin_lock_init(&(kretprobe_table_locks[i].lock));
- }

err = populate_kprobe_blacklist(__start_kprobe_blacklist,
__stop_kprobe_blacklist);
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index aefb606..07baf6f 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1714,7 +1714,8 @@ NOKPROBE_SYMBOL(kprobe_dispatcher);
static int
kretprobe_dispatcher(struct kretprobe_instance *ri, struct pt_regs *regs)
{
- struct trace_kprobe *tk = container_of(ri->rp, struct trace_kprobe, rp);
+ struct kretprobe *rp = get_kretprobe(ri);
+ struct trace_kprobe *tk = container_of(rp, struct trace_kprobe, rp);

raw_cpu_inc(*tk->nhit);

2020-10-14 08:04:25

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 17/21] llist: Add nonatomic __llist_add() and __llist_dell_all()

On Mon, 12 Oct 2020 18:24:54 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Masami Hiramatsu <[email protected]> wrote:
>
> > From: Peter Zijlstra <[email protected]>
> >
> > Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
>
> Because you are forwarding this patch here, I've added your SOB:
>
> Signed-off-by: Masami Hiramatsu <[email protected]>
>
> (Let me know if that's not OK.)

OK, I'll keep it in mind next time.

Thanks Ingo!

>
> Thanks,
>
> Ingo


--
Masami Hiramatsu <[email protected]>

2020-10-31 01:43:53

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v5 14/21] kprobes: Remove NMI context check

On Sat, 29 Aug 2020 22:02:36 +0900
Masami Hiramatsu <[email protected]> wrote:

> Since the commit 9b38cc704e84 ("kretprobe: Prevent triggering
> kretprobe from within kprobe_flush_task") sets a dummy current
> kprobe in the trampoline handler by kprobe_busy_begin/end(),
> it is not possible to run a kretprobe pre handler in kretprobe
> trampoline handler context even with the NMI. If the NMI interrupts
> a kretprobe_trampoline_handler() and it hits a kretprobe, the
> 2nd kretprobe will detect recursion correctly and it will be
> skipped.
> This means we have almost no double-lock issue on kretprobes by NMI.
>
> The last one point is in cleanup_rp_inst() which also takes
> kretprobe_table_lock without setting up current kprobes.
> So adding kprobe_busy_begin/end() there allows us to remove
> in_nmi() check.
>
> The above commit applies kprobe_busy_begin/end() on x86, but
> now all arch implementation are unified to generic one, we can
> safely remove the in_nmi() check from arch independent code.
>

So are you saying that lockdep is lying?

Kprobe smoke test: started

================================
WARNING: inconsistent lock state
5.10.0-rc1-test+ #29 Not tainted
--------------------------------
inconsistent {INITIAL USE} -> {IN-NMI} usage.
swapper/0/1 [HC1[1]:SC0[0]:HE0:SE1] takes:
ffffffff82b07118 (&rp->lock){....}-{2:2}, at: pre_handler_kretprobe+0x4b/0x193
{INITIAL USE} state was registered at:
lock_acquire+0x280/0x325
_raw_spin_lock+0x30/0x3f
recycle_rp_inst+0x3f/0x86
__kretprobe_trampoline_handler+0x13a/0x177
trampoline_handler+0x48/0x57
kretprobe_trampoline+0x2a/0x4f
kretprobe_trampoline+0x0/0x4f
init_kprobes+0x193/0x19d
do_one_initcall+0xf9/0x27e
kernel_init_freeable+0x16e/0x2b6
kernel_init+0xe/0x109
ret_from_fork+0x22/0x30
irq event stamp: 1670
hardirqs last enabled at (1669): [<ffffffff811cc344>] slab_free_freelist_hook+0xb4/0xfd
hardirqs last disabled at (1670): [<ffffffff81da0887>] exc_int3+0xae/0x10a
softirqs last enabled at (1484): [<ffffffff82000352>] __do_softirq+0x352/0x38d
softirqs last disabled at (1471): [<ffffffff81e00f82>] asm_call_irq_on_stack+0x12/0x20

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&rp->lock);
<Interrupt>
lock(&rp->lock);

*** DEADLOCK ***

no locks held by swapper/0/1.

stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc1-test+ #29
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
Call Trace:
dump_stack+0x7d/0x9f
print_usage_bug+0x1c0/0x1d3
lock_acquire+0x302/0x325
? pre_handler_kretprobe+0x4b/0x193
? stop_machine_from_inactive_cpu+0x120/0x120
_raw_spin_lock_irqsave+0x43/0x58
? pre_handler_kretprobe+0x4b/0x193
pre_handler_kretprobe+0x4b/0x193
? stop_machine_from_inactive_cpu+0x120/0x120
? kprobe_target+0x1/0x16
kprobe_int3_handler+0xd0/0x109
exc_int3+0xb8/0x10a
asm_exc_int3+0x31/0x40
RIP: 0010:kprobe_target+0x1/0x16
5d c3 cc
RSP: 0000:ffffc90000033e00 EFLAGS: 00000246
RAX: ffffffff8110ea77 RBX: 0000000000000001 RCX: ffffc90000033cb4
RDX: 0000000000000231 RSI: 0000000000000000 RDI: 000000003ca57c35
RBP: ffffc90000033e20 R08: 0000000000000000 R09: ffffffff8111d207
R10: ffff8881002ab480 R11: ffff8881002ab480 R12: 0000000000000000
R13: ffffffff82a52af0 R14: 0000000000000200 R15: ffff888100331130
? register_kprobe+0x43c/0x492
? stop_machine_from_inactive_cpu+0x120/0x120
? kprobe_target+0x1/0x16
? init_test_probes+0x2c6/0x38a
init_kprobes+0x193/0x19d
? debugfs_kprobe_init+0xb8/0xb8
do_one_initcall+0xf9/0x27e
? rcu_read_lock_sched_held+0x3e/0x75
? init_mm_internals+0x27b/0x284
kernel_init_freeable+0x16e/0x2b6
? rest_init+0x152/0x152
kernel_init+0xe/0x109
ret_from_fork+0x22/0x30
Kprobe smoke test: passed successfully

Config attached.

-- Steve



2020-11-02 05:13:39

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 14/21] kprobes: Remove NMI context check

On Fri, 30 Oct 2020 21:38:31 -0400
Steven Rostedt <[email protected]> wrote:

> On Sat, 29 Aug 2020 22:02:36 +0900
> Masami Hiramatsu <[email protected]> wrote:
>
> > Since the commit 9b38cc704e84 ("kretprobe: Prevent triggering
> > kretprobe from within kprobe_flush_task") sets a dummy current
> > kprobe in the trampoline handler by kprobe_busy_begin/end(),
> > it is not possible to run a kretprobe pre handler in kretprobe
> > trampoline handler context even with the NMI. If the NMI interrupts
> > a kretprobe_trampoline_handler() and it hits a kretprobe, the
> > 2nd kretprobe will detect recursion correctly and it will be
> > skipped.
> > This means we have almost no double-lock issue on kretprobes by NMI.
> >
> > The last one point is in cleanup_rp_inst() which also takes
> > kretprobe_table_lock without setting up current kprobes.
> > So adding kprobe_busy_begin/end() there allows us to remove
> > in_nmi() check.
> >
> > The above commit applies kprobe_busy_begin/end() on x86, but
> > now all arch implementation are unified to generic one, we can
> > safely remove the in_nmi() check from arch independent code.
> >
>
> So are you saying that lockdep is lying?
>
> Kprobe smoke test: started
>
> ================================
> WARNING: inconsistent lock state
> 5.10.0-rc1-test+ #29 Not tainted
> --------------------------------
> inconsistent {INITIAL USE} -> {IN-NMI} usage.
> swapper/0/1 [HC1[1]:SC0[0]:HE0:SE1] takes:
> ffffffff82b07118 (&rp->lock){....}-{2:2}, at: pre_handler_kretprobe+0x4b/0x193
> {INITIAL USE} state was registered at:
> lock_acquire+0x280/0x325
> _raw_spin_lock+0x30/0x3f
> recycle_rp_inst+0x3f/0x86
> __kretprobe_trampoline_handler+0x13a/0x177
> trampoline_handler+0x48/0x57
> kretprobe_trampoline+0x2a/0x4f
> kretprobe_trampoline+0x0/0x4f
> init_kprobes+0x193/0x19d
> do_one_initcall+0xf9/0x27e
> kernel_init_freeable+0x16e/0x2b6
> kernel_init+0xe/0x109
> ret_from_fork+0x22/0x30
> irq event stamp: 1670
> hardirqs last enabled at (1669): [<ffffffff811cc344>] slab_free_freelist_hook+0xb4/0xfd
> hardirqs last disabled at (1670): [<ffffffff81da0887>] exc_int3+0xae/0x10a
> softirqs last enabled at (1484): [<ffffffff82000352>] __do_softirq+0x352/0x38d
> softirqs last disabled at (1471): [<ffffffff81e00f82>] asm_call_irq_on_stack+0x12/0x20
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&rp->lock);
> <Interrupt>
> lock(&rp->lock);
>
> *** DEADLOCK ***
>
> no locks held by swapper/0/1.
>
> stack backtrace:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc1-test+ #29
> Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> Call Trace:
> dump_stack+0x7d/0x9f
> print_usage_bug+0x1c0/0x1d3
> lock_acquire+0x302/0x325
> ? pre_handler_kretprobe+0x4b/0x193
> ? stop_machine_from_inactive_cpu+0x120/0x120
> _raw_spin_lock_irqsave+0x43/0x58
> ? pre_handler_kretprobe+0x4b/0x193
> pre_handler_kretprobe+0x4b/0x193
> ? stop_machine_from_inactive_cpu+0x120/0x120
> ? kprobe_target+0x1/0x16
> kprobe_int3_handler+0xd0/0x109
> exc_int3+0xb8/0x10a
> asm_exc_int3+0x31/0x40
> RIP: 0010:kprobe_target+0x1/0x16
> 5d c3 cc
> RSP: 0000:ffffc90000033e00 EFLAGS: 00000246
> RAX: ffffffff8110ea77 RBX: 0000000000000001 RCX: ffffc90000033cb4
> RDX: 0000000000000231 RSI: 0000000000000000 RDI: 000000003ca57c35
> RBP: ffffc90000033e20 R08: 0000000000000000 R09: ffffffff8111d207
> R10: ffff8881002ab480 R11: ffff8881002ab480 R12: 0000000000000000
> R13: ffffffff82a52af0 R14: 0000000000000200 R15: ffff888100331130
> ? register_kprobe+0x43c/0x492
> ? stop_machine_from_inactive_cpu+0x120/0x120
> ? kprobe_target+0x1/0x16
> ? init_test_probes+0x2c6/0x38a
> init_kprobes+0x193/0x19d
> ? debugfs_kprobe_init+0xb8/0xb8
> do_one_initcall+0xf9/0x27e
> ? rcu_read_lock_sched_held+0x3e/0x75
> ? init_mm_internals+0x27b/0x284
> kernel_init_freeable+0x16e/0x2b6
> ? rest_init+0x152/0x152
> kernel_init+0xe/0x109
> ret_from_fork+0x22/0x30
> Kprobe smoke test: passed successfully
>
> Config attached.

Thanks for the report! Let me check what happens.

Thank you,

>
> -- Steve


--
Masami Hiramatsu <[email protected]>

2020-11-02 05:55:47

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 14/21] kprobes: Remove NMI context check

On Mon, 2 Nov 2020 14:11:38 +0900
Masami Hiramatsu <[email protected]> wrote:

> On Fri, 30 Oct 2020 21:38:31 -0400
> Steven Rostedt <[email protected]> wrote:
>
> > On Sat, 29 Aug 2020 22:02:36 +0900
> > Masami Hiramatsu <[email protected]> wrote:
> >
> > > Since the commit 9b38cc704e84 ("kretprobe: Prevent triggering
> > > kretprobe from within kprobe_flush_task") sets a dummy current
> > > kprobe in the trampoline handler by kprobe_busy_begin/end(),
> > > it is not possible to run a kretprobe pre handler in kretprobe
> > > trampoline handler context even with the NMI. If the NMI interrupts
> > > a kretprobe_trampoline_handler() and it hits a kretprobe, the
> > > 2nd kretprobe will detect recursion correctly and it will be
> > > skipped.
> > > This means we have almost no double-lock issue on kretprobes by NMI.
> > >
> > > The last one point is in cleanup_rp_inst() which also takes
> > > kretprobe_table_lock without setting up current kprobes.
> > > So adding kprobe_busy_begin/end() there allows us to remove
> > > in_nmi() check.
> > >
> > > The above commit applies kprobe_busy_begin/end() on x86, but
> > > now all arch implementation are unified to generic one, we can
> > > safely remove the in_nmi() check from arch independent code.
> > >
> >
> > So are you saying that lockdep is lying?
> >
> > Kprobe smoke test: started
> >
> > ================================
> > WARNING: inconsistent lock state
> > 5.10.0-rc1-test+ #29 Not tainted
> > --------------------------------
> > inconsistent {INITIAL USE} -> {IN-NMI} usage.
> > swapper/0/1 [HC1[1]:SC0[0]:HE0:SE1] takes:
> > ffffffff82b07118 (&rp->lock){....}-{2:2}, at: pre_handler_kretprobe+0x4b/0x193
> > {INITIAL USE} state was registered at:
> > lock_acquire+0x280/0x325
> > _raw_spin_lock+0x30/0x3f
> > recycle_rp_inst+0x3f/0x86
> > __kretprobe_trampoline_handler+0x13a/0x177
> > trampoline_handler+0x48/0x57
> > kretprobe_trampoline+0x2a/0x4f
> > kretprobe_trampoline+0x0/0x4f
> > init_kprobes+0x193/0x19d
> > do_one_initcall+0xf9/0x27e
> > kernel_init_freeable+0x16e/0x2b6
> > kernel_init+0xe/0x109
> > ret_from_fork+0x22/0x30
> > irq event stamp: 1670
> > hardirqs last enabled at (1669): [<ffffffff811cc344>] slab_free_freelist_hook+0xb4/0xfd
> > hardirqs last disabled at (1670): [<ffffffff81da0887>] exc_int3+0xae/0x10a
> > softirqs last enabled at (1484): [<ffffffff82000352>] __do_softirq+0x352/0x38d
> > softirqs last disabled at (1471): [<ffffffff81e00f82>] asm_call_irq_on_stack+0x12/0x20
> >
> > other info that might help us debug this:
> > Possible unsafe locking scenario:
> >
> > CPU0
> > ----
> > lock(&rp->lock);
> > <Interrupt>
> > lock(&rp->lock);
> >
> > *** DEADLOCK ***
> >
> > no locks held by swapper/0/1.
> >
> > stack backtrace:
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc1-test+ #29
> > Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> > Call Trace:
> > dump_stack+0x7d/0x9f
> > print_usage_bug+0x1c0/0x1d3
> > lock_acquire+0x302/0x325
> > ? pre_handler_kretprobe+0x4b/0x193
> > ? stop_machine_from_inactive_cpu+0x120/0x120
> > _raw_spin_lock_irqsave+0x43/0x58
> > ? pre_handler_kretprobe+0x4b/0x193
> > pre_handler_kretprobe+0x4b/0x193
> > ? stop_machine_from_inactive_cpu+0x120/0x120
> > ? kprobe_target+0x1/0x16
> > kprobe_int3_handler+0xd0/0x109
> > exc_int3+0xb8/0x10a
> > asm_exc_int3+0x31/0x40
> > RIP: 0010:kprobe_target+0x1/0x16
> > 5d c3 cc
> > RSP: 0000:ffffc90000033e00 EFLAGS: 00000246
> > RAX: ffffffff8110ea77 RBX: 0000000000000001 RCX: ffffc90000033cb4
> > RDX: 0000000000000231 RSI: 0000000000000000 RDI: 000000003ca57c35
> > RBP: ffffc90000033e20 R08: 0000000000000000 R09: ffffffff8111d207
> > R10: ffff8881002ab480 R11: ffff8881002ab480 R12: 0000000000000000
> > R13: ffffffff82a52af0 R14: 0000000000000200 R15: ffff888100331130
> > ? register_kprobe+0x43c/0x492
> > ? stop_machine_from_inactive_cpu+0x120/0x120
> > ? kprobe_target+0x1/0x16
> > ? init_test_probes+0x2c6/0x38a
> > init_kprobes+0x193/0x19d
> > ? debugfs_kprobe_init+0xb8/0xb8
> > do_one_initcall+0xf9/0x27e
> > ? rcu_read_lock_sched_held+0x3e/0x75
> > ? init_mm_internals+0x27b/0x284
> > kernel_init_freeable+0x16e/0x2b6
> > ? rest_init+0x152/0x152
> > kernel_init+0xe/0x109
> > ret_from_fork+0x22/0x30
> > Kprobe smoke test: passed successfully
> >
> > Config attached.
>
> Thanks for the report! Let me check what happens.

OK, confirmed. But this is actually a false-positive report.

Lockdep reports an inconsistent lock state on rp->lock between
pre_handler_kretprobe() and recycle_rp_inst() called from
__kretprobe_trampoline_handler().
Since kretprobe_trampoline_handler() sets current_kprobe,
any other kprobe that hits on the same CPU is skipped. This means
pre_handler_kretprobe() is not called while executing
__kretprobe_trampoline_handler().
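
In other words, the per-CPU guard has roughly the following shape (just a
sketch with a made-up function name, not the in-tree code):

static int hypothetical_kprobe_dispatch(struct kprobe *p, struct pt_regs *regs)
{
        if (kprobe_running()) {
                /* nested hit on this CPU: count it as missed, skip pre handlers */
                kprobes_inc_nmissed_count(p);
                return 0;
        }

        __this_cpu_write(current_kprobe, p);
        if (p->pre_handler)
                p->pre_handler(p, regs); /* pre_handler_kretprobe() takes rp->lock here */
        __this_cpu_write(current_kprobe, NULL);
        return 1;
}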

Actually, since this rp->lock is expected to be removed by the last
patch in this series ([21/21]), I left this as is, but we might be
better off handling this case because the latter half of this series
will be merged in 5.11.

Hmm, is there any way to tell lockdep this is safe?

Thank you,

--
Masami Hiramatsu <[email protected]>

2020-11-02 07:04:35

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 14/21] kprobes: Remove NMI context check

On Mon, 2 Nov 2020 14:53:34 +0900
Masami Hiramatsu <[email protected]> wrote:

> On Mon, 2 Nov 2020 14:11:38 +0900
> Masami Hiramatsu <[email protected]> wrote:
>
> > On Fri, 30 Oct 2020 21:38:31 -0400
> > Steven Rostedt <[email protected]> wrote:
> >
> > > On Sat, 29 Aug 2020 22:02:36 +0900
> > > Masami Hiramatsu <[email protected]> wrote:
> > >
> > > > Since the commit 9b38cc704e84 ("kretprobe: Prevent triggering
> > > > kretprobe from within kprobe_flush_task") sets a dummy current
> > > > kprobe in the trampoline handler by kprobe_busy_begin/end(),
> > > > it is not possible to run a kretprobe pre handler in kretprobe
> > > > trampoline handler context even with the NMI. If the NMI interrupts
> > > > a kretprobe_trampoline_handler() and it hits a kretprobe, the
> > > > 2nd kretprobe will detect recursion correctly and it will be
> > > > skipped.
> > > > This means we have almost no double-lock issue on kretprobes by NMI.
> > > >
> > > > The last one point is in cleanup_rp_inst() which also takes
> > > > kretprobe_table_lock without setting up current kprobes.
> > > > So adding kprobe_busy_begin/end() there allows us to remove
> > > > in_nmi() check.
> > > >
> > > > The above commit applies kprobe_busy_begin/end() on x86, but
> > > > now all arch implementation are unified to generic one, we can
> > > > safely remove the in_nmi() check from arch independent code.
> > > >
> > >
> > > So are you saying that lockdep is lying?
> > >
> > > Kprobe smoke test: started
> > >
> > > ================================
> > > WARNING: inconsistent lock state
> > > 5.10.0-rc1-test+ #29 Not tainted
> > > --------------------------------
> > > inconsistent {INITIAL USE} -> {IN-NMI} usage.
> > > swapper/0/1 [HC1[1]:SC0[0]:HE0:SE1] takes:
> > > ffffffff82b07118 (&rp->lock){....}-{2:2}, at: pre_handler_kretprobe+0x4b/0x193
> > > {INITIAL USE} state was registered at:
> > > lock_acquire+0x280/0x325
> > > _raw_spin_lock+0x30/0x3f
> > > recycle_rp_inst+0x3f/0x86
> > > __kretprobe_trampoline_handler+0x13a/0x177
> > > trampoline_handler+0x48/0x57
> > > kretprobe_trampoline+0x2a/0x4f
> > > kretprobe_trampoline+0x0/0x4f
> > > init_kprobes+0x193/0x19d
> > > do_one_initcall+0xf9/0x27e
> > > kernel_init_freeable+0x16e/0x2b6
> > > kernel_init+0xe/0x109
> > > ret_from_fork+0x22/0x30
> > > irq event stamp: 1670
> > > hardirqs last enabled at (1669): [<ffffffff811cc344>] slab_free_freelist_hook+0xb4/0xfd
> > > hardirqs last disabled at (1670): [<ffffffff81da0887>] exc_int3+0xae/0x10a
> > > softirqs last enabled at (1484): [<ffffffff82000352>] __do_softirq+0x352/0x38d
> > > softirqs last disabled at (1471): [<ffffffff81e00f82>] asm_call_irq_on_stack+0x12/0x20
> > >
> > > other info that might help us debug this:
> > > Possible unsafe locking scenario:
> > >
> > > CPU0
> > > ----
> > > lock(&rp->lock);
> > > <Interrupt>
> > > lock(&rp->lock);
> > >
> > > *** DEADLOCK ***
> > >
> > > no locks held by swapper/0/1.
> > >
> > > stack backtrace:
> > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc1-test+ #29
> > > Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> > > Call Trace:
> > > dump_stack+0x7d/0x9f
> > > print_usage_bug+0x1c0/0x1d3
> > > lock_acquire+0x302/0x325
> > > ? pre_handler_kretprobe+0x4b/0x193
> > > ? stop_machine_from_inactive_cpu+0x120/0x120
> > > _raw_spin_lock_irqsave+0x43/0x58
> > > ? pre_handler_kretprobe+0x4b/0x193
> > > pre_handler_kretprobe+0x4b/0x193
> > > ? stop_machine_from_inactive_cpu+0x120/0x120
> > > ? kprobe_target+0x1/0x16
> > > kprobe_int3_handler+0xd0/0x109
> > > exc_int3+0xb8/0x10a
> > > asm_exc_int3+0x31/0x40
> > > RIP: 0010:kprobe_target+0x1/0x16
> > > 5d c3 cc
> > > RSP: 0000:ffffc90000033e00 EFLAGS: 00000246
> > > RAX: ffffffff8110ea77 RBX: 0000000000000001 RCX: ffffc90000033cb4
> > > RDX: 0000000000000231 RSI: 0000000000000000 RDI: 000000003ca57c35
> > > RBP: ffffc90000033e20 R08: 0000000000000000 R09: ffffffff8111d207
> > > R10: ffff8881002ab480 R11: ffff8881002ab480 R12: 0000000000000000
> > > R13: ffffffff82a52af0 R14: 0000000000000200 R15: ffff888100331130
> > > ? register_kprobe+0x43c/0x492
> > > ? stop_machine_from_inactive_cpu+0x120/0x120
> > > ? kprobe_target+0x1/0x16
> > > ? init_test_probes+0x2c6/0x38a
> > > init_kprobes+0x193/0x19d
> > > ? debugfs_kprobe_init+0xb8/0xb8
> > > do_one_initcall+0xf9/0x27e
> > > ? rcu_read_lock_sched_held+0x3e/0x75
> > > ? init_mm_internals+0x27b/0x284
> > > kernel_init_freeable+0x16e/0x2b6
> > > ? rest_init+0x152/0x152
> > > kernel_init+0xe/0x109
> > > ret_from_fork+0x22/0x30
> > > Kprobe smoke test: passed successfully
> > >
> > > Config attached.
> >
> > Thanks for the report! Let me check what happens.
>
> OK, confirmed. But this is actually a false-positive report.
>
> Lockdep reports an inconsistent lock state on rp->lock between
> pre_handler_kretprobe() and recycle_rp_inst() called from
> __kretprobe_trampoline_handler().
> Since kretprobe_trampoline_handler() sets current_kprobe,
> any other kprobe that hits on the same CPU is skipped. This means
> pre_handler_kretprobe() is not called while executing
> __kretprobe_trampoline_handler().
>
> Actually, since this rp->lock is expected to be removed by the last
> patch in this series ([21/21]), I left this as is, but we might be
> better off handling this case because the latter half of this series
> will be merged in 5.11.
>
> Hmm, is there any way to tell lockdep this is safe?
>

This can suppress the warnings. After the lockless patch is introduced,
we won't need this anymore.


From 509b27efef8c7dbf56cab2e812916d6cd778c745 Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <[email protected]>
Date: Mon, 2 Nov 2020 15:37:28 +0900
Subject: [PATCH] kprobes: Disable lockdep for kprobe busy area

Since the code area in between kprobe_busy_begin()/end() prohibits
other kprobes from calling probe handlers, we can avoid inconsistent
locks there. But lockdep doesn't know that, so it warns about rp->lock
or kretprobe_table_lock.

To suppress those false-positive errors, disable lockdep while
kprobe_busy is set.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
kernel/kprobes.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 8a12a25fa40d..c7196e583600 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1295,10 +1295,12 @@ void kprobe_busy_begin(void)
__this_cpu_write(current_kprobe, &kprobe_busy);
kcb = get_kprobe_ctlblk();
kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+ lockdep_off();
}

void kprobe_busy_end(void)
{
+ lockdep_on();
__this_cpu_write(current_kprobe, NULL);
preempt_enable();
}
--
2.25.1


--
Masami Hiramatsu <[email protected]>

2020-11-02 14:29:51

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v5 14/21] kprobes: Remove NMI context check


[ Peter Z, please take a look a this ]

On Mon, 2 Nov 2020 16:02:34 +0900
Masami Hiramatsu <[email protected]> wrote:

> >From 509b27efef8c7dbf56cab2e812916d6cd778c745 Mon Sep 17 00:00:00 2001
> From: Masami Hiramatsu <[email protected]>
> Date: Mon, 2 Nov 2020 15:37:28 +0900
> Subject: [PATCH] kprobes: Disable lockdep for kprobe busy area
>
> Since the code area in between kprobe_busy_begin()/end() prohibits
> other kprobes from calling probe handlers, we can avoid inconsistent
> locks there. But lockdep doesn't know that, so it warns about rp->lock
> or kretprobe_table_lock.
>
> To suppress those false-positive errors, disable lockdep while
> kprobe_busy is set.
>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> ---
> kernel/kprobes.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 8a12a25fa40d..c7196e583600 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1295,10 +1295,12 @@ void kprobe_busy_begin(void)
> __this_cpu_write(current_kprobe, &kprobe_busy);
> kcb = get_kprobe_ctlblk();
> kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> + lockdep_off();
> }
>
> void kprobe_busy_end(void)
> {
> + lockdep_on();
> __this_cpu_write(current_kprobe, NULL);
> preempt_enable();
> }
> --

No, this is not the correct workaround (too big of a hammer). You could do
the following:

From 4139d9c8437b0bd2262e989ca4eb0a83b7e7bb72 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <[email protected]>
Date: Mon, 2 Nov 2020 09:17:49 -0500
Subject: [PATCH] kprobes: Tell lockdep about kprobe nesting

The kprobe handlers have protection that prohibits other handlers from
executing in other contexts (for example, if an NMI comes in while
processing a kprobe and executes the same kprobe, it will fail with a
"busy" return). Lockdep is unaware of this protection. Use lockdep's
nesting API to differentiate between locks taken in NMI context and in
other contexts, to suppress the false warnings.

Link: https://lore.kernel.org/r/[email protected]

Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
---
kernel/kprobes.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 8a12a25fa40d..ccb285867059 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1249,7 +1249,12 @@ __acquires(hlist_lock)

*head = &kretprobe_inst_table[hash];
hlist_lock = kretprobe_table_lock_ptr(hash);
- raw_spin_lock_irqsave(hlist_lock, *flags);
+ /*
+ * Nested is a workaround that will soon not be needed.
+ * There's other protections that make sure the same lock
+ * is not taken on the same CPU that lockdep is unaware of.
+ */
+ raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
}
NOKPROBE_SYMBOL(kretprobe_hash_lock);

@@ -1258,7 +1263,12 @@ static void kretprobe_table_lock(unsigned long hash,
__acquires(hlist_lock)
{
raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
- raw_spin_lock_irqsave(hlist_lock, *flags);
+ /*
+ * Nested is a workaround that will soon not be needed.
+ * There's other protections that make sure the same lock
+ * is not taken on the same CPU that lockdep is unaware of.
+ */
+ raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
}
NOKPROBE_SYMBOL(kretprobe_table_lock);

@@ -2025,10 +2035,16 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
struct kretprobe *rp = container_of(p, struct kretprobe, kp);
unsigned long hash, flags = 0;
struct kretprobe_instance *ri;
+ int nmi = !!in_nmi();

/* TODO: consider to only swap the RA after the last pre_handler fired */
hash = hash_ptr(current, KPROBE_HASH_BITS);
- raw_spin_lock_irqsave(&rp->lock, flags);
+ /*
+ * Nested is a workaround that will soon not be needed.
+ * There's other protections that make sure the same lock
+ * is not taken on the same CPU that lockdep is unaware of.
+ */
+ raw_spin_lock_irqsave_nested(&rp->lock, flags, nmi);
if (!hlist_empty(&rp->free_instances)) {
ri = hlist_entry(rp->free_instances.first,
struct kretprobe_instance, hlist);
@@ -2039,7 +2055,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
ri->task = current;

if (rp->entry_handler && rp->entry_handler(ri, regs)) {
- raw_spin_lock_irqsave(&rp->lock, flags);
+ raw_spin_lock_irqsave_nested(&rp->lock, flags, nmi);
hlist_add_head(&ri->hlist, &rp->free_instances);
raw_spin_unlock_irqrestore(&rp->lock, flags);
return 0;
--
2.25.4

2020-11-03 05:41:20

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 14/21] kprobes: Remove NMI context check

On Mon, 2 Nov 2020 09:27:26 -0500
Steven Rostedt <[email protected]> wrote:

>
> [ Peter Z, please take a look at this ]
>
> On Mon, 2 Nov 2020 16:02:34 +0900
> Masami Hiramatsu <[email protected]> wrote:
>
> > From 509b27efef8c7dbf56cab2e812916d6cd778c745 Mon Sep 17 00:00:00 2001
> > From: Masami Hiramatsu <[email protected]>
> > Date: Mon, 2 Nov 2020 15:37:28 +0900
> > Subject: [PATCH] kprobes: Disable lockdep for kprobe busy area
> >
> > Since the code area in between kprobe_busy_begin()/end() prohibits
> > other kprobes from calling probe handlers, the inconsistent locking
> > that lockdep complains about cannot actually happen there. But lockdep
> > doesn't know that, so it warns about rp->lock and kretprobe_table_lock.
> >
> > To suppress those false-positive errors, disable lockdep while
> > kprobe_busy is set.
> >
> > Signed-off-by: Masami Hiramatsu <[email protected]>
> > ---
> > kernel/kprobes.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index 8a12a25fa40d..c7196e583600 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -1295,10 +1295,12 @@ void kprobe_busy_begin(void)
> > __this_cpu_write(current_kprobe, &kprobe_busy);
> > kcb = get_kprobe_ctlblk();
> > kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> > + lockdep_off();
> > }
> >
> > void kprobe_busy_end(void)
> > {
> > + lockdep_on();
> > __this_cpu_write(current_kprobe, NULL);
> > preempt_enable();
> > }
> > --
>
> No, this is not the correct workaround (too big of a hammer). You could do
> the following:
>
> From 4139d9c8437b0bd2262e989ca4eb0a83b7e7bb72 Mon Sep 17 00:00:00 2001
> From: "Steven Rostedt (VMware)" <[email protected]>
> Date: Mon, 2 Nov 2020 09:17:49 -0500
> Subject: [PATCH] kprobes: Tell lockdep about kprobe nesting
>
> The kprobe handlers have protection that prohibits other handlers from
> executing in other contexts (for example, if an NMI comes in while
> processing a kprobe and executes the same kprobe, it will fail with a
> "busy" return). Lockdep is unaware of this protection. Use lockdep's
> nesting API to differentiate between locks taken in NMI context and in
> other contexts, to suppress the false warnings.

Ah, OK. This looks good to me.

BTW, in_nmi() in pre_handler_kretprobe() will always be true because
int3 is now treated as an NMI. So you can always pass 1 there.
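
Illustration only: a minimal user-space sketch of that point. in_nmi()
tests the NMI bits of preempt_count(), and since the int3 entry marks
NMI-like context before the pre-handler runs, the test is always
non-zero there. The bit layout and helper names below are simplified
assumptions, not copied from the kernel headers.

#include <stdio.h>

/* Illustrative constants; the kernel's preempt_count layout is simplified here. */
#define NMI_SHIFT	20
#define NMI_MASK	(1UL << NMI_SHIFT)

static unsigned long preempt_count;	/* per-CPU in the kernel */

#define in_nmi()	(preempt_count & NMI_MASK)

static void nmi_enter(void) { preempt_count += NMI_MASK; }
static void nmi_exit(void)  { preempt_count -= NMI_MASK; }

static void pre_handler_kretprobe(void)
{
	/* Reached via int3, which is treated as an NMI, so this is always 1. */
	printf("subclass = %d\n", in_nmi() ? 1 : 0);
}

int main(void)
{
	nmi_enter();			/* int3 entry marks NMI-like context */
	pre_handler_kretprobe();	/* prints: subclass = 1 */
	nmi_exit();
	return 0;
}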

Acked-by: Masami Hiramatsu <[email protected]>

Thank you,

>
> Link: https://lore.kernel.org/r/[email protected]
>
> Cc: Peter Zijlstra <[email protected]>
> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
> ---
> kernel/kprobes.c | 24 ++++++++++++++++++++----
> 1 file changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 8a12a25fa40d..ccb285867059 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1249,7 +1249,12 @@ __acquires(hlist_lock)
>
> *head = &kretprobe_inst_table[hash];
> hlist_lock = kretprobe_table_lock_ptr(hash);
> - raw_spin_lock_irqsave(hlist_lock, *flags);
> + /*
> + * Nested is a workaround that will soon not be needed.
> + * There's other protections that make sure the same lock
> + * is not taken on the same CPU that lockdep is unaware of.
> + */
> + raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
> }
> NOKPROBE_SYMBOL(kretprobe_hash_lock);
>
> @@ -1258,7 +1263,12 @@ static void kretprobe_table_lock(unsigned long hash,
> __acquires(hlist_lock)
> {
> raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
> - raw_spin_lock_irqsave(hlist_lock, *flags);
> + /*
> + * Nested is a workaround that will soon not be needed.
> + * There's other protections that make sure the same lock
> + * is not taken on the same CPU that lockdep is unaware of.
> + */
> + raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
> }
> NOKPROBE_SYMBOL(kretprobe_table_lock);
>
> @@ -2025,10 +2035,16 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
> struct kretprobe *rp = container_of(p, struct kretprobe, kp);
> unsigned long hash, flags = 0;
> struct kretprobe_instance *ri;
> + int nmi = !!in_nmi();
>
> /* TODO: consider to only swap the RA after the last pre_handler fired */
> hash = hash_ptr(current, KPROBE_HASH_BITS);
> - raw_spin_lock_irqsave(&rp->lock, flags);
> + /*
> + * Nested is a workaround that will soon not be needed.
> + * There's other protections that make sure the same lock
> + * is not taken on the same CPU that lockdep is unaware of.
> + */
> + raw_spin_lock_irqsave_nested(&rp->lock, flags, nmi);
> if (!hlist_empty(&rp->free_instances)) {
> ri = hlist_entry(rp->free_instances.first,
> struct kretprobe_instance, hlist);
> @@ -2039,7 +2055,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
> ri->task = current;
>
> if (rp->entry_handler && rp->entry_handler(ri, regs)) {
> - raw_spin_lock_irqsave(&rp->lock, flags);
> + raw_spin_lock_irqsave_nested(&rp->lock, flags, nmi);
> hlist_add_head(&ri->hlist, &rp->free_instances);
> raw_spin_unlock_irqrestore(&rp->lock, flags);
> return 0;
> --
> 2.25.4
>


--
Masami Hiramatsu <[email protected]>

2020-11-03 16:11:47

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v5 14/21] kprobes: Remove NMI context check

On Tue, 3 Nov 2020 14:39:38 +0900
Masami Hiramatsu <[email protected]> wrote:

> Ah, OK. This looks good to me.
>
> BTW, in_nmi() in pre_handler_kretprobe() will always be true because
> int3 is now treated as an NMI. So you can always pass 1 there.

What about the below patch then?

>
> Acked-by: Masami Hiramatsu <[email protected]>

Thanks!

From 29ac1a5c9068df06f3196173d4325c8076759551 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <[email protected]>
Date: Mon, 2 Nov 2020 09:17:49 -0500
Subject: [PATCH] kprobes: Tell lockdep about kprobe nesting

The kprobe handlers have protection that prohibits other handlers from
executing in other contexts (for example, if an NMI comes in while
processing a kprobe and executes the same kprobe, it will fail with a
"busy" return). Lockdep is unaware of this protection. Use lockdep's
nesting API to differentiate between locks taken in INT3 context and in
other contexts, to suppress the false warnings.

Link: https://lore.kernel.org/r/[email protected]

Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
---
kernel/kprobes.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 8a12a25fa40d..30889ea5514f 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1249,7 +1249,12 @@ __acquires(hlist_lock)

*head = &kretprobe_inst_table[hash];
hlist_lock = kretprobe_table_lock_ptr(hash);
- raw_spin_lock_irqsave(hlist_lock, *flags);
+ /*
+ * Nested is a workaround that will soon not be needed.
+ * There's other protections that make sure the same lock
+ * is not taken on the same CPU that lockdep is unaware of.
+ */
+ raw_spin_lock_irqsave_nested(hlist_lock, *flags, 1);
}
NOKPROBE_SYMBOL(kretprobe_hash_lock);

@@ -1258,7 +1263,12 @@ static void kretprobe_table_lock(unsigned long hash,
__acquires(hlist_lock)
{
raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
- raw_spin_lock_irqsave(hlist_lock, *flags);
+ /*
+ * Nested is a workaround that will soon not be needed.
+ * There's other protections that make sure the same lock
+ * is not taken on the same CPU that lockdep is unaware of.
+ */
+ raw_spin_lock_irqsave_nested(hlist_lock, *flags, 1);
}
NOKPROBE_SYMBOL(kretprobe_table_lock);

@@ -2028,7 +2038,12 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)

/* TODO: consider to only swap the RA after the last pre_handler fired */
hash = hash_ptr(current, KPROBE_HASH_BITS);
- raw_spin_lock_irqsave(&rp->lock, flags);
+ /*
+ * Nested is a workaround that will soon not be needed.
+ * There's other protections that make sure the same lock
+ * is not taken on the same CPU that lockdep is unaware of.
+ */
+ raw_spin_lock_irqsave_nested(&rp->lock, flags, 1);
if (!hlist_empty(&rp->free_instances)) {
ri = hlist_entry(rp->free_instances.first,
struct kretprobe_instance, hlist);
@@ -2039,7 +2054,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
ri->task = current;

if (rp->entry_handler && rp->entry_handler(ri, regs)) {
- raw_spin_lock_irqsave(&rp->lock, flags);
+ raw_spin_lock_irqsave_nested(&rp->lock, flags, 1);
hlist_add_head(&ri->hlist, &rp->free_instances);
raw_spin_unlock_irqrestore(&rp->lock, flags);
return 0;
--
2.25.4

2020-11-04 02:11:36

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 14/21] kprobes: Remove NMI context check

On Tue, 3 Nov 2020 11:09:13 -0500
Steven Rostedt <[email protected]> wrote:

> On Tue, 3 Nov 2020 14:39:38 +0900
> Masami Hiramatsu <[email protected]> wrote:
>
> > Ah, OK. This looks good to me.
> >
> > BTW, in_nmi() in pre_handler_kretprobe() will always be true because
> > int3 is now treated as an NMI. So you can always pass 1 there.
>
> What about the below patch then?

kretprobe_hash_lock() and kretprobe_table_lock() can be called from
outside of the kprobe pre_handler context, so please keep in_nmi()
in those functions.
For pre_handler_kretprobe(), this looks good to me.
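
Illustration only: a minimal user-space sketch of the resulting split,
with assumed stand-in helpers rather than the real lockdep or kprobes
code. The hash/table lockers compute the lockdep subclass at runtime
because they can also be reached from task context, while
pre_handler_kretprobe() can hardcode 1 because it is only reached via
int3.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative only: models the context check, not real lockdep or kprobes. */
static bool context_is_nmi;
#define in_nmi()	(context_is_nmi)

static void lock_nested(const char *lock, int subclass)
{
	printf("%-22s -> lockdep subclass %d\n", lock, subclass);
}

/* May be reached from task context (e.g. task-exit cleanup) or from int3,
   so the subclass has to be computed at runtime. */
static void kretprobe_table_lock(void)
{
	lock_nested("kretprobe_table_lock", in_nmi() ? 1 : 0);
}

/* Only reached via int3 (NMI-like), so the subclass can simply be 1. */
static void pre_handler_kretprobe(void)
{
	lock_nested("rp->lock", 1);
}

int main(void)
{
	context_is_nmi = false;
	kretprobe_table_lock();		/* subclass 0 */

	context_is_nmi = true;		/* int3 / NMI-like entry */
	kretprobe_table_lock();		/* subclass 1 */
	pre_handler_kretprobe();	/* subclass 1 */
	return 0;
}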

Thank you,

> > Acked-by: Masami Hiramatsu <[email protected]>
>
> Thanks!
>
> From 29ac1a5c9068df06f3196173d4325c8076759551 Mon Sep 17 00:00:00 2001
> From: "Steven Rostedt (VMware)" <[email protected]>
> Date: Mon, 2 Nov 2020 09:17:49 -0500
> Subject: [PATCH] kprobes: Tell lockdep about kprobe nesting
>
> The kprobe handlers have protection that prohibits other handlers from
> executing in other contexts (for example, if an NMI comes in while
> processing a kprobe and executes the same kprobe, it will fail with a
> "busy" return). Lockdep is unaware of this protection. Use lockdep's
> nesting API to differentiate between locks taken in INT3 context and in
> other contexts, to suppress the false warnings.
>
> Link: https://lore.kernel.org/r/[email protected]
>
> Cc: Peter Zijlstra <[email protected]>
> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
> ---
> kernel/kprobes.c | 23 +++++++++++++++++++----
> 1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 8a12a25fa40d..30889ea5514f 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1249,7 +1249,12 @@ __acquires(hlist_lock)
>
> *head = &kretprobe_inst_table[hash];
> hlist_lock = kretprobe_table_lock_ptr(hash);
> - raw_spin_lock_irqsave(hlist_lock, *flags);
> + /*
> + * Nested is a workaround that will soon not be needed.
> + * There's other protections that make sure the same lock
> + * is not taken on the same CPU that lockdep is unaware of.
> + */
> + raw_spin_lock_irqsave_nested(hlist_lock, *flags, 1);
> }
> NOKPROBE_SYMBOL(kretprobe_hash_lock);
>
> @@ -1258,7 +1263,12 @@ static void kretprobe_table_lock(unsigned long hash,
> __acquires(hlist_lock)
> {
> raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
> - raw_spin_lock_irqsave(hlist_lock, *flags);
> + /*
> + * Nested is a workaround that will soon not be needed.
> + * There's other protections that make sure the same lock
> + * is not taken on the same CPU that lockdep is unaware of.
> + */
> + raw_spin_lock_irqsave_nested(hlist_lock, *flags, 1);
> }
> NOKPROBE_SYMBOL(kretprobe_table_lock);
>
> @@ -2028,7 +2038,12 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>
> /* TODO: consider to only swap the RA after the last pre_handler fired */
> hash = hash_ptr(current, KPROBE_HASH_BITS);
> - raw_spin_lock_irqsave(&rp->lock, flags);
> + /*
> + * Nested is a workaround that will soon not be needed.
> + * There's other protections that make sure the same lock
> + * is not taken on the same CPU that lockdep is unaware of.
> + */
> + raw_spin_lock_irqsave_nested(&rp->lock, flags, 1);
> if (!hlist_empty(&rp->free_instances)) {
> ri = hlist_entry(rp->free_instances.first,
> struct kretprobe_instance, hlist);
> @@ -2039,7 +2054,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
> ri->task = current;
>
> if (rp->entry_handler && rp->entry_handler(ri, regs)) {
> - raw_spin_lock_irqsave(&rp->lock, flags);
> + raw_spin_lock_irqsave_nested(&rp->lock, flags, 1);
> hlist_add_head(&ri->hlist, &rp->free_instances);
> raw_spin_unlock_irqrestore(&rp->lock, flags);
> return 0;
> --
> 2.25.4
>


--
Masami Hiramatsu <[email protected]>

2020-11-04 14:51:02

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v5 14/21] kprobes: Remove NMI context check

On Wed, 4 Nov 2020 11:08:52 +0900
Masami Hiramatsu <[email protected]> wrote:

> kretprobe_hash_lock() and kretprobe_table_lock() can be called from
> outside of the kprobe pre_handler context, so please keep in_nmi()
> in those functions.
> For pre_handler_kretprobe(), this looks good to me.
>

Final version, before sending to Linus.

-- Steve

From: "Steven Rostedt (VMware)" <[email protected]>
Subject: [PATCH] kprobes: Tell lockdep about kprobe nesting

The kprobe handlers have protection that prohibits other handlers from
executing in other contexts (for example, if an NMI comes in while
processing a kprobe and executes the same kprobe, it will fail with a
"busy" return). Lockdep is unaware of this protection. Use lockdep's
nesting API to differentiate between locks taken in INT3 context and in
other contexts, to suppress the false warnings.

Link: https://lore.kernel.org/r/[email protected]

Cc: Peter Zijlstra <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
---
kernel/kprobes.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 8a12a25fa40d..41fdbb7953c6 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1249,7 +1249,13 @@ __acquires(hlist_lock)

*head = &kretprobe_inst_table[hash];
hlist_lock = kretprobe_table_lock_ptr(hash);
- raw_spin_lock_irqsave(hlist_lock, *flags);
+ /*
+ * Nested is a workaround that will soon not be needed.
+ * There's other protections that make sure the same lock
+ * is not taken on the same CPU that lockdep is unaware of.
+ * Differentiate when it is taken in NMI context.
+ */
+ raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
}
NOKPROBE_SYMBOL(kretprobe_hash_lock);

@@ -1258,7 +1264,13 @@ static void kretprobe_table_lock(unsigned long hash,
__acquires(hlist_lock)
{
raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
- raw_spin_lock_irqsave(hlist_lock, *flags);
+ /*
+ * Nested is a workaround that will soon not be needed.
+ * There's other protections that make sure the same lock
+ * is not taken on the same CPU that lockdep is unaware of.
+ * Differentiate when it is taken in NMI context.
+ */
+ raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
}
NOKPROBE_SYMBOL(kretprobe_table_lock);

@@ -2028,7 +2040,12 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)

/* TODO: consider to only swap the RA after the last pre_handler fired */
hash = hash_ptr(current, KPROBE_HASH_BITS);
- raw_spin_lock_irqsave(&rp->lock, flags);
+ /*
+ * Nested is a workaround that will soon not be needed.
+ * There's other protections that make sure the same lock
+ * is not taken on the same CPU that lockdep is unaware of.
+ */
+ raw_spin_lock_irqsave_nested(&rp->lock, flags, 1);
if (!hlist_empty(&rp->free_instances)) {
ri = hlist_entry(rp->free_instances.first,
struct kretprobe_instance, hlist);
@@ -2039,7 +2056,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
ri->task = current;

if (rp->entry_handler && rp->entry_handler(ri, regs)) {
- raw_spin_lock_irqsave(&rp->lock, flags);
+ raw_spin_lock_irqsave_nested(&rp->lock, flags, 1);
hlist_add_head(&ri->hlist, &rp->free_instances);
raw_spin_unlock_irqrestore(&rp->lock, flags);
return 0;
--
2.25.4

2020-11-05 05:46:07

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 14/21] kprobes: Remove NMI context check

On Wed, 4 Nov 2020 09:47:22 -0500
Steven Rostedt <[email protected]> wrote:

> On Wed, 4 Nov 2020 11:08:52 +0900
> Masami Hiramatsu <[email protected]> wrote:
>
> > kretprobe_hash_lock() and kretprobe_table_lock() can be called from
> > outside of the kprobe pre_handler context, so please keep in_nmi()
> > in those functions.
> > For pre_handler_kretprobe(), this looks good to me.
> >
>
> Final version, before sending to Linus.

This looks good to me :)

Thank you!

>
> -- Steve
>
> From: "Steven Rostedt (VMware)" <[email protected]>
> Subject: [PATCH] kprobes: Tell lockdep about kprobe nesting
>
> The kprobe handlers have protection that prohibits other handlers from
> executing in other contexts (for example, if an NMI comes in while
> processing a kprobe and executes the same kprobe, it will fail with a
> "busy" return). Lockdep is unaware of this protection. Use lockdep's
> nesting API to differentiate between locks taken in INT3 context and in
> other contexts, to suppress the false warnings.
>
> Link: https://lore.kernel.org/r/[email protected]
>
> Cc: Peter Zijlstra <[email protected]>
> Acked-by: Masami Hiramatsu <[email protected]>
> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
> ---
> kernel/kprobes.c | 25 +++++++++++++++++++++----
> 1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 8a12a25fa40d..41fdbb7953c6 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1249,7 +1249,13 @@ __acquires(hlist_lock)
>
> *head = &kretprobe_inst_table[hash];
> hlist_lock = kretprobe_table_lock_ptr(hash);
> - raw_spin_lock_irqsave(hlist_lock, *flags);
> + /*
> + * Nested is a workaround that will soon not be needed.
> + * There's other protections that make sure the same lock
> + * is not taken on the same CPU that lockdep is unaware of.
> + * Differentiate when it is taken in NMI context.
> + */
> + raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
> }
> NOKPROBE_SYMBOL(kretprobe_hash_lock);
>
> @@ -1258,7 +1264,13 @@ static void kretprobe_table_lock(unsigned long hash,
> __acquires(hlist_lock)
> {
> raw_spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
> - raw_spin_lock_irqsave(hlist_lock, *flags);
> + /*
> + * Nested is a workaround that will soon not be needed.
> + * There's other protections that make sure the same lock
> + * is not taken on the same CPU that lockdep is unaware of.
> + * Differentiate when it is taken in NMI context.
> + */
> + raw_spin_lock_irqsave_nested(hlist_lock, *flags, !!in_nmi());
> }
> NOKPROBE_SYMBOL(kretprobe_table_lock);
>
> @@ -2028,7 +2040,12 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
>
> /* TODO: consider to only swap the RA after the last pre_handler fired */
> hash = hash_ptr(current, KPROBE_HASH_BITS);
> - raw_spin_lock_irqsave(&rp->lock, flags);
> + /*
> + * Nested is a workaround that will soon not be needed.
> + * There's other protections that make sure the same lock
> + * is not taken on the same CPU that lockdep is unaware of.
> + */
> + raw_spin_lock_irqsave_nested(&rp->lock, flags, 1);
> if (!hlist_empty(&rp->free_instances)) {
> ri = hlist_entry(rp->free_instances.first,
> struct kretprobe_instance, hlist);
> @@ -2039,7 +2056,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
> ri->task = current;
>
> if (rp->entry_handler && rp->entry_handler(ri, regs)) {
> - raw_spin_lock_irqsave(&rp->lock, flags);
> + raw_spin_lock_irqsave_nested(&rp->lock, flags, 1);
> hlist_add_head(&ri->hlist, &rp->free_instances);
> raw_spin_unlock_irqrestore(&rp->lock, flags);
> return 0;
> --
> 2.25.4
>


--
Masami Hiramatsu <[email protected]>