Hi,
Here is batch optimizing patch series version 4.
Since current kprobes jump optimization calls stop_machine() for each
probe, it can make a lot of latency noises when (un)registering a lot of
probes (~1000) at once. For example, on 4 core Xeon, 256 probes
optimization takes 770us in average (max 3.3ms).
This patch series introduces batch (un)optimization which modifies
code with just one stop_machine(), and it improves optimization time
to 90us in average (max 330us).
- Introduce text_poke_smp_batch() which modifies multiple
codes with one stop_machine().
- Limit how many probes can be (un)optimized at once.
- Introduce delayed unoptimization for batch processing.
text_poke_smp_batch() will also help Jump label to reduce
its delay coming from text_poke_smp().
Changes in v4:
- Update to the latest tip tree.
Changes in v3:
- Set kp.addr = NULL according to Ananth's comment.
Changes in v2:
- Add kprobes selftest bugfix patch.
- Add some comments about locks according to Mathieu's comment.
- Allocate working buffers when initializing kprobes, instead of
using static arraies.
- Merge max optimization limit patch into batch optimizing patch.
Thank you,
---
Masami Hiramatsu (3):
kprobes: Support delayed unoptimization
kprobes/x86: Use text_poke_smp_batch
x86: Introduce text_poke_smp_batch() for batch-code modifying
arch/x86/include/asm/alternative.h | 7 +
arch/x86/include/asm/kprobes.h | 4
arch/x86/kernel/alternative.c | 49 +++-
arch/x86/kernel/kprobes.c | 114 +++++++++
include/linux/kprobes.h | 4
kernel/kprobes.c | 435 ++++++++++++++++++++++++------------
6 files changed, 451 insertions(+), 162 deletions(-)
--
Masami HIRAMATSU
2nd Dept. Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: [email protected]
Introduce text_poke_smp_batch(). This function modifies several
text areas with one stop_machine() on SMP. Because calling
stop_machine() is heavy task, it is better to aggregate text_poke
requests.
Frederic, I've talked with Rusty about this interface, and
he would not like to expand stop_machine() interface, since
it is not for generic use.
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Jan Beulich <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: [email protected]
Cc: Rusty Russell <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
---
arch/x86/include/asm/alternative.h | 7 +++++
arch/x86/kernel/alternative.c | 49 +++++++++++++++++++++++++++++-------
2 files changed, 47 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 76561d2..4a2adaa 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -180,8 +180,15 @@ extern void *text_poke_early(void *addr, const void *opcode, size_t len);
* On the local CPU you need to be protected again NMI or MCE handlers seeing an
* inconsistent instruction while you patch.
*/
+struct text_poke_param {
+ void *addr;
+ const void *opcode;
+ size_t len;
+};
+
extern void *text_poke(void *addr, const void *opcode, size_t len);
extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
+extern void text_poke_smp_batch(struct text_poke_param *params, int n);
#if defined(CONFIG_DYNAMIC_FTRACE) || defined(HAVE_JUMP_LABEL)
#define IDEAL_NOP_SIZE_5 5
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 5079f24..553d0b0 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -591,17 +591,21 @@ static atomic_t stop_machine_first;
static int wrote_text;
struct text_poke_params {
- void *addr;
- const void *opcode;
- size_t len;
+ struct text_poke_param *params;
+ int nparams;
};
static int __kprobes stop_machine_text_poke(void *data)
{
struct text_poke_params *tpp = data;
+ struct text_poke_param *p;
+ int i;
if (atomic_dec_and_test(&stop_machine_first)) {
- text_poke(tpp->addr, tpp->opcode, tpp->len);
+ for (i = 0; i < tpp->nparams; i++) {
+ p = &tpp->params[i];
+ text_poke(p->addr, p->opcode, p->len);
+ }
smp_wmb(); /* Make sure other cpus see that this has run */
wrote_text = 1;
} else {
@@ -610,8 +614,12 @@ static int __kprobes stop_machine_text_poke(void *data)
smp_mb(); /* Load wrote_text before following execution */
}
- flush_icache_range((unsigned long)tpp->addr,
- (unsigned long)tpp->addr + tpp->len);
+ for (i = 0; i < tpp->nparams; i++) {
+ p = &tpp->params[i];
+ flush_icache_range((unsigned long)p->addr,
+ (unsigned long)p->addr + p->len);
+ }
+
return 0;
}
@@ -631,10 +639,13 @@ static int __kprobes stop_machine_text_poke(void *data)
void *__kprobes text_poke_smp(void *addr, const void *opcode, size_t len)
{
struct text_poke_params tpp;
+ struct text_poke_param p;
- tpp.addr = addr;
- tpp.opcode = opcode;
- tpp.len = len;
+ p.addr = addr;
+ p.opcode = opcode;
+ p.len = len;
+ tpp.params = &p;
+ tpp.nparams = 1;
atomic_set(&stop_machine_first, 1);
wrote_text = 0;
/* Use __stop_machine() because the caller already got online_cpus. */
@@ -642,6 +653,26 @@ void *__kprobes text_poke_smp(void *addr, const void *opcode, size_t len)
return addr;
}
+/**
+ * text_poke_smp_batch - Update instructions on a live kernel on SMP
+ * @params: an array of text_poke parameters
+ * @n: the number of elements in params.
+ *
+ * Modify multi-byte instruction by using stop_machine() on SMP. Since the
+ * stop_machine() is heavy task, it is better to aggregate text_poke requests
+ * and do it once if possible.
+ *
+ * Note: Must be called under get_online_cpus() and text_mutex.
+ */
+void __kprobes text_poke_smp_batch(struct text_poke_param *params, int n)
+{
+ struct text_poke_params tpp = {.params = params, .nparams = n};
+
+ atomic_set(&stop_machine_first, 1);
+ wrote_text = 0;
+ stop_machine(stop_machine_text_poke, (void *)&tpp, NULL);
+}
+
#if defined(CONFIG_DYNAMIC_FTRACE) || defined(HAVE_JUMP_LABEL)
#ifdef CONFIG_X86_64
Use text_poke_smp_batch() in optimization path for reducing
the number of stop_machine() issues. If the number of optimizing
probes is more than MAX_OPTIMIZE_PROBES(=256), kprobes optimizes
first MAX_OPTIMIZE_PROBES probes and kicks optimizer for remaining
probes.
Changes in v2:
- Allocate code buffer and parameters in arch_init_kprobes()
instead of using static arraies.
- Merge previous max optimization limit patch into this patch.
So, this patch introduces upper limit of optimization at
once.
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Peter Zijlstra <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Cc: Rusty Russell <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
---
arch/x86/kernel/kprobes.c | 69 ++++++++++++++++++++++++++++++++++++++++-----
include/linux/kprobes.h | 2 +
kernel/kprobes.c | 11 ++-----
3 files changed, 65 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index 1cbd54c..2840d34 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -1401,10 +1401,16 @@ int __kprobes arch_prepare_optimized_kprobe(struct optimized_kprobe *op)
return 0;
}
-/* Replace a breakpoint (int3) with a relative jump. */
-int __kprobes arch_optimize_kprobe(struct optimized_kprobe *op)
+#define MAX_OPTIMIZE_PROBES 256
+static struct text_poke_param *jump_poke_params;
+static struct jump_poke_buffer {
+ u8 buf[RELATIVEJUMP_SIZE];
+} *jump_poke_bufs;
+
+static void __kprobes setup_optimize_kprobe(struct text_poke_param *tprm,
+ u8 *insn_buf,
+ struct optimized_kprobe *op)
{
- unsigned char jmp_code[RELATIVEJUMP_SIZE];
s32 rel = (s32)((long)op->optinsn.insn -
((long)op->kp.addr + RELATIVEJUMP_SIZE));
@@ -1412,16 +1418,39 @@ int __kprobes arch_optimize_kprobe(struct optimized_kprobe *op)
memcpy(op->optinsn.copied_insn, op->kp.addr + INT3_SIZE,
RELATIVE_ADDR_SIZE);
- jmp_code[0] = RELATIVEJUMP_OPCODE;
- *(s32 *)(&jmp_code[1]) = rel;
+ insn_buf[0] = RELATIVEJUMP_OPCODE;
+ *(s32 *)(&insn_buf[1]) = rel;
+
+ tprm->addr = op->kp.addr;
+ tprm->opcode = insn_buf;
+ tprm->len = RELATIVEJUMP_SIZE;
+}
+
+/*
+ * Replace breakpoints (int3) with relative jumps.
+ * Caller must call with locking kprobe_mutex and text_mutex.
+ */
+void __kprobes arch_optimize_kprobes(struct list_head *oplist)
+{
+ struct optimized_kprobe *op, *tmp;
+ int c = 0;
+
+ list_for_each_entry_safe(op, tmp, oplist, list) {
+ WARN_ON(kprobe_disabled(&op->kp));
+ /* Setup param */
+ setup_optimize_kprobe(&jump_poke_params[c],
+ jump_poke_bufs[c].buf, op);
+ list_del_init(&op->list);
+ if (++c >= MAX_OPTIMIZE_PROBES)
+ break;
+ }
/*
* text_poke_smp doesn't support NMI/MCE code modifying.
* However, since kprobes itself also doesn't support NMI/MCE
* code probing, it's not a problem.
*/
- text_poke_smp(op->kp.addr, jmp_code, RELATIVEJUMP_SIZE);
- return 0;
+ text_poke_smp_batch(jump_poke_params, c);
}
/* Replace a relative jump with a breakpoint (int3). */
@@ -1453,11 +1482,35 @@ static int __kprobes setup_detour_execution(struct kprobe *p,
}
return 0;
}
+
+static int __kprobes init_poke_params(void)
+{
+ /* Allocate code buffer and parameter array */
+ jump_poke_bufs = kmalloc(sizeof(struct jump_poke_buffer) *
+ MAX_OPTIMIZE_PROBES, GFP_KERNEL);
+ if (!jump_poke_bufs)
+ return -ENOMEM;
+
+ jump_poke_params = kmalloc(sizeof(struct text_poke_param) *
+ MAX_OPTIMIZE_PROBES, GFP_KERNEL);
+ if (!jump_poke_params) {
+ kfree(jump_poke_bufs);
+ jump_poke_bufs = NULL;
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+#else /* !CONFIG_OPTPROBES */
+static int __kprobes init_poke_params(void)
+{
+ return 0;
+}
#endif
int __init arch_init_kprobes(void)
{
- return 0;
+ return init_poke_params();
}
int __kprobes arch_trampoline_kprobe(struct kprobe *p)
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index e7d1b2e..fe157ba 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -275,7 +275,7 @@ extern int arch_prepared_optinsn(struct arch_optimized_insn *optinsn);
extern int arch_check_optimized_kprobe(struct optimized_kprobe *op);
extern int arch_prepare_optimized_kprobe(struct optimized_kprobe *op);
extern void arch_remove_optimized_kprobe(struct optimized_kprobe *op);
-extern int arch_optimize_kprobe(struct optimized_kprobe *op);
+extern void arch_optimize_kprobes(struct list_head *oplist);
extern void arch_unoptimize_kprobe(struct optimized_kprobe *op);
extern kprobe_opcode_t *get_optinsn_slot(void);
extern void free_optinsn_slot(kprobe_opcode_t *slot, int dirty);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 9737a76..4db766f 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -430,8 +430,6 @@ static DECLARE_DELAYED_WORK(optimizing_work, kprobe_optimizer);
/* Kprobe jump optimizer */
static __kprobes void kprobe_optimizer(struct work_struct *work)
{
- struct optimized_kprobe *op, *tmp;
-
/* Lock modules while optimizing kprobes */
mutex_lock(&module_mutex);
mutex_lock(&kprobe_mutex);
@@ -459,14 +457,11 @@ static __kprobes void kprobe_optimizer(struct work_struct *work)
*/
get_online_cpus();
mutex_lock(&text_mutex);
- list_for_each_entry_safe(op, tmp, &optimizing_list, list) {
- WARN_ON(kprobe_disabled(&op->kp));
- if (arch_optimize_kprobe(op) < 0)
- op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED;
- list_del_init(&op->list);
- }
+ arch_optimize_kprobes(&optimizing_list);
mutex_unlock(&text_mutex);
put_online_cpus();
+ if (!list_empty(&optimizing_list))
+ schedule_delayed_work(&optimizing_work, OPTIMIZE_DELAY);
end:
mutex_unlock(&kprobe_mutex);
mutex_unlock(&module_mutex);
Unoptimization occurs when a probe is unregistered or disabled, and
is heavy because it recovers instructions by using stop_machine().
This patch delays unoptimization operations and unoptimize several
probes at once by using text_poke_smp_batch().
This can avoid unexpected system slowdown coming from stop_machine().
Changes in v2:
- Use dynamic allocated buffers and params.
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]
Cc: Mathieu Desnoyers <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Cc: Rusty Russell <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
---
arch/x86/include/asm/kprobes.h | 4
arch/x86/kernel/kprobes.c | 45 ++++
include/linux/kprobes.h | 2
kernel/kprobes.c | 434 +++++++++++++++++++++++++++-------------
4 files changed, 344 insertions(+), 141 deletions(-)
diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index 5478825..e8a864c 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -79,12 +79,12 @@ struct arch_specific_insn {
};
struct arch_optimized_insn {
- /* copy of the original instructions */
- kprobe_opcode_t copied_insn[RELATIVE_ADDR_SIZE];
/* detour code buffer */
kprobe_opcode_t *insn;
/* the size of instructions copied to detour code buffer */
size_t size;
+ /* copy of the original instructions */
+ kprobe_opcode_t copied_insn[RELATIVE_ADDR_SIZE];
};
/* Return true (!0) if optinsn is prepared for optimization. */
diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index 2840d34..d384b49 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -1184,6 +1184,10 @@ static void __kprobes optimized_callback(struct optimized_kprobe *op,
{
struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
+ /* This is possible if op is under delayed unoptimizing */
+ if (kprobe_disabled(&op->kp))
+ return;
+
preempt_disable();
if (kprobe_running()) {
kprobes_inc_nmissed_count(&op->kp);
@@ -1453,6 +1457,47 @@ void __kprobes arch_optimize_kprobes(struct list_head *oplist)
text_poke_smp_batch(jump_poke_params, c);
}
+static void __kprobes setup_unoptimize_kprobe(struct text_poke_param *tprm,
+ u8 *insn_buf,
+ struct optimized_kprobe *op)
+{
+ /* Set int3 to first byte for kprobes */
+ insn_buf[0] = BREAKPOINT_INSTRUCTION;
+ memcpy(insn_buf + 1, op->optinsn.copied_insn, RELATIVE_ADDR_SIZE);
+
+ tprm->addr = op->kp.addr;
+ tprm->opcode = insn_buf;
+ tprm->len = RELATIVEJUMP_SIZE;
+}
+
+/*
+ * Recover original instructions and breakpoints from relative jumps.
+ * Caller must call with locking kprobe_mutex.
+ */
+extern void arch_unoptimize_kprobes(struct list_head *oplist,
+ struct list_head *done_list)
+{
+ struct optimized_kprobe *op, *tmp;
+ int c = 0;
+
+ list_for_each_entry_safe(op, tmp, oplist, list) {
+ /* Setup param */
+ setup_unoptimize_kprobe(&jump_poke_params[c],
+ jump_poke_bufs[c].buf, op);
+ list_del_init(&op->list);
+ list_add(&op->list, done_list);
+ if (++c >= MAX_OPTIMIZE_PROBES)
+ break;
+ }
+
+ /*
+ * text_poke_smp doesn't support NMI/MCE code modifying.
+ * However, since kprobes itself also doesn't support NMI/MCE
+ * code probing, it's not a problem.
+ */
+ text_poke_smp_batch(jump_poke_params, c);
+}
+
/* Replace a relative jump with a breakpoint (int3). */
void __kprobes arch_unoptimize_kprobe(struct optimized_kprobe *op)
{
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index fe157ba..b78edb5 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -276,6 +276,8 @@ extern int arch_check_optimized_kprobe(struct optimized_kprobe *op);
extern int arch_prepare_optimized_kprobe(struct optimized_kprobe *op);
extern void arch_remove_optimized_kprobe(struct optimized_kprobe *op);
extern void arch_optimize_kprobes(struct list_head *oplist);
+extern void arch_unoptimize_kprobes(struct list_head *oplist,
+ struct list_head *done_list);
extern void arch_unoptimize_kprobe(struct optimized_kprobe *op);
extern kprobe_opcode_t *get_optinsn_slot(void);
extern void free_optinsn_slot(kprobe_opcode_t *slot, int dirty);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 4db766f..1bfddc1 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -354,6 +354,13 @@ static inline int kprobe_aggrprobe(struct kprobe *p)
return p->pre_handler == aggr_pre_handler;
}
+/* Return true(!0) if the kprobe is unused */
+static inline int kprobe_unused(struct kprobe *p)
+{
+ return kprobe_aggrprobe(p) && kprobe_disabled(p) &&
+ list_empty(&p->list);
+}
+
/*
* Keep all fields in the kprobe consistent
*/
@@ -384,6 +391,17 @@ void __kprobes opt_pre_handler(struct kprobe *p, struct pt_regs *regs)
}
}
+/* Free optimized instructions and optimized_kprobe */
+static __kprobes void free_aggr_kprobe(struct kprobe *p)
+{
+ struct optimized_kprobe *op;
+
+ op = container_of(p, struct optimized_kprobe, kp);
+ arch_remove_optimized_kprobe(op);
+ arch_remove_kprobe(p);
+ kfree(op);
+}
+
/* Return true(!0) if the kprobe is ready for optimization. */
static inline int kprobe_optready(struct kprobe *p)
{
@@ -397,6 +415,20 @@ static inline int kprobe_optready(struct kprobe *p)
return 0;
}
+/* Return true(!0) if the kprobe is disarmed. Note: p must be on hash list */
+static inline int kprobe_disarmed(struct kprobe *p)
+{
+ struct optimized_kprobe *op;
+
+ /* If kprobe is not aggr/opt probe, just return kprobe is disabled */
+ if (!kprobe_aggrprobe(p))
+ return kprobe_disabled(p);
+
+ op = container_of(p, struct optimized_kprobe, kp);
+
+ return kprobe_disabled(p) && list_empty(&op->list);
+}
+
/*
* Return an optimized kprobe whose optimizing code replaces
* instructions including addr (exclude breakpoint).
@@ -422,29 +454,19 @@ static struct kprobe *__kprobes get_optimized_kprobe(unsigned long addr)
/* Optimization staging list, protected by kprobe_mutex */
static LIST_HEAD(optimizing_list);
+static LIST_HEAD(unoptimizing_list);
static void kprobe_optimizer(struct work_struct *work);
static DECLARE_DELAYED_WORK(optimizing_work, kprobe_optimizer);
#define OPTIMIZE_DELAY 5
+static DECLARE_COMPLETION(optimizer_comp);
-/* Kprobe jump optimizer */
-static __kprobes void kprobe_optimizer(struct work_struct *work)
+static __kprobes void do_unoptimize_kprobes(struct list_head *free_list)
{
- /* Lock modules while optimizing kprobes */
- mutex_lock(&module_mutex);
- mutex_lock(&kprobe_mutex);
- if (kprobes_all_disarmed || !kprobes_allow_optimization)
- goto end;
-
- /*
- * Wait for quiesence period to ensure all running interrupts
- * are done. Because optprobe may modify multiple instructions
- * there is a chance that Nth instruction is interrupted. In that
- * case, running interrupt can return to 2nd-Nth byte of jump
- * instruction. This wait is for avoiding it.
- */
- synchronize_sched();
+ struct optimized_kprobe *op, *tmp;
+ if (list_empty(&unoptimizing_list))
+ return;
/*
* The optimization/unoptimization refers online_cpus via
* stop_machine() and cpu-hotplug modifies online_cpus.
@@ -457,14 +479,93 @@ static __kprobes void kprobe_optimizer(struct work_struct *work)
*/
get_online_cpus();
mutex_lock(&text_mutex);
+ /* Do batch unoptimization */
+ arch_unoptimize_kprobes(&unoptimizing_list, free_list);
+ mutex_unlock(&text_mutex);
+ put_online_cpus();
+
+ list_for_each_entry_safe(op, tmp, free_list, list) {
+ /* Disarm probes if disabled */
+ if (kprobe_disabled(&op->kp))
+ arch_disarm_kprobe(&op->kp);
+ /* leave only unused probes on free_list */
+ if (!kprobe_unused(&op->kp))
+ list_del_init(&op->list);
+ else
+ /*
+ * Remove unused probe from hash list. After waiting
+ * for synchronization, this probe can be freed safely
+ * (which will be freed by do_free_unused_kprobes.)
+ */
+ hlist_del_rcu(&op->kp.hlist);
+ }
+}
+
+static __kprobes void do_free_unused_kprobes(struct list_head *free_list)
+{
+ struct optimized_kprobe *op, *tmp;
+
+ list_for_each_entry_safe(op, tmp, free_list, list) {
+ list_del_init(&op->list);
+ free_aggr_kprobe(&op->kp);
+ }
+}
+
+static __kprobes void do_optimize_kprobes(void)
+{
+ /* Optimization never allowed when disarmed */
+ if (kprobes_all_disarmed || !kprobes_allow_optimization ||
+ list_empty(&optimizing_list))
+ return;
+
+ get_online_cpus();
+ mutex_lock(&text_mutex);
arch_optimize_kprobes(&optimizing_list);
mutex_unlock(&text_mutex);
put_online_cpus();
- if (!list_empty(&optimizing_list))
- schedule_delayed_work(&optimizing_work, OPTIMIZE_DELAY);
-end:
+}
+
+/* Kprobe jump optimize/unoptimize and collect unused aggrprobes */
+static __kprobes void kprobe_optimizer(struct work_struct *work)
+{
+ LIST_HEAD(free_list);
+
+ /* Lock modules while optimizing kprobes */
+ mutex_lock(&module_mutex);
+ mutex_lock(&kprobe_mutex);
+
+ do_unoptimize_kprobes(&free_list);
+
+ /*
+ * Wait for quiesence period to ensure all running interrupts
+ * are done. Because optprobe may modify multiple instructions
+ * there is a chance that Nth instruction is interrupted. In that
+ * case, running interrupt can return to 2nd-Nth byte of jump
+ * instruction. This wait is for avoiding it.
+ */
+ synchronize_sched();
+
+ do_optimize_kprobes();
+
+ /* Release all releasable probes */
+ do_free_unused_kprobes(&free_list);
+
mutex_unlock(&kprobe_mutex);
mutex_unlock(&module_mutex);
+
+ if (!list_empty(&optimizing_list) || !list_empty(&unoptimizing_list)) {
+ if (!delayed_work_pending(&optimizing_work))
+ schedule_delayed_work(&optimizing_work, OPTIMIZE_DELAY);
+ } else
+ /* Wake up all waiters */
+ complete_all(&optimizer_comp);
+}
+
+/* Wait for completion */
+static __kprobes void wait_for_optimizer(void)
+{
+ if (delayed_work_pending(&optimizing_work))
+ wait_for_completion(&optimizer_comp);
}
/* Optimize kprobe if p is ready to be optimized */
@@ -477,6 +578,10 @@ static __kprobes void optimize_kprobe(struct kprobe *p)
(kprobe_disabled(p) || kprobes_all_disarmed))
return;
+ /* Check if it is already optimized. */
+ if (kprobe_optimized(p))
+ return;
+
/* Both of break_handler and post_handler are not supported. */
if (p->break_handler || p->post_handler)
return;
@@ -487,45 +592,96 @@ static __kprobes void optimize_kprobe(struct kprobe *p)
if (arch_check_optimized_kprobe(op) < 0)
return;
- /* Check if it is already optimized. */
- if (op->kp.flags & KPROBE_FLAG_OPTIMIZED)
- return;
-
op->kp.flags |= KPROBE_FLAG_OPTIMIZED;
- list_add(&op->list, &optimizing_list);
- if (!delayed_work_pending(&optimizing_work))
- schedule_delayed_work(&optimizing_work, OPTIMIZE_DELAY);
+ if (!list_empty(&op->list))
+ /* Unoptimizing probe: dequeue probe */
+ list_del_init(&op->list);
+ else {
+ list_add(&op->list, &optimizing_list);
+ if (!delayed_work_pending(&optimizing_work))
+ schedule_delayed_work(&optimizing_work, OPTIMIZE_DELAY);
+ }
+}
+
+static __kprobes void force_unoptimize_kprobe(struct optimized_kprobe *op)
+{
+ get_online_cpus();
+ arch_unoptimize_kprobe(op);
+ put_online_cpus();
+ if (kprobe_disabled(&op->kp))
+ arch_disarm_kprobe(&op->kp);
}
-/* Unoptimize a kprobe if p is optimized */
-static __kprobes void unoptimize_kprobe(struct kprobe *p)
+/* Unoptimize a kprobe if p is optimized. */
+static __kprobes void unoptimize_kprobe(struct kprobe *p, int force)
{
struct optimized_kprobe *op;
- if ((p->flags & KPROBE_FLAG_OPTIMIZED) && kprobe_aggrprobe(p)) {
- op = container_of(p, struct optimized_kprobe, kp);
- if (!list_empty(&op->list))
- /* Dequeue from the optimization queue */
+ if (!kprobe_aggrprobe(p) || kprobe_disarmed(p))
+ return; /* This is not an optprobe nor optimized */
+
+ op = container_of(p, struct optimized_kprobe, kp);
+ if (!kprobe_optimized(p)) {
+ /* Unoptimized or unoptimizing */
+ if (force && !list_empty(&op->list)) {
+ /*
+ * If unoptimizing probe and forced option, forcibly
+ * unoptimize it.
+ */
list_del_init(&op->list);
- else
- /* Replace jump with break */
- arch_unoptimize_kprobe(op);
- op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED;
+ force_unoptimize_kprobe(op);
+ }
+ return;
+ }
+
+ op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED;
+ if (!list_empty(&op->list)) {
+ /* Dequeue from the optimization queue */
+ list_del_init(&op->list);
+ return;
+ }
+
+ if (force) /* Forcibly update the code: this is a special case */
+ force_unoptimize_kprobe(op);
+ else {
+ list_add(&op->list, &unoptimizing_list);
+ if (!delayed_work_pending(&optimizing_work))
+ schedule_delayed_work(&optimizing_work, OPTIMIZE_DELAY);
}
}
+/* Cancel unoptimizing for reusing */
+static void reuse_unused_kprobe(struct kprobe *ap)
+{
+ struct optimized_kprobe *op;
+
+ BUG_ON(!kprobe_unused(ap));
+ /*
+ * Unused kprobe MUST be on the way of delayed unoptimizing (means
+ * there is still a relative jump) and disabled.
+ */
+ op = container_of(ap, struct optimized_kprobe, kp);
+ if (unlikely(list_empty(&op->list)))
+ printk(KERN_WARNING "Warning: found a stray unused "
+ "aggrprobe@%p\n", ap->addr);
+ /* Enable the probe again */
+ ap->flags &= ~KPROBE_FLAG_DISABLED;
+ /* Optimize it again (remove from op->list) */
+ BUG_ON(!kprobe_optready(ap));
+ optimize_kprobe(ap);
+}
+
/* Remove optimized instructions */
static void __kprobes kill_optimized_kprobe(struct kprobe *p)
{
struct optimized_kprobe *op;
op = container_of(p, struct optimized_kprobe, kp);
- if (!list_empty(&op->list)) {
- /* Dequeue from the optimization queue */
+ if (!list_empty(&op->list))
+ /* Dequeue from the (un)optimization queue */
list_del_init(&op->list);
- op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED;
- }
- /* Don't unoptimize, because the target code will be freed. */
+ op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED;
+ /* Don't unoptimize, because the target code is already freed. */
arch_remove_optimized_kprobe(op);
}
@@ -538,16 +694,6 @@ static __kprobes void prepare_optimized_kprobe(struct kprobe *p)
arch_prepare_optimized_kprobe(op);
}
-/* Free optimized instructions and optimized_kprobe */
-static __kprobes void free_aggr_kprobe(struct kprobe *p)
-{
- struct optimized_kprobe *op;
-
- op = container_of(p, struct optimized_kprobe, kp);
- arch_remove_optimized_kprobe(op);
- kfree(op);
-}
-
/* Allocate new optimized_kprobe and try to prepare optimized instructions */
static __kprobes struct kprobe *alloc_aggr_kprobe(struct kprobe *p)
{
@@ -627,20 +773,15 @@ static void __kprobes unoptimize_all_kprobes(void)
kprobes_allow_optimization = false;
printk(KERN_INFO "Kprobes globally unoptimized\n");
- get_online_cpus(); /* For avoiding text_mutex deadlock */
- mutex_lock(&text_mutex);
for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
head = &kprobe_table[i];
hlist_for_each_entry_rcu(p, node, head, hlist) {
if (!kprobe_disabled(p))
- unoptimize_kprobe(p);
+ unoptimize_kprobe(p, 0);
}
}
-
- mutex_unlock(&text_mutex);
- put_online_cpus();
- /* Allow all currently running kprobes to complete */
- synchronize_sched();
+ /* Wait for unoptimizing completion */
+ wait_for_optimizer();
}
int sysctl_kprobes_optimization;
@@ -671,37 +812,64 @@ static void __kprobes __arm_kprobe(struct kprobe *p)
/* Check collision with other optimized kprobes */
old_p = get_optimized_kprobe((unsigned long)p->addr);
if (unlikely(old_p))
- unoptimize_kprobe(old_p); /* Fallback to unoptimized kprobe */
+ unoptimize_kprobe(old_p, 1);/* Fallback to unoptimized kprobe */
arch_arm_kprobe(p);
optimize_kprobe(p); /* Try to optimize (add kprobe to a list) */
}
+/* Return true(!0) if the probe is queued on (un)optimizing lists */
+static int __kprobes kprobe_queued(struct kprobe *p)
+{
+ struct optimized_kprobe *op;
+
+ if (kprobe_aggrprobe(p)) {
+ op = container_of(p, struct optimized_kprobe, kp);
+ if (!list_empty(&op->list))
+ return 1;
+ }
+ return 0;
+}
+
static void __kprobes __disarm_kprobe(struct kprobe *p)
{
struct kprobe *old_p;
- unoptimize_kprobe(p); /* Try to unoptimize */
- arch_disarm_kprobe(p);
+ /* Delayed unoptimizing if needed (add kprobe to a list) */
+ if (kprobe_optimized(p))
+ unoptimize_kprobe(p, 0);
- /* If another kprobe was blocked, optimize it. */
- old_p = get_optimized_kprobe((unsigned long)p->addr);
- if (unlikely(old_p))
- optimize_kprobe(old_p);
+ if (!kprobe_queued(p)) {
+ arch_disarm_kprobe(p);
+ /* If another kprobe was blocked, optimize it. */
+ old_p = get_optimized_kprobe((unsigned long)p->addr);
+ if (unlikely(old_p))
+ optimize_kprobe(old_p);
+ }
}
#else /* !CONFIG_OPTPROBES */
#define optimize_kprobe(p) do {} while (0)
-#define unoptimize_kprobe(p) do {} while (0)
+#define unoptimize_kprobe(p, f) do {} while (0)
#define kill_optimized_kprobe(p) do {} while (0)
#define prepare_optimized_kprobe(p) do {} while (0)
#define try_to_optimize_kprobe(p) do {} while (0)
#define __arm_kprobe(p) arch_arm_kprobe(p)
#define __disarm_kprobe(p) arch_disarm_kprobe(p)
+#define kprobe_disarmed(p) kprobe_disabled(p)
+#define wait_for_optimizer() do {} while (0)
+
+/* There should be no unused kprobes can be reused without optimization */
+static void reuse_unused_kprobe(struct kprobe *ap)
+{
+ printk(KERN_ERR "Error: There should be no unused kprobe here.\n");
+ BUG_ON(kprobe_unused(ap));
+}
static __kprobes void free_aggr_kprobe(struct kprobe *p)
{
+ arch_remove_kprobe(p);
kfree(p);
}
@@ -724,16 +892,6 @@ static void __kprobes arm_kprobe(struct kprobe *kp)
mutex_unlock(&text_mutex);
}
-/* Disarm a kprobe with text_mutex */
-static void __kprobes disarm_kprobe(struct kprobe *kp)
-{
- get_online_cpus(); /* For avoiding text_mutex deadlock */
- mutex_lock(&text_mutex);
- __disarm_kprobe(kp);
- mutex_unlock(&text_mutex);
- put_online_cpus();
-}
-
/*
* Aggregate handlers for multiple kprobes support - these handlers
* take care of invoking the individual kprobe handlers on p->list
@@ -937,7 +1095,7 @@ static int __kprobes add_new_kprobe(struct kprobe *ap, struct kprobe *p)
BUG_ON(kprobe_gone(ap) || kprobe_gone(p));
if (p->break_handler || p->post_handler)
- unoptimize_kprobe(ap); /* Fall back to normal kprobe */
+ unoptimize_kprobe(ap, 1);/* Fall back to normal kprobe */
if (p->break_handler) {
if (ap->break_handler)
@@ -1000,7 +1158,9 @@ static int __kprobes register_aggr_kprobe(struct kprobe *old_p,
if (!ap)
return -ENOMEM;
init_aggr_kprobe(ap, old_p);
- }
+ } else if (kprobe_unused(ap))
+ /* This probe is going to die. Rescue it */
+ reuse_unused_kprobe(ap);
if (kprobe_gone(ap)) {
/*
@@ -1034,23 +1194,6 @@ static int __kprobes register_aggr_kprobe(struct kprobe *old_p,
return add_new_kprobe(ap, p);
}
-/* Try to disable aggr_kprobe, and return 1 if succeeded.*/
-static int __kprobes try_to_disable_aggr_kprobe(struct kprobe *p)
-{
- struct kprobe *kp;
-
- list_for_each_entry_rcu(kp, &p->list, list) {
- if (!kprobe_disabled(kp))
- /*
- * There is an active probe on the list.
- * We can't disable aggr_kprobe.
- */
- return 0;
- }
- p->flags |= KPROBE_FLAG_DISABLED;
- return 1;
-}
-
static int __kprobes in_kprobes_functions(unsigned long addr)
{
struct kprobe_blackpoint *kb;
@@ -1224,6 +1367,45 @@ fail_with_jump_label:
}
EXPORT_SYMBOL_GPL(register_kprobe);
+/* Check if all probes on the aggrprobe are disabled */
+static int __kprobes aggr_kprobe_disabled(struct kprobe *ap)
+{
+ struct kprobe *kp;
+
+ list_for_each_entry_rcu(kp, &ap->list, list) {
+ if (!kprobe_disabled(kp))
+ /*
+ * There is an active probe on the list.
+ * We can't disable aggr_kprobe.
+ */
+ return 0;
+ }
+ return 1;
+}
+
+/* Disable one kprobe: Make sure called under kprobe_mutex is locked */
+static struct kprobe *__kprobes __disable_kprobe(struct kprobe *p)
+{
+ struct kprobe *orig_p;
+
+ /* Check whether specified probe is valid, and get real probe */
+ orig_p = __get_valid_kprobe(p);
+ if (unlikely(orig_p == NULL))
+ return NULL;
+
+ if (!kprobe_disabled(p)) {
+ if (p != orig_p)
+ p->flags |= KPROBE_FLAG_DISABLED;
+
+ if (p == orig_p || aggr_kprobe_disabled(orig_p)) {
+ __disarm_kprobe(orig_p);
+ orig_p->flags |= KPROBE_FLAG_DISABLED;
+ }
+ }
+
+ return orig_p;
+}
+
/*
* Unregister a kprobe without a scheduler synchronization.
*/
@@ -1231,22 +1413,18 @@ static int __kprobes __unregister_kprobe_top(struct kprobe *p)
{
struct kprobe *old_p, *list_p;
- old_p = __get_valid_kprobe(p);
+ /* Disable kprobe. This will disarm it if needed. */
+ old_p = __disable_kprobe(p);
if (old_p == NULL)
return -EINVAL;
if (old_p == p ||
(kprobe_aggrprobe(old_p) &&
- list_is_singular(&old_p->list))) {
- /*
- * Only probe on the hash list. Disarm only if kprobes are
- * enabled and not gone - otherwise, the breakpoint would
- * already have been removed. We save on flushing icache.
- */
- if (!kprobes_all_disarmed && !kprobe_disabled(old_p))
- disarm_kprobe(old_p);
+ list_is_singular(&old_p->list) && kprobe_disarmed(old_p)))
+ /* Only disarmed probe on the hash list or on aggr_probe */
hlist_del_rcu(&old_p->hlist);
- } else {
+ else {
+ /* This probe is on an aggr_probe */
if (p->break_handler && !kprobe_gone(p))
old_p->break_handler = NULL;
if (p->post_handler && !kprobe_gone(p)) {
@@ -1258,16 +1436,12 @@ static int __kprobes __unregister_kprobe_top(struct kprobe *p)
}
noclean:
list_del_rcu(&p->list);
- if (!kprobe_disabled(old_p)) {
- try_to_disable_aggr_kprobe(old_p);
- if (!kprobes_all_disarmed) {
- if (kprobe_disabled(old_p))
- disarm_kprobe(old_p);
- else
- /* Try to optimize this probe again */
- optimize_kprobe(old_p);
- }
- }
+ if (!kprobe_disabled(old_p) && !kprobes_all_disarmed)
+ /*
+ * Try to optimize this probe again,
+ * because post hander may be changed.
+ */
+ optimize_kprobe(old_p);
}
return 0;
}
@@ -1276,13 +1450,13 @@ static void __kprobes __unregister_kprobe_bottom(struct kprobe *p)
{
struct kprobe *old_p;
+ /* If this probe was removed from hlist, remove kprobes insn buffer */
if (list_empty(&p->list))
arch_remove_kprobe(p);
else if (list_is_singular(&p->list)) {
/* "p" is the last child of an aggr_kprobe */
old_p = list_entry(p->list.next, struct kprobe, list);
list_del(&p->list);
- arch_remove_kprobe(old_p);
free_aggr_kprobe(old_p);
}
}
@@ -1324,6 +1498,7 @@ void __kprobes unregister_kprobes(struct kprobe **kps, int num)
mutex_unlock(&kprobe_mutex);
synchronize_sched();
+
for (i = 0; i < num; i++)
if (kps[i]->addr)
__unregister_kprobe_bottom(kps[i]);
@@ -1602,29 +1777,10 @@ static void __kprobes kill_kprobe(struct kprobe *p)
int __kprobes disable_kprobe(struct kprobe *kp)
{
int ret = 0;
- struct kprobe *p;
mutex_lock(&kprobe_mutex);
-
- /* Check whether specified probe is valid. */
- p = __get_valid_kprobe(kp);
- if (unlikely(p == NULL)) {
+ if (__disable_kprobe(kp) == NULL)
ret = -EINVAL;
- goto out;
- }
-
- /* If the probe is already disabled (or gone), just return */
- if (kprobe_disabled(kp))
- goto out;
-
- kp->flags |= KPROBE_FLAG_DISABLED;
- if (p != kp)
- /* When kp != p, p is always enabled. */
- try_to_disable_aggr_kprobe(p);
-
- if (!kprobes_all_disarmed && kprobe_disabled(p))
- disarm_kprobe(p);
-out:
mutex_unlock(&kprobe_mutex);
return ret;
}
@@ -1945,8 +2101,8 @@ static void __kprobes disarm_all_kprobes(void)
mutex_unlock(&text_mutex);
put_online_cpus();
mutex_unlock(&kprobe_mutex);
- /* Allow all currently running kprobes to complete */
- synchronize_sched();
+
+ wait_for_optimizer();
return;
already_disabled:
* Masami Hiramatsu <[email protected]> wrote:
> Unoptimization occurs when a probe is unregistered or disabled, and
> is heavy because it recovers instructions by using stop_machine().
> This patch delays unoptimization operations and unoptimize several
> probes at once by using text_poke_smp_batch().
> This can avoid unexpected system slowdown coming from stop_machine().
>
> Changes in v2:
> - Use dynamic allocated buffers and params.
>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: [email protected]
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ananth N Mavinakayanahalli <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: [email protected]
> Cc: Rusty Russell <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> ---
>
> arch/x86/include/asm/kprobes.h | 4
> arch/x86/kernel/kprobes.c | 45 ++++
> include/linux/kprobes.h | 2
> kernel/kprobes.c | 434 +++++++++++++++++++++++++++-------------
> 4 files changed, 344 insertions(+), 141 deletions(-)
Hm, this is a scary big patch with non-trivial effects - it should be broken up as
much as possible, into successive, bisectable steps. The end result looks mostly
fine - it's just this one huge step of a patch that is dangerous.
Thanks,
Ingo
(2010/11/26 16:26), Ingo Molnar wrote:
> * Masami Hiramatsu <[email protected]> wrote:
>
>> Unoptimization occurs when a probe is unregistered or disabled, and
>> is heavy because it recovers instructions by using stop_machine().
>> This patch delays unoptimization operations and unoptimize several
>> probes at once by using text_poke_smp_batch().
>> This can avoid unexpected system slowdown coming from stop_machine().
>>
>> Changes in v2:
>> - Use dynamic allocated buffers and params.
>>
>> Signed-off-by: Masami Hiramatsu <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: "H. Peter Anvin" <[email protected]>
>> Cc: [email protected]
>> Cc: Mathieu Desnoyers <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Cc: Ananth N Mavinakayanahalli <[email protected]>
>> Cc: Steven Rostedt <[email protected]>
>> Cc: [email protected]
>> Cc: Rusty Russell <[email protected]>
>> Cc: Frederic Weisbecker <[email protected]>
>> ---
>>
>> arch/x86/include/asm/kprobes.h | 4
>> arch/x86/kernel/kprobes.c | 45 ++++
>> include/linux/kprobes.h | 2
>> kernel/kprobes.c | 434 +++++++++++++++++++++++++++-------------
>> 4 files changed, 344 insertions(+), 141 deletions(-)
>
> Hm, this is a scary big patch with non-trivial effects - it should be broken up as
> much as possible, into successive, bisectable steps. The end result looks mostly
> fine - it's just this one huge step of a patch that is dangerous.
Agreed, then I'll break it down to several patches.
Thank you for your comment! :)
>
> Thanks,
>
> Ingo
--
Masami HIRAMATSU
2nd Dept. Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: [email protected]