Subject: [RFC PATCH -tip 0/9] ftrace, kprobes: Ftrace-based kprobe optimization

Hi Steven,

Here is a series of patches which allows kprobes to use ftrace
to optimize the probing path when the probe is put on an ftrace
location (the mcount call). The optimization is done transparently
by kprobes.

If kprobe.break_handler is set, that probe can not be
optimized with ftrace (nor put on ftrace). The reason for this
limitation is that break_handler is used only by
jprobes, which change the ip address (for fetching the function
arguments).
In this series, ftrace doesn't allow changing regs->ip,
since I don't want to cause any non-essential trouble ;)

After deep consideration, I've decided to remove the "real_addr"
method at this time, since it is hard to achieve complete
transparency with the combination of jprobes and aggregated
kprobes.
Also, at least on x86, ftrace-based kprobes are available.

Anyway, I would say jprobes is an outdated probe type, because we
already have kprobe-tracer and perf-probe, which allow us to
get function arguments directly from kprobes. :)

To use ftrace from kprobes, this series introduces two
new interfaces for ftrace (a usage sketch is shown after
the list):

- ftrace_ops.regs_func, a callback handler
  invoked with pt_regs as the third argument. To enable
  this feature, ftrace_ops.flags must have the
  FTRACE_OPS_FL_SAVE_REGS bit set.
- ftrace_set_filter_ip(), which allows setting an
  address-based filter instead of a glob pattern.
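
For reference, here is a minimal (hypothetical) sketch of a user of
these two interfaces; the probed symbol "my_func" and the callback
body are just placeholders, not part of this series:

#include <linux/kernel.h>
#include <linux/kallsyms.h>
#include <linux/ptrace.h>
#include <linux/ftrace.h>

static void my_regs_callback(unsigned long ip, unsigned long parent_ip,
			     struct pt_regs *regs)
{
	/* regs is valid here because FTRACE_OPS_FL_SAVE_REGS is set */
	pr_info("traced %pS, regs->ip=%lx\n",
		(void *)ip, instruction_pointer(regs));
}

static struct ftrace_ops my_ops = {
	.regs_func = my_regs_callback,		/* pt_regs-aware callback */
	.flags	   = FTRACE_OPS_FL_SAVE_REGS,	/* ask ftrace to save regs */
};

static int __init my_init(void)
{
	unsigned long ip = kallsyms_lookup_name("my_func");
	int ret;

	/* Filter on this single address instead of using a glob pattern */
	ret = ftrace_set_filter_ip(&my_ops, ip, 0, 0);
	if (ret)
		return ret;

	return register_ftrace_function(&my_ops);
}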

In this series, the FTRACE_OPS_FL_SAVE_REGS feature is supported
only on x86 (x86-64 and i386). It should be possible to port
it to other architectures too.

Also, this makes all __kprobes functions "notrace", because
some of those functions are expected to be called from a
kprobes handler, which is called from the function tracer.
I think that is another discussion point. Perhaps we need
to introduce another tag which means "don't put a kprobe on
this function" instead of __kprobes, and apply that.

Thank you,

---

Masami Hiramatsu (8):
kprobes/x86: ftrace based optimization for x86
kprobes: introduce ftrace based optimization
kprobes: Move locks into appropriate functions
kprobes: cleanup to separate probe-able check
ftrace: add ftrace_set_filter_ip() for address based filter
ftrace/x86: Support SAVE_REGS feature on i386
ftrace/x86-64: support SAVE_REGS feature on x86-64
ftrace: Add pt_regs acceptable trace callback

Steven Rostedt (1):
kprobes: Inverse taking of module_mutex with kprobe_mutex


arch/x86/include/asm/ftrace.h | 4 +
arch/x86/include/asm/kprobes.h | 1
arch/x86/kernel/entry_32.S | 64 +++++++++-
arch/x86/kernel/entry_64.S | 38 +++++-
arch/x86/kernel/kprobes.c | 48 ++++++++
include/linux/ftrace.h | 24 ++++
include/linux/kprobes.h | 35 +++++
kernel/kprobes.c | 256 +++++++++++++++++++++++++++++-----------
kernel/trace/ftrace.c | 103 ++++++++++++++--
9 files changed, 471 insertions(+), 102 deletions(-)

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]


Subject: [RFC PATCH -tip 1/9] ftrace: Add pt_regs acceptable trace callback

Add a new callback interface, regs_func, to ftrace_ops,
placed in an anonymous union so that it shares the pointer
with the func member. Thus, current ftrace users don't
need to change their callbacks.
This callback must be used with FTRACE_OPS_FL_SAVE_REGS.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
---

include/linux/ftrace.h | 21 ++++++++++++++++++++-
kernel/trace/ftrace.c | 44 +++++++++++++++++++++++++++++++-------------
2 files changed, 51 insertions(+), 14 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 55e6d63..11abe4e 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -10,6 +10,8 @@
#include <linux/kallsyms.h>
#include <linux/linkage.h>
#include <linux/bitops.h>
+#include <linux/module.h>
+#include <linux/ptrace.h>
#include <linux/ktime.h>
#include <linux/sched.h>
#include <linux/types.h>
@@ -18,6 +20,14 @@

#include <asm/ftrace.h>

+/*
+ * If the arch supports saving the regs to the ftrace_ops
+ * then it should set this to 1.
+ */
+#ifndef ARCH_SUPPORTS_FTRACE_SAVE_REGS
+#define ARCH_SUPPORTS_FTRACE_SAVE_REGS 0
+#endif
+
struct module;
struct ftrace_hash;

@@ -30,6 +40,8 @@ ftrace_enable_sysctl(struct ctl_table *table, int write,
loff_t *ppos);

typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);
+typedef void (*ftrace_regs_func_t)(unsigned long ip, unsigned long parent_ip,
+ struct pt_regs *regs);

/*
* FTRACE_OPS_FL_* bits denote the state of ftrace_ops struct and are
@@ -45,16 +57,22 @@ typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip);
* could be controled by following calls:
* ftrace_function_local_enable
* ftrace_function_local_disable
+ * SAVE_REGS - set manually by ftrace_ops user to denote the ftrace_ops
+ * requests to pass pt_regs to callback.
*/
enum {
FTRACE_OPS_FL_ENABLED = 1 << 0,
FTRACE_OPS_FL_GLOBAL = 1 << 1,
FTRACE_OPS_FL_DYNAMIC = 1 << 2,
FTRACE_OPS_FL_CONTROL = 1 << 3,
+ FTRACE_OPS_FL_SAVE_REGS = 1 << 4,
};

struct ftrace_ops {
- ftrace_func_t func;
+ union {
+ ftrace_func_t func;
+ ftrace_regs_func_t regs_func;
+ };
struct ftrace_ops *next;
unsigned long flags;
int __percpu *disabled;
@@ -164,6 +182,7 @@ static inline int ftrace_function_local_disabled(struct ftrace_ops *ops)
}

extern void ftrace_stub(unsigned long a0, unsigned long a1);
+#define ftrace_regs_stub (ftrace_regs_func_t)(ftrace_stub)

#else /* !CONFIG_FUNCTION_TRACER */
/*
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index a008663..357b15b 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -101,7 +101,9 @@ static struct ftrace_ops global_ops;
static struct ftrace_ops control_ops;

static void
-ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
+ftrace_ops_list_regs_func(unsigned long ip, unsigned long parent_ip,
+ struct pt_regs *regs);
+#define ftrace_ops_list_func (ftrace_func_t)(ftrace_ops_list_regs_func)

/*
* Traverse the ftrace_global_list, invoking all entries. The reason that we
@@ -112,8 +114,9 @@ ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
*
* Silly Alpha and silly pointer-speculation compiler optimizations!
*/
-static void ftrace_global_list_func(unsigned long ip,
- unsigned long parent_ip)
+static void
+ftrace_global_list_regs_func(unsigned long ip, unsigned long parent_ip,
+ struct pt_regs *regs)
{
struct ftrace_ops *op;

@@ -123,19 +126,24 @@ static void ftrace_global_list_func(unsigned long ip,
trace_recursion_set(TRACE_GLOBAL_BIT);
op = rcu_dereference_raw(ftrace_global_list); /*see above*/
while (op != &ftrace_list_end) {
- op->func(ip, parent_ip);
+ op->regs_func(ip, parent_ip, regs);
op = rcu_dereference_raw(op->next); /*see above*/
};
trace_recursion_clear(TRACE_GLOBAL_BIT);
}
+#define ftrace_global_list_func (ftrace_func_t)(ftrace_global_list_regs_func)

-static void ftrace_pid_func(unsigned long ip, unsigned long parent_ip)
+static void ftrace_pid_regs_func(unsigned long ip, unsigned long parent_ip,
+ struct pt_regs *regs)
{
+ ftrace_regs_func_t func = (ftrace_regs_func_t)ftrace_pid_function;
+
if (!test_tsk_trace_trace(current))
return;

- ftrace_pid_function(ip, parent_ip);
+ func(ip, parent_ip, regs);
}
+#define ftrace_pid_func (ftrace_func_t)(ftrace_pid_regs_func)

static void set_ftrace_pid_function(ftrace_func_t func)
{
@@ -328,6 +336,10 @@ static int __register_ftrace_function(struct ftrace_ops *ops)
if (!core_kernel_data((unsigned long)ops))
ops->flags |= FTRACE_OPS_FL_DYNAMIC;

+ if ((ops->flags & FTRACE_OPS_FL_SAVE_REGS) &&
+ !ARCH_SUPPORTS_FTRACE_SAVE_REGS)
+ return -ENOSYS;
+
if (ops->flags & FTRACE_OPS_FL_GLOBAL) {
add_ftrace_list_ops(&ftrace_global_list, &global_ops, ops);
ops->flags |= FTRACE_OPS_FL_ENABLED;
@@ -1042,9 +1054,10 @@ static const struct ftrace_hash empty_hash = {
#define EMPTY_HASH ((struct ftrace_hash *)&empty_hash)

static struct ftrace_ops global_ops = {
- .func = ftrace_stub,
+ .regs_func = ftrace_regs_stub,
.notrace_hash = EMPTY_HASH,
.filter_hash = EMPTY_HASH,
+ .flags = FTRACE_OPS_FL_SAVE_REGS,
};

static DEFINE_MUTEX(ftrace_regex_lock);
@@ -3911,7 +3924,8 @@ void __init ftrace_init(void)
#else

static struct ftrace_ops global_ops = {
- .func = ftrace_stub,
+ .regs_func = ftrace_regs_stub,
+ .flags = FTRACE_OPS_FL_SAVE_REGS,
};

static int __init ftrace_nodyn_init(void)
@@ -3942,7 +3956,8 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip)
#endif /* CONFIG_DYNAMIC_FTRACE */

static void
-ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip)
+ftrace_ops_control_regs_func(unsigned long ip, unsigned long parent_ip,
+ struct pt_regs *regs)
{
struct ftrace_ops *op;

@@ -3959,7 +3974,7 @@ ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip)
while (op != &ftrace_list_end) {
if (!ftrace_function_local_disabled(op) &&
ftrace_ops_test(op, ip))
- op->func(ip, parent_ip);
+ op->regs_func(ip, parent_ip, regs);

op = rcu_dereference_raw(op->next);
};
@@ -3968,11 +3983,13 @@ ftrace_ops_control_func(unsigned long ip, unsigned long parent_ip)
}

static struct ftrace_ops control_ops = {
- .func = ftrace_ops_control_func,
+ .regs_func = ftrace_ops_control_regs_func,
+ .flags = FTRACE_OPS_FL_SAVE_REGS,
};

static void
-ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
+ftrace_ops_list_regs_func(unsigned long ip, unsigned long parent_ip,
+ struct pt_regs *regs)
{
struct ftrace_ops *op;

@@ -3988,7 +4005,8 @@ ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
op = rcu_dereference_raw(ftrace_ops_list);
while (op != &ftrace_list_end) {
if (ftrace_ops_test(op, ip))
- op->func(ip, parent_ip);
+ /* regs will be ignored if op->func is set */
+ op->regs_func(ip, parent_ip, regs);
op = rcu_dereference_raw(op->next);
};
preempt_enable_notrace();

Subject: [RFC PATCH -tip 2/9] ftrace/x86-64: support SAVE_REGS feature on x86-64

Add register saving/restoring sequence in ftrace_caller
on x86-64 for enabling SAVE_REGS feature.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
---

arch/x86/include/asm/ftrace.h | 4 ++++
arch/x86/kernel/entry_64.S | 38 +++++++++++++++++++++++++++++++++-----
2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 18d9005..ffb9564 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -32,6 +32,10 @@
#define MCOUNT_ADDR ((long)(mcount))
#define MCOUNT_INSN_SIZE 5 /* sizeof mcount call */

+#if defined(CONFIG_DYNAMIC_FTRACE) && defined(CONFIG_X86_64)
+#define ARCH_SUPPORTS_FTRACE_SAVE_REGS 1
+#endif
+
#ifndef __ASSEMBLY__
extern void mcount(void);
extern int modifying_ftrace_code;
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 320852d..6d0545b 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -73,21 +73,44 @@ ENTRY(mcount)
retq
END(mcount)

+ .macro MCOUNT_SAVE_ALL
+ SAVE_ARGS (RSP-ORIG_RAX)
+ SAVE_REST
+ /* Adjust eflags, rsp and rip */
+ movq RSP(%rsp), %rdx
+ movq %rdx, EFLAGS(%rsp)
+ leaq SS+8(%rsp), %rdx
+ movq %rdx, RSP(%rsp)
+ movq SS(%rsp), %rdi
+ subq $MCOUNT_INSN_SIZE, %rdi
+ movq %rdi, RIP(%rsp)
+ .endm
+
+ .macro MCOUNT_RESTORE_ALL
+ /* store eflags into regs->rsp */
+ movq EFLAGS(%rsp), %rax
+ movq %rax, RSP(%rsp)
+ RESTORE_REST
+ RESTORE_ARGS 1, (RSP-ORIG_RAX)
+ .endm
+
ENTRY(ftrace_caller)
+ CFI_STARTPROC simple
+ pushfq_cfi
cmpl $0, function_trace_stop
- jne ftrace_stub
+ jne ftrace_exit

- MCOUNT_SAVE_FRAME
+ MCOUNT_SAVE_ALL

- movq 0x38(%rsp), %rdi
movq 8(%rbp), %rsi
- subq $MCOUNT_INSN_SIZE, %rdi
+ movq %rsp, %rdx

GLOBAL(ftrace_call)
call ftrace_stub

- MCOUNT_RESTORE_FRAME
+ MCOUNT_RESTORE_ALL

+ popfq_cfi
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
GLOBAL(ftrace_graph_call)
jmp ftrace_stub
@@ -95,6 +118,11 @@ GLOBAL(ftrace_graph_call)

GLOBAL(ftrace_stub)
retq
+
+ftrace_exit:
+ popfq_cfi
+ retq
+ CFI_ENDPROC
END(ftrace_caller)

#else /* ! CONFIG_DYNAMIC_FTRACE */

Subject: [RFC PATCH -tip 3/9] ftrace/x86: Support SAVE_REGS feature on i386

Add register saving/restoring sequence in ftrace_caller
on i386 for enabling SAVE_REGS feature.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
---

arch/x86/include/asm/ftrace.h | 2 +
arch/x86/kernel/entry_32.S | 64 +++++++++++++++++++++++++++++++++++------
2 files changed, 56 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index ffb9564..e2f7269 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -32,7 +32,7 @@
#define MCOUNT_ADDR ((long)(mcount))
#define MCOUNT_INSN_SIZE 5 /* sizeof mcount call */

-#if defined(CONFIG_DYNAMIC_FTRACE) && defined(CONFIG_X86_64)
+#if defined(CONFIG_DYNAMIC_FTRACE)
#define ARCH_SUPPORTS_FTRACE_SAVE_REGS 1
#endif

diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 01ccf9b..79c45a2 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -1095,24 +1095,66 @@ ENTRY(mcount)
ret
END(mcount)

+ .macro FTRACE_SAVE_ALL
+ /* eflags is saved on cs */
+ subl $8, %esp /* skip ip and orig_ax */
+ pushl %gs
+ pushl %fs
+ pushl %es
+ pushl %ds
+ pushl %eax
+ pushl %ebp
+ pushl %edi
+ pushl %esi
+ pushl %edx
+ pushl %ecx
+ pushl %ebx
+ movl 14*4(%esp), %eax /* Load return address */
+ pushl %eax /* Save return address (+4) */
+ subl $MCOUNT_INSN_SIZE, %eax
+ movl %eax, 12*4+4(%esp) /* Store IP */
+ movl 13*4+4(%esp), %edx /* Load flags */
+ movl %edx, 14*4+4(%esp) /* Store flags */
+ movl $__KERNEL_CS, %edx
+ movl %edx, 13*4+4(%esp)
+ .endm
+
+ .macro FTRACE_RESTORE_ALL
+ movl 14*4+4(%esp), %eax /* Load flags */
+ movl %eax, 13*4+4(%esp) /* Restore flags */
+ popl %eax
+ movl %eax, 14*4(%esp) /* Restore return address */
+ popl %ebx
+ popl %ecx
+ popl %edx
+ popl %esi
+ popl %edi
+ popl %ebp
+ popl %eax
+ popl %ds
+ popl %es
+ popl %fs
+ popl %gs
+ addl $8, %esp
+ .endm
+
ENTRY(ftrace_caller)
+ pushf /* flags on regs->cs */
cmpl $0, function_trace_stop
- jne ftrace_stub
+ jne ftrace_exit
+
+ FTRACE_SAVE_ALL

- pushl %eax
- pushl %ecx
- pushl %edx
- movl 0xc(%esp), %eax
movl 0x4(%ebp), %edx
- subl $MCOUNT_INSN_SIZE, %eax
+ lea 4(%esp), %ecx

.globl ftrace_call
ftrace_call:
call ftrace_stub

- popl %edx
- popl %ecx
- popl %eax
+ FTRACE_RESTORE_ALL
+
+ popf
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
.globl ftrace_graph_call
ftrace_graph_call:
@@ -1122,6 +1164,10 @@ ftrace_graph_call:
.globl ftrace_stub
ftrace_stub:
ret
+
+ftrace_exit:
+ popf
+ ret
END(ftrace_caller)

#else /* ! CONFIG_DYNAMIC_FTRACE */

Subject: [RFC PATCH -tip 4/9] ftrace: add ftrace_set_filter_ip() for address based filter

Add a new filter update interface, ftrace_set_filter_ip(),
to set the ftrace filter by ip address, not only by glob pattern.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
---

include/linux/ftrace.h | 3 ++
kernel/trace/ftrace.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 11abe4e..f3c5487 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -263,6 +263,8 @@ struct dyn_ftrace {
};

int ftrace_force_update(void);
+int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
+ int remove, int reset);
int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
int len, int reset);
int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
@@ -432,6 +434,7 @@ static inline int ftrace_text_reserved(void *start, void *end)
*/
#define ftrace_regex_open(ops, flag, inod, file) ({ -ENODEV; })
#define ftrace_set_early_filter(ops, buf, enable) do { } while (0)
+#define ftrace_set_filter_ip(ops, ip, remove, reset) ({ -ENODEV; })
#define ftrace_set_filter(ops, buf, len, reset) ({ -ENODEV; })
#define ftrace_set_notrace(ops, buf, len, reset) ({ -ENODEV; })
#define ftrace_free_filter(ops) do { } while (0)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 357b15b..3ce1219 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3175,8 +3175,27 @@ ftrace_notrace_write(struct file *file, const char __user *ubuf,
}

static int
-ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
- int reset, int enable)
+ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
+{
+ struct ftrace_func_entry *entry;
+
+ if (!ftrace_location(ip))
+ return -EINVAL;
+
+ if (remove) {
+ entry = ftrace_lookup_ip(hash, ip);
+ if (!entry)
+ return -ENOENT;
+ free_hash_entry(hash, entry);
+ return 0;
+ }
+
+ return add_hash_entry(hash, ip);
+}
+
+static int
+ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
+ unsigned long ip, int remove, int reset, int enable)
{
struct ftrace_hash **orig_hash;
struct ftrace_hash *hash;
@@ -3205,6 +3224,11 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
ret = -EINVAL;
goto out_regex_unlock;
}
+ if (ip) {
+ ret = ftrace_match_addr(hash, ip, remove);
+ if (ret < 0)
+ goto out_regex_unlock;
+ }

mutex_lock(&ftrace_lock);
ret = ftrace_hash_move(ops, enable, orig_hash, hash);
@@ -3221,6 +3245,37 @@ ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
return ret;
}

+static int
+ftrace_set_addr(struct ftrace_ops *ops, unsigned long ip, int remove,
+ int reset, int enable)
+{
+ return ftrace_set_hash(ops, 0, 0, ip, remove, reset, enable);
+}
+
+/**
+ * ftrace_set_filter_ip - set a function to filter on in ftrace by address
+ * @ops - the ops to set the filter with
+ * @ip - the address to add to or remove from the filter.
+ * @remove - non zero to remove the ip from the filter
+ * @reset - non zero to reset all filters before applying this filter.
+ *
+ * Filters denote which functions should be enabled when tracing is enabled
+ * If @ip is NULL, it fails to update the filter.
+ */
+int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
+ int remove, int reset)
+{
+ return ftrace_set_addr(ops, ip, remove, reset, 1);
+}
+EXPORT_SYMBOL_GPL(ftrace_set_filter_ip);
+
+static int
+ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
+ int reset, int enable)
+{
+ return ftrace_set_hash(ops, buf, len, 0, 0, reset, enable);
+}
+
/**
* ftrace_set_filter - set a function to filter on in ftrace
* @ops - the ops to set the filter with

Subject: [RFC PATCH -tip 5/9] kprobes: Inverse taking of module_mutex with kprobe_mutex

From: Steven Rostedt <[email protected]>

Currently module_mutex is taken before kprobe_mutex, but this
can cause issues when we have kprobes register ftrace, as the ftrace
mutex is taken before enabling a tracepoint, which currently takes
the module mutex.

If module_mutex is taken before kprobe_mutex, then we can not
have kprobes use the ftrace infrastructure.

There seems to be no reason that the kprobe_mutex can't be taken
before the module_mutex. Running lockdep shows that it is safe
among the kernels I've run.

Cc: Masami Hiramatsu <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---

kernel/kprobes.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index c62b854..7a8a122 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -561,9 +561,9 @@ static __kprobes void kprobe_optimizer(struct work_struct *work)
{
LIST_HEAD(free_list);

+ mutex_lock(&kprobe_mutex);
/* Lock modules while optimizing kprobes */
mutex_lock(&module_mutex);
- mutex_lock(&kprobe_mutex);

/*
* Step 1: Unoptimize kprobes and collect cleaned (unused and disarmed)
@@ -586,8 +586,8 @@ static __kprobes void kprobe_optimizer(struct work_struct *work)
/* Step 4: Free cleaned kprobes after quiesence period */
do_free_cleaned_kprobes(&free_list);

- mutex_unlock(&kprobe_mutex);
mutex_unlock(&module_mutex);
+ mutex_unlock(&kprobe_mutex);

/* Step 5: Kick optimizer again if needed */
if (!list_empty(&optimizing_list) || !list_empty(&unoptimizing_list))

Subject: [RFC PATCH -tip 6/9] kprobes: cleanup to separate probe-able check

Separate probe-able address checking code from
register_kprobe().

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
---

kernel/kprobes.c | 82 ++++++++++++++++++++++++++++++------------------------
1 files changed, 45 insertions(+), 37 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 7a8a122..6137fe3 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1313,67 +1313,80 @@ static inline int check_kprobe_rereg(struct kprobe *p)
return ret;
}

-int __kprobes register_kprobe(struct kprobe *p)
+static __kprobes int check_kprobe_address_safe(struct kprobe *p,
+ struct module **probed_mod)
{
int ret = 0;
- struct kprobe *old_p;
- struct module *probed_mod;
- kprobe_opcode_t *addr;
-
- addr = kprobe_addr(p);
- if (IS_ERR(addr))
- return PTR_ERR(addr);
- p->addr = addr;
-
- ret = check_kprobe_rereg(p);
- if (ret)
- return ret;

jump_label_lock();
preempt_disable();
+
+ /* Ensure it is not in reserved area nor out of text */
if (!kernel_text_address((unsigned long) p->addr) ||
in_kprobes_functions((unsigned long) p->addr) ||
ftrace_text_reserved(p->addr, p->addr) ||
jump_label_text_reserved(p->addr, p->addr)) {
ret = -EINVAL;
- goto cannot_probe;
+ goto out;
}

- /* User can pass only KPROBE_FLAG_DISABLED to register_kprobe */
- p->flags &= KPROBE_FLAG_DISABLED;
-
- /*
- * Check if are we probing a module.
- */
- probed_mod = __module_text_address((unsigned long) p->addr);
- if (probed_mod) {
- /* Return -ENOENT if fail. */
- ret = -ENOENT;
+ /* Check if we are probing a module */
+ *probed_mod = __module_text_address((unsigned long) p->addr);
+ if (*probed_mod) {
/*
* We must hold a refcount of the probed module while updating
* its code to prohibit unexpected unloading.
*/
- if (unlikely(!try_module_get(probed_mod)))
- goto cannot_probe;
+ if (unlikely(!try_module_get(*probed_mod))) {
+ ret = -ENOENT;
+ goto out;
+ }

/*
* If the module freed .init.text, we couldn't insert
* kprobes in there.
*/
- if (within_module_init((unsigned long)p->addr, probed_mod) &&
- probed_mod->state != MODULE_STATE_COMING) {
- module_put(probed_mod);
- goto cannot_probe;
+ if (within_module_init((unsigned long)p->addr, *probed_mod) &&
+ (*probed_mod)->state != MODULE_STATE_COMING) {
+ module_put(*probed_mod);
+ *probed_mod = NULL;
+ ret = -ENOENT;
}
- /* ret will be updated by following code */
}
+out:
preempt_enable();
jump_label_unlock();

+ return ret;
+}
+
+int __kprobes register_kprobe(struct kprobe *p)
+{
+ int ret;
+ struct kprobe *old_p;
+ struct module *probed_mod;
+ kprobe_opcode_t *addr;
+
+ /* Adjust probe address from symbol */
+ addr = kprobe_addr(p);
+ if (IS_ERR(addr))
+ return PTR_ERR(addr);
+ p->addr = addr;
+
+ ret = check_kprobe_rereg(p);
+ if (ret)
+ return ret;
+
+ /* User can pass only KPROBE_FLAG_DISABLED to register_kprobe */
+ p->flags &= KPROBE_FLAG_DISABLED;
p->nmissed = 0;
INIT_LIST_HEAD(&p->list);
- mutex_lock(&kprobe_mutex);

+ ret = check_kprobe_address_safe(p, &probed_mod);
+ if (ret)
+ return ret;
+
+ mutex_lock(&kprobe_mutex);
jump_label_lock(); /* needed to call jump_label_text_reserved() */

get_online_cpus(); /* For avoiding text_mutex deadlock. */
@@ -1410,11 +1423,6 @@ out:
module_put(probed_mod);

return ret;
-
-cannot_probe:
- preempt_enable();
- jump_label_unlock();
- return ret;
}
EXPORT_SYMBOL_GPL(register_kprobe);

Subject: [RFC PATCH -tip 7/9] kprobes: Move locks into appropriate functions

Break a big critical region into fine-grained pieces in the
kprobe registering path. This helps us solve a circular
locking dependency when introducing ftrace-based kprobes.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
---

kernel/kprobes.c | 63 ++++++++++++++++++++++++++++++++++++------------------
1 files changed, 42 insertions(+), 21 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 6137fe3..9e47f44 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -759,20 +759,28 @@ static __kprobes void try_to_optimize_kprobe(struct kprobe *p)
struct kprobe *ap;
struct optimized_kprobe *op;

+ /* For preparing optimization, jump_label_text_reserved() is called */
+ jump_label_lock();
+ mutex_lock(&text_mutex);
+
ap = alloc_aggr_kprobe(p);
if (!ap)
- return;
+ goto out;

op = container_of(ap, struct optimized_kprobe, kp);
if (!arch_prepared_optinsn(&op->optinsn)) {
/* If failed to setup optimizing, fallback to kprobe */
arch_remove_optimized_kprobe(op);
kfree(op);
- return;
+ goto out;
}

init_aggr_kprobe(ap, p);
- optimize_kprobe(ap);
+ optimize_kprobe(ap); /* This just kicks optimizer thread */
+
+out:
+ mutex_unlock(&text_mutex);
+ jump_label_unlock();
}

#ifdef CONFIG_SYSCTL
@@ -1144,12 +1152,6 @@ static int __kprobes add_new_kprobe(struct kprobe *ap, struct kprobe *p)
if (p->post_handler && !ap->post_handler)
ap->post_handler = aggr_post_handler;

- if (kprobe_disabled(ap) && !kprobe_disabled(p)) {
- ap->flags &= ~KPROBE_FLAG_DISABLED;
- if (!kprobes_all_disarmed)
- /* Arm the breakpoint again. */
- __arm_kprobe(ap);
- }
return 0;
}

@@ -1189,11 +1191,22 @@ static int __kprobes register_aggr_kprobe(struct kprobe *orig_p,
int ret = 0;
struct kprobe *ap = orig_p;

+ /* For preparing optimization, jump_label_text_reserved() is called */
+ jump_label_lock();
+ /*
+ * Get online CPUs to avoid text_mutex deadlock with stop_machine,
+ * which is invoked by unoptimize_kprobe() in add_new_kprobe()
+ */
+ get_online_cpus();
+ mutex_lock(&text_mutex);
+
if (!kprobe_aggrprobe(orig_p)) {
/* If orig_p is not an aggr_kprobe, create new aggr_kprobe. */
ap = alloc_aggr_kprobe(orig_p);
- if (!ap)
- return -ENOMEM;
+ if (!ap) {
+ ret = -ENOMEM;
+ goto out;
+ }
init_aggr_kprobe(ap, orig_p);
} else if (kprobe_unused(ap))
/* This probe is going to die. Rescue it */
@@ -1213,7 +1226,7 @@ static int __kprobes register_aggr_kprobe(struct kprobe *orig_p,
* free aggr_probe. It will be used next time, or
* freed by unregister_kprobe.
*/
- return ret;
+ goto out;

/* Prepare optimized instructions if possible. */
prepare_optimized_kprobe(ap);
@@ -1228,7 +1241,20 @@ static int __kprobes register_aggr_kprobe(struct kprobe *orig_p,

/* Copy ap's insn slot to p */
copy_kprobe(ap, p);
- return add_new_kprobe(ap, p);
+ ret = add_new_kprobe(ap, p);
+
+out:
+ mutex_unlock(&text_mutex);
+ put_online_cpus();
+ jump_label_unlock();
+
+ if (ret == 0 && kprobe_disabled(ap) && !kprobe_disabled(p)) {
+ ap->flags &= ~KPROBE_FLAG_DISABLED;
+ if (!kprobes_all_disarmed)
+ /* Arm the breakpoint again. */
+ arm_kprobe(ap);
+ }
+ return ret;
}

static int __kprobes in_kprobes_functions(unsigned long addr)
@@ -1387,10 +1413,6 @@ int __kprobes register_kprobe(struct kprobe *p)
return ret;

mutex_lock(&kprobe_mutex);
- jump_label_lock(); /* needed to call jump_label_text_reserved() */
-
- get_online_cpus(); /* For avoiding text_mutex deadlock. */
- mutex_lock(&text_mutex);

old_p = get_kprobe(p->addr);
if (old_p) {
@@ -1399,7 +1421,9 @@ int __kprobes register_kprobe(struct kprobe *p)
goto out;
}

+ mutex_lock(&text_mutex); /* Avoiding text modification */
ret = arch_prepare_kprobe(p);
+ mutex_unlock(&text_mutex);
if (ret)
goto out;

@@ -1408,15 +1432,12 @@ int __kprobes register_kprobe(struct kprobe *p)
&kprobe_table[hash_ptr(p->addr, KPROBE_HASH_BITS)]);

if (!kprobes_all_disarmed && !kprobe_disabled(p))
- __arm_kprobe(p);
+ arm_kprobe(p);

/* Try to optimize kprobe */
try_to_optimize_kprobe(p);

out:
- mutex_unlock(&text_mutex);
- put_online_cpus();
- jump_label_unlock();
mutex_unlock(&kprobe_mutex);

if (probed_mod)

Subject: [RFC PATCH -tip 8/9] kprobes: introduce ftrace based optimization

Introduce function trace based kprobes optimization.

With using ftrace optimization, kprobes on the mcount calling
address, use ftrace's mcount call instead of breakpoint.
Furthermore, this optimization works with preemptive kernel
not like as current jump-based optimization. Of course,
this feature is limited only if the probe on mcount call.

If kprobe.break_handler is set, that probe is not
optimized with ftrace (nor put on ftrace). The reason for this
limitation is that break_handler may be used only
by jprobes, which change the ip address (for fetching the function
arguments), but the function tracer ignores a modified ip address.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
---

include/linux/kprobes.h | 35 ++++++++++++++-
kernel/kprobes.c | 111 +++++++++++++++++++++++++++++++++++++++++------
2 files changed, 132 insertions(+), 14 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index b6e1f8c..757d605 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -38,6 +38,7 @@
#include <linux/spinlock.h>
#include <linux/rcupdate.h>
#include <linux/mutex.h>
+#include <linux/ftrace.h>

#ifdef CONFIG_KPROBES
#include <asm/kprobes.h>
@@ -48,8 +49,26 @@
#define KPROBE_REENTER 0x00000004
#define KPROBE_HIT_SSDONE 0x00000008

+/*
+ * If function tracer is enabled and the arch supports full
+ * passing of pt_regs to function tracing, then kprobes can
+ * optimize on top of function tracing.
+ */
+#if defined(CONFIG_FUNCTION_TRACER) && defined(ARCH_SUPPORTS_FTRACE_SAVE_REGS) \
+ && defined(ARCH_SUPPORTS_KPROBES_ON_FTRACE)
+# define KPROBES_CAN_USE_FTRACE
+#endif
+
/* Attach to insert probes on any functions which should be ignored*/
-#define __kprobes __attribute__((__section__(".kprobes.text")))
+#define ____kprobes __attribute__((__section__(".kprobes.text")))
+
+#ifdef KPROBES_CAN_USE_FTRACE
+/* If kprobes uses ftrace, its handlers must be notrace. */
+# define __kprobes ____kprobes notrace
+#else
+# define __kprobes ____kprobes
+#endif
+
#else /* CONFIG_KPROBES */
typedef int kprobe_opcode_t;
struct arch_specific_insn {
@@ -128,6 +147,7 @@ struct kprobe {
* NOTE:
* this flag is only for optimized_kprobe.
*/
+#define KPROBE_FLAG_FTRACE 8 /* probe is using ftrace */

/* Has this kprobe gone ? */
static inline int kprobe_gone(struct kprobe *p)
@@ -146,6 +166,13 @@ static inline int kprobe_optimized(struct kprobe *p)
{
return p->flags & KPROBE_FLAG_OPTIMIZED;
}
+
+/* Is this kprobe using ftrace ? */
+static inline int kprobe_ftrace(struct kprobe *p)
+{
+ return p->flags & KPROBE_FLAG_FTRACE;
+}
+
/*
* Special probe type that uses setjmp-longjmp type tricks to resume
* execution at a specified entry with a matching prototype corresponding
@@ -295,6 +322,12 @@ extern int proc_kprobes_optimization_handler(struct ctl_table *table,
#endif

#endif /* CONFIG_OPTPROBES */
+#ifdef KPROBES_CAN_USE_FTRACE
+extern void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
+ struct pt_regs *regs);
+extern int arch_prepare_kprobe_ftrace(struct kprobe *p);
+#endif
+

/* Get the kprobe at this addr (if any) - called with preemption disabled */
struct kprobe *get_kprobe(void *addr);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 9e47f44..3b1b21a 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -759,6 +759,10 @@ static __kprobes void try_to_optimize_kprobe(struct kprobe *p)
struct kprobe *ap;
struct optimized_kprobe *op;

+ /* Impossible to optimize ftrace-based kprobe */
+ if (kprobe_ftrace(p))
+ return;
+
/* For preparing optimization, jump_label_text_reserved() is called */
jump_label_lock();
mutex_lock(&text_mutex);
@@ -915,9 +919,65 @@ static __kprobes struct kprobe *alloc_aggr_kprobe(struct kprobe *p)
}
#endif /* CONFIG_OPTPROBES */

+#ifdef KPROBES_CAN_USE_FTRACE
+static struct ftrace_ops kprobe_ftrace_ops __read_mostly = {
+ .regs_func = kprobe_ftrace_handler,
+ .flags = FTRACE_OPS_FL_SAVE_REGS,
+};
+static int kprobe_ftrace_enabled;
+
+static void __kprobes kprobe_ftrace_init(void)
+{
+ int ret;
+
+ ret = register_ftrace_function(&kprobe_ftrace_ops);
+ WARN(ret < 0, "Failed to init kprobe-ftrace (%d)\n", ret);
+
+ kprobe_ftrace_enabled = 1;
+}
+
+/* Must ensure p->addr is really on ftrace */
+static int __kprobes prepare_kprobe(struct kprobe *p)
+{
+ if (!kprobe_ftrace(p))
+ return arch_prepare_kprobe(p);
+
+ if (!kprobe_ftrace_enabled)
+ return -EINVAL;
+
+ return arch_prepare_kprobe_ftrace(p);
+}
+
+static void __kprobes arm_kprobe_ftrace(struct kprobe *p)
+{
+ int ret;
+
+ ret = ftrace_set_filter_ip(&kprobe_ftrace_ops,
+ (unsigned long)p->addr, 0, 0);
+ WARN(ret < 0, "Failed to arm kprobe-ftrace at %p (%d)\n", p->addr, ret);
+}
+
+static void __kprobes disarm_kprobe_ftrace(struct kprobe *p)
+{
+ int ret;
+
+ ret = ftrace_set_filter_ip(&kprobe_ftrace_ops,
+ (unsigned long)p->addr, 1, 0);
+ WARN(ret < 0, "Failed to disarm kprobe-ftrace at %p (%d)\n", p->addr, ret);
+}
+#else /* !KPROBES_CAN_USE_FTRACE */
+#define prepare_kprobe(p) arch_prepare_kprobe(p)
+#define arm_kprobe_ftrace(p) do {} while (0)
+#define disarm_kprobe_ftrace(p) do {} while (0)
+#endif
+
/* Arm a kprobe with text_mutex */
static void __kprobes arm_kprobe(struct kprobe *kp)
{
+ if (unlikely(kprobe_ftrace(kp))) {
+ arm_kprobe_ftrace(kp);
+ return;
+ }
/*
* Here, since __arm_kprobe() doesn't use stop_machine(),
* this doesn't cause deadlock on text_mutex. So, we don't
@@ -929,11 +989,15 @@ static void __kprobes arm_kprobe(struct kprobe *kp)
}

/* Disarm a kprobe with text_mutex */
-static void __kprobes disarm_kprobe(struct kprobe *kp)
+static void __kprobes disarm_kprobe(struct kprobe *kp, bool reopt)
{
+ if (unlikely(kprobe_ftrace(kp))) {
+ disarm_kprobe_ftrace(kp);
+ return;
+ }
/* Ditto */
mutex_lock(&text_mutex);
- __disarm_kprobe(kp, true);
+ __disarm_kprobe(kp, reopt);
mutex_unlock(&text_mutex);
}

@@ -1343,6 +1407,26 @@ static __kprobes int check_kprobe_address_safe(struct kprobe *p,
struct module **probed_mod)
{
int ret = 0;
+ unsigned long ftrace_addr;
+
+ /*
+ * If the address is located on a ftrace nop, set the
+ * breakpoint to the following instruction.
+ */
+ ftrace_addr = ftrace_location((unsigned long)p->addr);
+ if (ftrace_addr) {
+#ifdef KPROBES_CAN_USE_FTRACE
+ /* Given address is not on the instruction boundary */
+ if ((unsigned long)p->addr != ftrace_addr)
+ return -EILSEQ;
+ /* break_handler (jprobe) can not work with ftrace */
+ if (p->break_handler)
+ return -EINVAL;
+ p->flags |= KPROBE_FLAG_FTRACE;
+#else /* !KPROBES_CAN_USE_FTRACE */
+ return -EINVAL;
+#endif
+ }

jump_label_lock();
preempt_disable();
@@ -1350,7 +1434,6 @@ static __kprobes int check_kprobe_address_safe(struct kprobe *p,
/* Ensure it is not in reserved area nor out of text */
if (!kernel_text_address((unsigned long) p->addr) ||
in_kprobes_functions((unsigned long) p->addr) ||
- ftrace_text_reserved(p->addr, p->addr) ||
jump_label_text_reserved(p->addr, p->addr)) {
ret = -EINVAL;
goto out;
@@ -1422,7 +1505,7 @@ int __kprobes register_kprobe(struct kprobe *p)
}

mutex_lock(&text_mutex); /* Avoiding text modification */
- ret = arch_prepare_kprobe(p);
+ ret = prepare_kprobe(p);
mutex_unlock(&text_mutex);
if (ret)
goto out;
@@ -1480,7 +1563,7 @@ static struct kprobe *__kprobes __disable_kprobe(struct kprobe *p)

/* Try to disarm and disable this/parent probe */
if (p == orig_p || aggr_kprobe_disabled(orig_p)) {
- disarm_kprobe(orig_p);
+ disarm_kprobe(orig_p, true);
orig_p->flags |= KPROBE_FLAG_DISABLED;
}
}
@@ -2050,6 +2133,11 @@ static int __init init_kprobes(void)

kprobes_initialized = (err == 0);

+#ifdef KPROBES_CAN_USE_FTRACE
+ /* This is an optional feature */
+ kprobe_ftrace_init();
+#endif
+
if (!err)
init_test_probes();
return err;
@@ -2078,10 +2166,11 @@ static void __kprobes report_probe(struct seq_file *pi, struct kprobe *p,

if (!pp)
pp = p;
- seq_printf(pi, "%s%s%s\n",
+ seq_printf(pi, "%s%s%s%s\n",
(kprobe_gone(p) ? "[GONE]" : ""),
((kprobe_disabled(p) && !kprobe_gone(p)) ? "[DISABLED]" : ""),
- (kprobe_optimized(pp) ? "[OPTIMIZED]" : ""));
+ (kprobe_optimized(pp) ? "[OPTIMIZED]" : ""),
+ (kprobe_ftrace(pp) ? "[FTRACE]" : ""));
}

static void __kprobes *kprobe_seq_start(struct seq_file *f, loff_t *pos)
@@ -2160,14 +2249,12 @@ static void __kprobes arm_all_kprobes(void)
goto already_enabled;

/* Arming kprobes doesn't optimize kprobe itself */
- mutex_lock(&text_mutex);
for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
head = &kprobe_table[i];
hlist_for_each_entry_rcu(p, node, head, hlist)
if (!kprobe_disabled(p))
- __arm_kprobe(p);
+ arm_kprobe(p);
}
- mutex_unlock(&text_mutex);

kprobes_all_disarmed = false;
printk(KERN_INFO "Kprobes globally enabled\n");
@@ -2195,15 +2282,13 @@ static void __kprobes disarm_all_kprobes(void)
kprobes_all_disarmed = true;
printk(KERN_INFO "Kprobes globally disabled\n");

- mutex_lock(&text_mutex);
for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
head = &kprobe_table[i];
hlist_for_each_entry_rcu(p, node, head, hlist) {
if (!arch_trampoline_kprobe(p) && !kprobe_disabled(p))
- __disarm_kprobe(p, false);
+ disarm_kprobe(p, false);
}
}
- mutex_unlock(&text_mutex);
mutex_unlock(&kprobe_mutex);

/* Wait for disarming all kprobes by optimizer */

Subject: [RFC PATCH -tip 9/9] kprobes/x86: ftrace based optimization for x86

Add function-tracer-based kprobe optimization support
handlers on x86. This allows kprobes to use the function
tracer for probing on the mcount call.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
---

arch/x86/include/asm/kprobes.h | 1 +
arch/x86/kernel/kprobes.c | 48 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 49 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index 5478825..d3ddd17 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -27,6 +27,7 @@
#include <asm/insn.h>

#define __ARCH_WANT_KPROBES_INSN_SLOT
+#define ARCH_SUPPORTS_KPROBES_ON_FTRACE

struct pt_regs;
struct kprobe;
diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index e2f751e..4c46c55 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -1052,6 +1052,54 @@ int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
return 0;
}

+#ifdef KPROBES_CAN_USE_FTRACE
+/* Ftrace callback handler for kprobes */
+void __kprobes kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
+ struct pt_regs *regs)
+{
+ struct kprobe *p;
+ struct kprobe_ctlblk *kcb;
+ unsigned long flags;
+
+ /* Disable irq for emulating a breakpoint and avoiding preempt */
+ local_irq_save(flags);
+
+ p = get_kprobe((kprobe_opcode_t *)ip);
+ if (unlikely(!p) || kprobe_disabled(p))
+ goto end;
+
+ kcb = get_kprobe_ctlblk();
+ if (kprobe_running()) {
+ kprobes_inc_nmissed_count(p);
+ } else {
+ regs->ip += sizeof(kprobe_opcode_t);
+
+ __this_cpu_write(current_kprobe, p);
+ kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+ if (p->pre_handler)
+ p->pre_handler(p, regs);
+
+ if (unlikely(p->post_handler)) {
+ /* Emulate singlestep as if there is a 5byte nop */
+ regs->ip = ip + MCOUNT_INSN_SIZE;
+ kcb->kprobe_status = KPROBE_HIT_SSDONE;
+ p->post_handler(p, regs, 0);
+ }
+ __this_cpu_write(current_kprobe, NULL);
+ regs->ip = ip; /* Recover for next callback */
+ }
+end:
+ local_irq_restore(flags);
+}
+
+int __kprobes arch_prepare_kprobe_ftrace(struct kprobe *p)
+{
+ p->ainsn.insn = NULL;
+ p->ainsn.boostable = -1;
+ return 0;
+}
+#endif
+
int __init arch_init_kprobes(void)
{
return arch_init_optprobes();

From: Steven Rostedt
Date: 2012-05-29 22:45:20
Subject: Re: [RFC PATCH -tip 0/9] ftrace, kprobes: Ftrace-based kprobe optimization

On Tue, 2012-05-29 at 21:48 +0900, Masami Hiramatsu wrote:

> Also, this makes all __kprobes functions "notrace", because
> some of those functions are expected to be called from a
> kprobes handler, which is called from the function tracer.
> I think that is another discussion point. Perhaps we need
> to introduce another tag which means "don't put a kprobe on
> this function" instead of __kprobes, and apply that.

Actually, instead, we can force kprobes to have all "__kprobes"
functions added to its 'notrace' ftrace_ops. This will just keep kprobes
from function tracing these, as I find myself tracing functions marked
by kprobes quite a bit.
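
Something along these lines (just a rough sketch of the idea, reusing
the kprobe_ftrace_ops from patch 8; the symbol name is only an example):

/*
 * Keep only kprobes' own ftrace_ops from tracing a __kprobes function,
 * instead of marking the function notrace for everybody.
 */
static int kprobe_notrace_symbol(const char *name)
{
	return ftrace_set_notrace(&kprobe_ftrace_ops,
				  (unsigned char *)name, strlen(name), 0);
}

/* e.g. called for every symbol found in the .kprobes.text section: */
/*	kprobe_notrace_symbol("get_kprobe");	*/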

-- Steve

From: Steven Rostedt
Date: 2012-05-29 23:05:31
Subject: Re: [RFC PATCH -tip 2/9] ftrace/x86-64: support SAVE_REGS feature on x86-64

On Tue, 2012-05-29 at 21:49 +0900, Masami Hiramatsu wrote:
> Add register saving/restoring sequence in ftrace_caller
> on x86-64 for enabling SAVE_REGS feature.
>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> ---
>
> arch/x86/include/asm/ftrace.h | 4 ++++
> arch/x86/kernel/entry_64.S | 38 +++++++++++++++++++++++++++++++++-----
> 2 files changed, 37 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
> index 18d9005..ffb9564 100644
> --- a/arch/x86/include/asm/ftrace.h
> +++ b/arch/x86/include/asm/ftrace.h
> @@ -32,6 +32,10 @@
> #define MCOUNT_ADDR ((long)(mcount))
> #define MCOUNT_INSN_SIZE 5 /* sizeof mcount call */
>
> +#if defined(CONFIG_DYNAMIC_FTRACE) && defined(CONFIG_X86_64)
> +#define ARCH_SUPPORTS_FTRACE_SAVE_REGS 1
> +#endif
> +
> #ifndef __ASSEMBLY__
> extern void mcount(void);
> extern int modifying_ftrace_code;
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 320852d..6d0545b 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -73,21 +73,44 @@ ENTRY(mcount)
> retq
> END(mcount)
>
> + .macro MCOUNT_SAVE_ALL
> + SAVE_ARGS (RSP-ORIG_RAX)
> + SAVE_REST
> + /* Adjust eflags, rsp and rip */
> + movq RSP(%rsp), %rdx
> + movq %rdx, EFLAGS(%rsp)
> + leaq SS+8(%rsp), %rdx
> + movq %rdx, RSP(%rsp)
> + movq SS(%rsp), %rdi
> + subq $MCOUNT_INSN_SIZE, %rdi
> + movq %rdi, RIP(%rsp)
> + .endm
> +
> + .macro MCOUNT_RESTORE_ALL
> + /* store eflags into regs->rsp */
> + movq EFLAGS(%rsp), %rax
> + movq %rax, RSP(%rsp)
> + RESTORE_REST
> + RESTORE_ARGS 1, (RSP-ORIG_RAX)
> + .endm
> +
> ENTRY(ftrace_caller)
> + CFI_STARTPROC simple
> + pushfq_cfi
> cmpl $0, function_trace_stop
> - jne ftrace_stub
> + jne ftrace_exit
>
> - MCOUNT_SAVE_FRAME
> + MCOUNT_SAVE_ALL

No! Please do not make ftrace save all regs for everyone. This is
critical. This will slow down function tracing even more. We do not need
to save all regs for every function call, unless it is asked for.

It should only save all regs if the ftrace_ops that is registered asked
for it. I had code that actually does that. I can merge your patches
with them if you would like.

Thanks,

-- Steve

>
> - movq 0x38(%rsp), %rdi
> movq 8(%rbp), %rsi
> - subq $MCOUNT_INSN_SIZE, %rdi
> + movq %rsp, %rdx
>
> GLOBAL(ftrace_call)
> call ftrace_stub
>
> - MCOUNT_RESTORE_FRAME
> + MCOUNT_RESTORE_ALL
>
> + popfq_cfi
> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> GLOBAL(ftrace_graph_call)
> jmp ftrace_stub
> @@ -95,6 +118,11 @@ GLOBAL(ftrace_graph_call)
>
> GLOBAL(ftrace_stub)
> retq
> +
> +ftrace_exit:
> + popfq_cfi
> + retq
> + CFI_ENDPROC
> END(ftrace_caller)
>
> #else /* ! CONFIG_DYNAMIC_FTRACE */

Subject: Re: [RFC PATCH -tip 2/9] ftrace/x86-64: support SAVE_REGS feature on x86-64

(2012/05/30 8:05), Steven Rostedt wrote:
> No! Please do not make ftrace save all regs for everyone. This is
> critical. This will slow down function tracing even more. We do not need
> to save all regs for every function call, unless it is asked for.

Indeed, I have just removed your save_regs dynamic selector code,
because I'd like to make this series as simple as possible
for review.

> It should only save all regs if the ftrace_ops that is registered asked
> for it. I had code that actually does that. I can merge your patches
> with them if you would like.

Yes, I'd like :)

BTW, that code which dynamically selects trace_caller on your
ftrace-v2 tree seems to be unfinished work. Have you
already completed it?

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: [RFC PATCH -tip 0/9] ftrace, kprobes: Ftrace-based kprobe optimization

(2012/05/30 7:45), Steven Rostedt wrote:
> On Tue, 2012-05-29 at 21:48 +0900, Masami Hiramatsu wrote:
>
>> Also, this makes all __kprobes functions "notrace", because
>> some of those functions are expected to be called from a
>> kprobes handler, which is called from the function tracer.
>> I think that is another discussion point. Perhaps we need
>> to introduce another tag which means "don't put a kprobe on
>> this function" instead of __kprobes, and apply that.
>
> Actually, instead, we can force kprobes to have all "__kprobes"
> functions added to its 'notrace' ftrace_ops. This will just keep kprobes
> from function tracing these, as I find myself tracing functions marked
> by kprobes quite a bit.

Hmm, I'm not so sure how the notrace and filter work.
What happens if I set a foo function-entry on the filter
and keep notrace empty?
- Is only foo's nop replaced with a call?
- Or are all functions, including foo, traced?

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: [RFC PATCH -tip 8/9] kprobes: introduce ftrace based optimization

On Tue, May 29, 2012 at 09:49:45PM +0900, Masami Hiramatsu wrote:
> Introduce function trace based kprobes optimization.
>
> With using ftrace optimization, kprobes on the mcount calling
> address, use ftrace's mcount call instead of breakpoint.
> Furthermore, this optimization works with preemptive kernel
> not like as current jump-based optimization. Of course,
> this feature is limited only if the probe on mcount call.

The above paragraph doesn't parse correctly for me. Do you mean to say
if the probe is on the mcount calling address, use the jump based
approach instead of the breakpoint one? Could you please rephrase?

...

> +static void __kprobes kprobe_ftrace_init(void)
> +{
> + int ret;
> +
> + ret = register_ftrace_function(&kprobe_ftrace_ops);
> + WARN(ret < 0, "Failed to init kprobe-ftrace (%d)\n", ret);
> +
> + kprobe_ftrace_enabled = 1;

Hmm.. is this right? kprobe_ftrace_enabled is 1 even if the init failed.

Ananth

Subject: Re: [RFC PATCH -tip 8/9] kprobes: introduce ftrace based optimization

(2012/05/30 16:22), Ananth N Mavinakayanahalli wrote:
> On Tue, May 29, 2012 at 09:49:45PM +0900, Masami Hiramatsu wrote:
>> Introduce function trace based kprobes optimization.
>>
>> With using ftrace optimization, kprobes on the mcount calling
>> address, use ftrace's mcount call instead of breakpoint.
>> Furthermore, this optimization works with preemptive kernel
>> not like as current jump-based optimization. Of course,
>> this feature is limited only if the probe on mcount call.
>
> The above paragraph doesn't parse correctly for me. Do you mean to say
> if the probe is on the mcount calling address, use the jump based
> approach instead of the breakpoint one? Could you please rephrase?

Right, but it does not use the current jump-based one; it uses the
function tracer handler directly.

So, ftrace-based optimization will be done on a kprobe at
the mcount calling address, which has been replaced with a
5-byte NOP at build time.
The ftrace-based optimization uses the function-tracer handler
(kernel/trace/ftrace.c) instead of int3 breakpoint trapping.

The probing behavior is as follows:

1. hit mcount calling address
2. call ftrace_caller
-> 3. save all registers
4. call ftrace's handler (kprobe_ftrace_handler)
-> 5. set up current kprobe
6. call kprobe handler
<- 7. return
8. restore registers
<- 9. return
10. continue to run

>> +static void __kprobes kprobe_ftrace_init(void)
>> +{
>> + int ret;
>> +
>> + ret = register_ftrace_function(&kprobe_ftrace_ops);
>> + WARN(ret < 0, "Failed to init kprobe-ftrace (%d)\n", ret);
>> +
>> + kprobe_ftrace_enabled = 1;
>
> Hmm.. is this right? kprobe_ftrace_enabled is 1 even if the init failed.

Oops, that should be a bug! thanks!
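
Something like this should do it (just a sketch, not posted yet):

static void __kprobes kprobe_ftrace_init(void)
{
	int ret;

	ret = register_ftrace_function(&kprobe_ftrace_ops);
	if (WARN(ret < 0, "Failed to init kprobe-ftrace (%d)\n", ret))
		return;		/* leave kprobe_ftrace_enabled at 0 */

	kprobe_ftrace_enabled = 1;
}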


--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

From: Steven Rostedt
Date: 2012-05-30 11:34:58
Subject: Re: [RFC PATCH -tip 2/9] ftrace/x86-64: support SAVE_REGS feature on x86-64

On Wed, 2012-05-30 at 15:39 +0900, Masami Hiramatsu wrote:

> BTW, that code which dynamically selects trace_caller on your
> ftrace-v2 tree seems to be unfinished work. Have you
> already completed it?

I'll have to see where I left off. I'm a bit ADHD when it comes to some
of my projects. I work on them for a bit, then the next day I'll go work
on something else. Eventually all my projects get done (well, I like to
think they do ;-)

I'm currently working on a nasty bug that's in mainline when you enable
both LOCKDEP and FUNCTION_TRACING. I've been at it since Thursday (even
worked a bit over the US holiday here). I'm hoping I have a working
solution for it today. But I can't work on anything else till this is
complete.

After I get this bug fixed, I can take a look at the multi select code I
have.

Thanks,

-- Steve

From: Steven Rostedt
Date: 2012-05-30 11:39:42
Subject: Re: [RFC PATCH -tip 0/9] ftrace, kprobes: Ftrace-based kprobe optimization

On Wed, 2012-05-30 at 15:59 +0900, Masami Hiramatsu wrote:

> Hmm, I'm not so sure how the notrace and filter work.
> What happens if I set a foo function-entry on the filter
> and keep notrace empty?
> - Is only foo's nop replaced with a call?
> - Or are all functions, including foo, traced?

From Documentation/trace/ftrace.txt:

"If a function exists in both set_ftrace_filter
and set_ftrace_notrace, the function will _not_ be traced."

The filters work exactly the same. notrace always takes precedence
over filter. If you have foo and bar in filter, and put foo in notrace,
then only bar is traced.

"filter" means "limit tracing only to these functions"
"notrace" means "do not trace this function"

Think of 'filter' as a way of making the 'available_filter_functions'
smaller. It filters the list. But 'notrace' is just like adding a
'notrace' tag. It stops it from being traced regardless.
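
In terms of the in-kernel API, the same rule could be exercised like
this (a hypothetical sketch; "foo", "bar" and my_ops are placeholders):

	unsigned char foo[] = "foo", bar[] = "bar";

	/* limit tracing to foo and bar ... */
	ftrace_set_filter(&my_ops, foo, 3, 1);	/* reset filter, add foo */
	ftrace_set_filter(&my_ops, bar, 3, 0);	/* add bar */
	/* ... but notrace wins, so only bar will actually be traced */
	ftrace_set_notrace(&my_ops, foo, 3, 1);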

-- Steve


Subject: Re: Re: [RFC PATCH -tip 0/9] ftrace, kprobes: Ftrace-based kprobe optimization

(2012/05/30 20:39), Steven Rostedt wrote:
> On Wed, 2012-05-30 at 15:59 +0900, Masami Hiramatsu wrote:
>
>> Hmm, I'm not so sure how the notrace and filter work.
>> What happens if I set a foo function-entry on the filter
>> and keep notrace empty?
>> - Is only foo's nop replaced with a call?
>> - Or are all functions, including foo, traced?
>
> From Documentation/trace/ftrace.txt:
>
> "If a function exists in both set_ftrace_filter
> and set_ftrace_notrace, the function will _not_ be traced."
>
> The filters work exactly the same. notrace always takes precedence
> over filter. If you have foo and bar in filter, and put foo in notrace,
> then only bar is traced.
>
> "filter" means "limit tracing only to these functions"
> "notrace" means "do not trace this function"
>
> Think of 'filter' as a way of making the 'available_filter_functions'
> smaller. It filters the list. But 'notrace' is just like adding a
> 'notrace' tag. It stops it from being traced regardless.

OK, that's same as what I expected. In that case,
all __kprobes functions are already filtered out
by kprobes itself. So we don't need to set that anymore.

Hmm, CFLAGS_REMOVE_kprobes.o can also keep kprobes from
function tracer. So I'd like to try to use that instead
of including notrace into __kprobes.
However, in that case, kprobe users must remove -pg from
their kernel modules too, and take care that they must
call only notrace kernel APIs...

Perhaps, we'd better introduce new kprobe flag which allow
kprobe to accept new probe on ftrace, so that user can
explicitly understand what he will do.

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

From: Steven Rostedt
Date: 2012-05-31 15:15:50
Subject: Re: Re: [RFC PATCH -tip 0/9] ftrace, kprobes: Ftrace-based kprobe optimization

On Fri, 2012-06-01 at 00:01 +0900, Masami Hiramatsu wrote:

> OK, that's same as what I expected. In that case,
> all __kprobes functions are already filtered out
> by kprobes itself. So we don't need to set that anymore.
>
> Hmm, CFLAGS_REMOVE_kprobes.o can also keep kprobes from
> function tracer. So I'd like to try to use that instead
> of including notrace into __kprobes.
> However, in that case, kprobe users must remove -pg from
> their kernel modules too, and take care that they must
> call only notrace kernel APIs...
>
> Perhaps, we'd better introduce new kprobe flag which allow
> kprobe to accept new probe on ftrace, so that user can
> explicitly understand what he will do.

Please do not make kprobe functions not allow function tracing! I *want*
to trace these functions! For example, I trace functions in NMIs all the
time, and I know these are prohibited by kprobes.

Why can't we function trace this? If kprobes does not trace functions
marked with kprobes already, then it should not have any issue. Kprobes
will only use the function tracer for what its allowed to use.

Do not remove -pg from anything to satisfy kprobes. It shouldn't need
it.

-- Steve



Subject: Re: [RFC PATCH -tip 0/9] ftrace, kprobes: Ftrace-based kprobe optimization

(2012/06/01 0:15), Steven Rostedt wrote:
> On Fri, 2012-06-01 at 00:01 +0900, Masami Hiramatsu wrote:
>
>> OK, that's same as what I expected. In that case,
>> all __kprobes functions are already filtered out
>> by kprobes itself. So we don't need to set that anymore.
>>
>> Hmm, CFLAGS_REMOVE_kprobes.o can also keep kprobes from
>> function tracer. So I'd like to try to use that instead
>> of including notrace into __kprobes.
>> However, in that case, kprobe users must remove -pg from
>> their kernel modules too, and take care that they must
>> call only notrace kernel APIs...
>>
>> Perhaps, we'd better introduce new kprobe flag which allow
>> kprobe to accept new probe on ftrace, so that user can
>> explicitly understand what he will do.
>
> Please do not make kprobe functions not allow function tracing! I *want*
> to trace these functions! For example, I trace functions in NMIs all the
> time, and I know these are prohibited by kprobes.
>
> Why can't we function trace this? If kprobes does not trace functions
> marked with kprobes already, then it should not have any issue. Kprobes
> will only use the function tracer for what its allowed to use.

Because when I removed notrace from __kprobes, the kernel hit a
triple fault and didn't boot, even though kprobes was not used.
I guess that is because some recursive call of the function tracer
happened. So I've added notrace to __kprobes (but it was applied
too widely).

> Do not remove -pg from anything to satisfy kprobes. It shouldn't need
> it.

But some kprobes functions will/must be called from kprobes handlers.
Those should be marked as notrace, shouldn't they?

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: [RFC PATCH -tip 0/9] ftrace, kprobes: Ftrace-based kprobe optimization

(2012/06/01 0:15), Steven Rostedt wrote:
> On Fri, 2012-06-01 at 00:01 +0900, Masami Hiramatsu wrote:
>
>> OK, that's same as what I expected. In that case,
>> all __kprobes functions are already filtered out
>> by kprobes itself. So we don't need to set that anymore.
>>
>> Hmm, CFLAGS_REMOVE_kprobes.o can also keep kprobes from
>> function tracer. So I'd like to try to use that instead
>> of including notrace into __kprobes.
>> However, in that case, kprobe users must remove -pg from
>> their kernel modules too, and take care that they must
>> call only notrace kernel APIs...
>>
>> Perhaps, we'd better introduce new kprobe flag which allow
>> kprobe to accept new probe on ftrace, so that user can
>> explicitly understand what he will do.
>
> Please do not make kprobe functions not allow function tracing! I *want*
> to trace these functions! For example, I trace functions in NMIs all the
> time, and I know these are prohibited by kprobes.
>
> Why can't we function trace this? If kprobes does not trace functions
> marked with kprobes already, then it should not have any issue. Kprobes
> will only use the function tracer for what its allowed to use.

OK, so I've introduced a new noprobe tag and replaced __kprobes
with it. And now the __kprobes tag, which is a combination of noprobe
and notrace, means that the function is not probed and can be
called from a kprobe handler. (Thus users must use this for their
handlers and for functions which will be called from the handlers.)
Also, most of the __kprobes tags are replaced by noprobe only.
This means that you can trace those with the function tracer :)
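
Roughly, the tag split looks like this (an illustrative sketch only,
not the exact macros from a posted patch):

/* noprobe: only keeps kprobes away from this function */
#define noprobe		__attribute__((__section__(".kprobes.text")))
/*
 * __kprobes: may be called from kprobe handlers, so additionally keep
 * the function tracer away to avoid recursion.
 */
#define __kprobes	noprobe notrace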

BTW, currently kprobes allows the user to cause a page fault in their
handler (kprobe.fault_handler will handle it). I guess that
can cause some problem with ftrace, can't it? If so, I need
to deny a kprobe the use of ftrace if it has a fault_handler.

As far as I know, systemtap relays on it.
(of course, kprobe-tracer doesn't use it)

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

From: Steven Rostedt
Date: 2012-06-01 14:20:31
Subject: Re: [RFC PATCH -tip 0/9]ftrace, kprobes: Ftrace-based kprobe optimization

On Fri, 2012-06-01 at 22:36 +0900, Masami Hiramatsu wrote:

> OK, so I've introduced new noprobe tag and replaced __kprobes
> with it. And now __kprobes tag which is a combination of noprobe
> and notrace, means that the function is not probed and it can be
> called from kprobe handler. (thus user must use this with their
> handlers and functions which will be used from the handlers)
> And also most of __kprobes tags are replaced by noprobe only.

You still haven't answered my question. Why can't function tracer still
trace these? If kprobes does not allow it to be probed, it should not
interfere with your code. But normal function tracing should still allow
these.

I still do not understand why you need to add 'notrace' at all.

> This means that you can trace those by function tracer :)
>
> BTW, currently kprobes allows user cases pagefault in their
> handler (kprobe.fault_handler will handle it). I guess that
> can cause some problem with ftrace, isn't it? If so, I need
> to deny a kprobe using ftrace if it has fault_handler.

As long as there's recursion protection you are fine. In fact, I may add
recursion protection within the assembly itself, that will make all
function tracing safe. (does not solve the breakpoint bug from the other
thread, but will solve most other things). In fact, this may allow us to
remove notraces that were added because of recursion issues.

-- Steve

From: Steven Rostedt
Date: 2012-06-02 02:08:01
Subject: Re: [RFC PATCH -tip 1/9] ftrace: Add pt_regs acceptable trace callback

On Tue, 2012-05-29 at 21:48 +0900, Masami Hiramatsu wrote:

> struct ftrace_ops {
> - ftrace_func_t func;
> + union {
> + ftrace_func_t func;
> + ftrace_regs_func_t regs_func;
> + };
> struct ftrace_ops *next;
> unsigned long flags;
> int __percpu *disabled;
> @@ -164,6 +182,7 @@ static inline int ftrace_function_local_disabled(struct ftrace_ops *ops)
> }

[..]

>
> static struct ftrace_ops global_ops = {
> - .func = ftrace_stub,
> + .regs_func = ftrace_regs_stub,
> .notrace_hash = EMPTY_HASH,
> .filter_hash = EMPTY_HASH,
> + .flags = FTRACE_OPS_FL_SAVE_REGS,
> };
>
> static DEFINE_MUTEX(ftrace_regex_lock);
> @@ -3911,7 +3924,8 @@ void __init ftrace_init(void)
> #else
>
> static struct ftrace_ops global_ops = {
> - .func = ftrace_stub,
> + .regs_func = ftrace_regs_stub,
> + .flags = FTRACE_OPS_FL_SAVE_REGS,
> };
>

Ugh, this won't compile with some versions of gcc :-(

The one I stumbled on is gcc 4.5.1 (I test builds against 4.5.1
and 4.6.0). Then I saw this BZ:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10676

This can't be a union :-( because then we cannot initialize it.

I may go with my old approach and just change all callers to have a regs
parameter.

-- Steve

Subject: Re: [RFC PATCH -tip 0/9]ftrace, kprobes: Ftrace-based kprobe optimization

(2012/06/01 23:20), Steven Rostedt wrote:
> On Fri, 2012-06-01 at 22:36 +0900, Masami Hiramatsu wrote:
>
>> OK, so I've introduced new noprobe tag and replaced __kprobes
>> with it. And now __kprobes tag which is a combination of noprobe
>> and notrace, means that the function is not probed and it can be
>> called from kprobe handler. (thus user must use this with their
>> handlers and functions which will be used from the handlers)
>> And also most of __kprobes tags are replaced by noprobe only.
>
> You still haven't answered my question. Why can't function tracer still
> trace these? If kprobes does not allow it to be probed, it should not
> interfere with your code. But normal function tracing should still allow
> these.

Because those are called from an ftrace-based kprobe, which means
they are invoked directly from kprobe_ftrace_handler. I think they
should be handled as part of the ftrace handler.
Currently, I just added notrace to the two kinds of functions below:

- handler functions which can be called indirectly from ftrace
- internal utility functions such as get_kprobe and set_kprobe_instance,
  which are called directly from the kprobe ftrace handler


> I still do not understand why you need to add 'notrace' at all.

Because I'd like to solve a recursive call problem.

I hit some odd function tracer behavior. When I removed notrace
from get_kprobe(), which is an essential internal function called
directly from kprobe_ftrace_handler, I got a kernel crash caused by
a recursive call right after I registered kprobe_ftrace_handler
with ftrace. At that time ftrace_ops.filter was empty, so I assumed
no function was being traced, but kprobe_ftrace_handler was called
from somewhere anyway. So it hit a recursive loop of ftrace_call ->
kprobe_ftrace_handler -> get_kprobe -> ftrace_call ...

I think if I can register kprobe's ftrace_ops without starting to
trace everything, we can do without "notrace".

>> This means that you can trace those by function tracer :)
>>
>> BTW, currently kprobes allows user cases pagefault in their
>> handler (kprobe.fault_handler will handle it). I guess that
>> can cause some problem with ftrace, isn't it? If so, I need
>> to deny a kprobe using ftrace if it has fault_handler.
>
> As long as there's recursion protection you are fine. In fact, I may add
> recursion protection within the assembly itself, that will make all
> function tracing safe. (does not solve the breakpoint bug from the other
> thread, but will solve most other things). In fact, this may allow us to
> remove notraces that were added because of recursion issues.

OK, I think kprobes already solves that, as long as
get_kprobe and kprobe_running don't cause recursion...

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

From: Steven Rostedt
Date: 2012-06-04 12:07:34
Subject: Re: [RFC PATCH -tip 0/9]ftrace, kprobes: Ftrace-based kprobe optimization

On Mon, 2012-06-04 at 20:45 +0900, Masami Hiramatsu wrote:
> (2012/06/01 23:20), Steven Rostedt wrote:
> > On Fri, 2012-06-01 at 22:36 +0900, Masami Hiramatsu wrote:
> >
> >> OK, so I've introduced new noprobe tag and replaced __kprobes
> >> with it. And now __kprobes tag which is a combination of noprobe
> >> and notrace, means that the function is not probed and it can be
> >> called from kprobe handler. (thus user must use this with their
> >> handlers and functions which will be used from the handlers)
> >> And also most of __kprobes tags are replaced by noprobe only.
> >
> > You still haven't answered my question. Why can't function tracer still
> > trace these? If kprobes does not allow it to be probed, it should not
> > interfere with your code. But normal function tracing should still allow
> > these.
>
> Because those are called from ftrace-based kprobe, which means

But if it has the kprobes annotation on it, you can't put a kprobe on
it, and thus you can't function-trace it via kprobes. But function
tracing should not interfere with kprobe tracing. You can trace these
functions with other function tracers without bothering kprobes.
Function tracing should handle this.

> it is directly invoked from kprobe_ftrace_handler.

So what? ftrace only calls your callback for the functions you asked to
be called back for. If you put a probe on the scheduler, your callback
will only see the schedule function. It will not see the stack tracer
tracing timers and everything else. The two are independent of each other.


> I think
> that should be handled as a part of ftrace handler.

NO! Why should we limit function tracing just because one of its users
can't call some functions?

How does this make it any different from what we have today? It's just
an optimization for kprobes. Being able to trace things that kprobes
cannot should cause no issue for kprobes. Why would NMI tracing affect
kprobes' use of the function tracer?

Show me an example of a problem that you see.

> Currently, I just added notrace on below two kind of functions
>
> - handler functions which can be called intermediately from ftrace

Do they currently break ftrace?

> - get_kprobe, set_kprobe_instance, etc. internal utility functions
> which is called directly from kprobe ftrace handler.

Why can't we function trace these? Note, if we have two function
tracers, A and B, and A's handler calls something that B traces, it
should not be an issue, as long as B's handler does not call something
that A traces.

And if that does happen, it is trivial to add recursion protection in
the function tracer code itself, such that B's handler will not be
called if A's handler calls something B traces.
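
For example, a per-CPU guard along these lines (purely illustrative,
not the actual ftrace code; it assumes preemption is disabled around
the callback, as it is when the function tracer invokes callbacks):

#include <linux/percpu.h>

/* Illustrative sketch of the recursion guard idea: a per-CPU flag is
 * set while a callback runs, so re-entering the tracer from inside
 * that callback bails out instead of looping. */
static DEFINE_PER_CPU(int, tracer_recursion);

static void notrace my_list_func(unsigned long ip, unsigned long parent_ip)
{
	if (__this_cpu_read(tracer_recursion))
		return;		/* already inside a callback on this CPU */
	__this_cpu_write(tracer_recursion, 1);

	/* ... call the registered tracer callbacks here ... */

	__this_cpu_write(tracer_recursion, 0);
}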

If we just add notrace all over the kernel to protect against this, soon
there will be nothing left to trace.


>
>
> > I still do not understand why you need to add 'notrace' at all.
>
> Because I'd like to solve a recursive call problem.
>
> I saw a problem which I hit some odd function tracer behavior.
> When I removed notrace from get_kprobe(), which is an essential
> internal function called directly from kprobe_ftrace_handler,
> I hit a kernel crash caused by recursive call right after I
> registered kprobe_ftrace_handler to ftrace. At that time,
> ftrace_ops.filter was empty so I thought there is no function
> traced, but the kprobe_ftrace_handler was called from somewhere.
> So I saw it hit a recursive loop of ftrace_call ->
> kprobe_ftrace_handler -> get_kprobe -> ftrace_call ...

OK, then the recursion protection I want to add will solve this. But
why would ftrace_call call kprobe_ftrace_handler? You should be setting
up the filter before registering the tracer.

>
> I think if I just register kprobe's ftrace_ops without start
> tracing, I think we can just do tracing without "notrace".

You should be able to set the filter first, and then register the
function tracer. If you register without setting the filter, you will
start tracing all functions.

I think that may be the problem you are seeing. You want to set the
filter first and then register the ftrace_ops.
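
In concrete terms, something like this (a sketch only, not the posted
patch: the regs_func/FTRACE_OPS_FL_SAVE_REGS interface and
ftrace_set_filter_ip() come from this series, the argument layout of
ftrace_set_filter_ip() and the helper names are assumed, and the
callback is assigned at run time because of the union-initializer
issue from the 1/9 subthread):

#include <linux/ftrace.h>
#include <linux/kprobes.h>

/* Assumed signature: the series' regs_func gets pt_regs as third arg. */
extern void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
				  struct pt_regs *regs);

static struct ftrace_ops kprobe_ftrace_ops = {
	.flags = FTRACE_OPS_FL_SAVE_REGS,
};

static int arm_kprobe_ftrace(struct kprobe *p)
{
	int ret;

	kprobe_ftrace_ops.regs_func = kprobe_ftrace_handler;

	/* 1) restrict the ops to the probed address ... */
	ret = ftrace_set_filter_ip(&kprobe_ftrace_ops,
				   (unsigned long)p->addr, 0, 0);
	if (ret)
		return ret;

	/* 2) ... and only then register it, so the callback never fires
	 * while the filter is still empty. (Sketch assumes a single probe;
	 * real code would register the ops only once.) */
	return register_ftrace_function(&kprobe_ftrace_ops);
}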

>
> >> This means that you can trace those by function tracer :)
> >>
> >> BTW, currently kprobes allows user cases pagefault in their
> >> handler (kprobe.fault_handler will handle it). I guess that
> >> can cause some problem with ftrace, isn't it? If so, I need
> >> to deny a kprobe using ftrace if it has fault_handler.
> >
> > As long as there's recursion protection you are fine. In fact, I may add
> > recursion protection within the assembly itself, that will make all
> > function tracing safe. (does not solve the breakpoint bug from the other
> > thread, but will solve most other things). In fact, this may allow us to
> > remove notraces that were added because of recursion issues.
>
> OK, I think kprobe already solves that as long as
> get_kprobe and kprobe_running doesn't cause recursion...

I think the issue is to set the filter first. But I still need to
add that recursion protection.

-- Steve

Subject: Re: [RFC PATCH -tip 0/9]ftrace, kprobes: Ftrace-based kprobe optimization

(2012/06/04 21:07), Steven Rostedt wrote:
> On Mon, 2012-06-04 at 20:45 +0900, Masami Hiramatsu wrote:
>> I think if I just register kprobe's ftrace_ops without start
>> tracing, I think we can just do tracing without "notrace".
>
> You should be able to register the filter first, and then register
> function tracer. If you register without setting the filter, you will
> start tracing all functions.
>
> I think that may be the problem you are seeing. You want to register the
> filter first and then the ftrace_ops.

Ah, that's all I needed to know! :)
OK, I'll do that to avoid the recursion problem.

Thanks!


--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: Re: [RFC PATCH -tip 1/9] ftrace: Add pt_regs acceptable trace callback

(2012/06/02 11:07), Steven Rostedt wrote:
> On Tue, 2012-05-29 at 21:48 +0900, Masami Hiramatsu wrote:
>
>> struct ftrace_ops {
>> - ftrace_func_t func;
>> + union {
>> + ftrace_func_t func;
>> + ftrace_regs_func_t regs_func;
>> + };
>> struct ftrace_ops *next;
>> unsigned long flags;
>> int __percpu *disabled;
>> @@ -164,6 +182,7 @@ static inline int ftrace_function_local_disabled(struct ftrace_ops *ops)
>> }
>
> [..]
>
>>
>> static struct ftrace_ops global_ops = {
>> - .func = ftrace_stub,
>> + .regs_func = ftrace_regs_stub,
>> .notrace_hash = EMPTY_HASH,
>> .filter_hash = EMPTY_HASH,
>> + .flags = FTRACE_OPS_FL_SAVE_REGS,
>> };
>>
>> static DEFINE_MUTEX(ftrace_regex_lock);
>> @@ -3911,7 +3924,8 @@ void __init ftrace_init(void)
>> #else
>>
>> static struct ftrace_ops global_ops = {
>> - .func = ftrace_stub,
>> + .regs_func = ftrace_regs_stub,
>> + .flags = FTRACE_OPS_FL_SAVE_REGS,
>> };
>>
>
> Ug, this wont compile with some versions of gcc :-(
>
> The one I stumbled on is gcc 4.5.1 (which I test builds against 4.5.1
> and 4.6.0). Then I saw this BZ:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10676
>
> This can't be a union :-( Then we can not initialize it.

Hmm, how about initializing it in an __init function?
Or we could make func and regs_func separate members instead of
using a union. (In that case, we could remove
FTRACE_OPS_FL_SAVE_REGS.)
I'm just concerned that passing an uninitialized argument to a user
function can cause unexpected behavior...
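
For example, something like this (just a sketch with the field set
trimmed down, not an actual patch):

struct ftrace_ops {
	ftrace_func_t		func;		/* classic (ip, parent_ip) callback */
	ftrace_regs_func_t	regs_func;	/* pt_regs-aware callback, NULL if unused */
	struct ftrace_ops	*next;
	unsigned long		flags;
};

Then the core could check regs_func != NULL instead of the
FTRACE_OPS_FL_SAVE_REGS flag.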

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

From: Steven Rostedt
Date: 2012-06-04 14:25:19
Subject: Re: Re: [RFC PATCH -tip 1/9] ftrace: Add pt_regs acceptable trace callback

On Mon, 2012-06-04 at 22:58 +0900, Masami Hiramatsu wrote:

> Hmm, how about initializing in __init function ?
> Or we can make func and regs_func in different members,
> instead of using a union. (in that case, we can remove
> FTRACE_OPS_FL_SAVE_REGS.)
> I just consider passing uninitialized argument to user
> function can cause unexpected behavior...

Easy solution:

As I want the ftrace_ops to be passed to all callbacks anyway, we can
implement the ftrace_ops and regs passing together. The arch will need
to update both at the same time.

But for archs that do not support passing ftrace_ops (and thus not regs
either), we can do this (and I will do it):

static inline void
__global_list_func(unsigned long ip, unsigned long parent_ip,
		   struct ftrace_ops *ops, struct pt_regs *regs)
{
	/* do the loop, ignoring ops anyway, but passing in regs */
}

#ifndef ARCH_SUPPORTS_FTRACE_OPS
static void
noregs_global_list_func(unsigned long ip, unsigned long parent_ip)
{
	__global_list_func(ip, parent_ip, NULL, NULL);
}
#define global_list_func (ftrace_func_t)noregs_global_list_func
#else
static void global_list_func(unsigned long ip, unsigned long parent_ip,
			      struct ftrace_ops *ops, struct pt_regs *regs)
{
	__global_list_func(ip, parent_ip, ops, regs);
}
#endif



Nothing will be passed uninitialized. If an arch does not support
passing ftrace_ops and regs, then it will call
noregs_global_list_func() (or whatever I name it), which expects only
ip and parent_ip as parameters. That in turn calls the actual loop
function with NULL for the other parameters.

When an arch supports passing ftrace_ops, it should also support
passing in the regs (as that should be the trivial part).

Note, all callbacks will get regs, but they may not get the full regs.
That requires the ftrace_ops setting the special flag. The regs are
already saved for mcount, but only a partial set; those will be passed
to all function callbacks. If a callback requires the full set (like
kprobes does), then it must set the flag before registering.

How's this sound?

-- Steve


Subject: Re: Re: Re: [RFC PATCH -tip 1/9] ftrace: Add pt_regs acceptable trace callback

(2012/06/04 23:25), Steven Rostedt wrote:
> On Mon, 2012-06-04 at 22:58 +0900, Masami Hiramatsu wrote:
>
>> Hmm, how about initializing in __init function ?
>> Or we can make func and regs_func in different members,
>> instead of using a union. (in that case, we can remove
>> FTRACE_OPS_FL_SAVE_REGS.)
>> I just consider passing uninitialized argument to user
>> function can cause unexpected behavior...
>
> Easy solution:
>
> As I want all functions to pass the ftrace_ops anyway, we can implement
> the ftrace_ops and the regs passing together. The arch will need to
> update both at the same time.

Hmm, is that ftrace_ops for the recursion check? :)

>
> But for archs that do not support ftrace_ops (and thus also not regs),
> we can do (and I will do this):
>
> static inline void
> __global_list_func(unsigned long ip, unsigned long parent_ip,
> struct ftrace_ops *ops, struct pt_regs *regs)
> {
> [do the loop, ignoring ops anyway, but passing in regs]
> }
>
> #ifndef ARCH_SUPPORTS_FTRACE_OPS
> static void
> noregs_global_list_func(unsigned long ip, unsigned long parent_ip)
> {
> __global_list_func(ip, parent_ip, NULL, NULL);
> }
> #define global_list_func (ftrace_func_t)noregs_global_list_func
> #else
> static void global_list_func(unsigned long ip, unsigned long parent_ip,
> struct ftrace_ops *ops, struct pt_regs *regs)
> {
> __global_list_func(ip, parent_ip, ops, regs);
> }
> #endif
>
>
>
> Nothing will be passed uninitialized. If an arch does not support
> passing ftrace ops and regs, then it will be calling the
> noregs_global_list_func() (or whatever I name it), which only expects
> the ip and parent_ip as parameters. Then that will be calling the actual
> loop function with NULLs in the parameters.

Yeah, that's safe, but I think dyn_ftrace can't call the handler
directly from ftrace_call on such archs, can it?
# Whether that's acceptable depends on the performance degradation.

> When an arch supports passing of ftrace_ops, then it should also support
> passing in the regs (as that should be the trivial part).

Preparing pt_regs in software is hard on some archs, e.g. IA64.
But yeah, that's an obsolete arch; we'd better focus on x86 and
ARM variants.

> Note, all funcs will get regs, but it may not get the full regs. That
> requires the ftrace_ops setting the special flag. The regs are saved for
> the mcount already. But only a partial set. Those will be sent to all
> function callbacks. If the function call back requires a full set (like
> kprobes does), then it must set the flag before registering.

Just out of curiosity, do you mean that you will always allocate a
full pt_regs frame on the stack?

>
> Hows this sound?

Sounds better to me, as long as no uninitialized parameters are
passed to the user handler. :)

BTW, will you update the ftrace part? I'd like to remove notrace
from my previous patchset and resend it tomorrow.

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

From: Steven Rostedt
Date: 2012-06-04 15:11:32
Subject: Re: Re: Re: [RFC PATCH -tip 1/9] ftrace: Add pt_regs acceptable trace callback

On Mon, 2012-06-04 at 23:57 +0900, Masami Hiramatsu wrote:

> > As I want all functions to pass the ftrace_ops anyway, we can implement
> > the ftrace_ops and the regs passing together. The arch will need to
> > update both at the same time.
>
> Hmm, is that ftrace_ops for recursion check? :)

Nope, it's to pass data to the callbacks.


> > Nothing will be passed uninitialized. If an arch does not support
> > passing ftrace ops and regs, then it will be calling the
> > noregs_global_list_func() (or whatever I name it), which only expects
> > the ip and parent_ip as parameters. Then that will be calling the actual
> > loop function with NULLs in the parameters.
>
> Yeah, that's safe, but I think dyn_ftrace can't call handler
> directly from ftrace_call on such archs, can it?
> # It depends on performance degradation.
>

Actually, there's already a performance degradation on most archs.
They must also implement a fast way to shut off function tracing, and
if an arch doesn't, it calls a helper function. I'll merge the helper
function into this call so we don't have two indirections.


> > When an arch supports passing of ftrace_ops, then it should also support
> > passing in the regs (as that should be the trivial part).
>
> Preparing pt_regs by software is hard on some archs, e.g. IA64.
> But yeah, that's an obsolete arch. We'd better focus on x86 and
> ARM variants.

It shouldn't be hard to pass a fourth arg. Then if it's too hard to
pass regs, an arch can still pass NULL, and we can just make all
callbacks check for NULL. I can add a config option that tests to make
sure this is the case: if the regs flag isn't set, call the callback
with NULL in the regs and make it crash. Obviously this will be a debug
config and not something normal users should set. :-)

>
> > Note, all funcs will get regs, but it may not get the full regs. That
> > requires the ftrace_ops setting the special flag. The regs are saved for
> > the mcount already. But only a partial set. Those will be sent to all
> > function callbacks. If the function call back requires a full set (like
> > kprobes does), then it must set the flag before registering.
>
> Just out of curiously, would you mean that you will allocate full
> pt_regs frame on stack always?

Yes. The allocation is done, but the actual storage is not. This could
cause cache issues, but nothing too bad. If you are worried about stack
overflow, then the code itself is using too much stack and should be
reported. Of course, that's what the stack tracer is for ;-)

>
> >
> > Hows this sound?
>
> Sounds better to me, at far as there are non-initialized parameters
> passed to user handler. :)
>
> BTW, would you like to update ftrace part? I'd like to fix to remove
> notrace from my previous patchset and resend tomorrow.

I'll work today on getting out a new git tree that has my latest
updates.

Thanks!

-- Steve