2015-02-13 05:50:30

by Wang Nan

Subject: [RFC PATCH v3 00/26] Early kprobe: enable kprobes at very early booting stage.

I feel very sorry for people who reviewed my v2 patch series yesterday
at https://lkml.org/lkml/2015/2/12/234 because I didn't provide enough
information in the commit logs. This v3 patch series adds those missing
commit messages. There are also 2 small fixes based on v2:

1. Fix ftrace_sort_mcount_area(): the original patch didn't work for modules.
2. Wrap the setting of kprobes_initialized in stop_machine() context.

Wang Nan (26):
kprobes: set kprobes_all_disarmed earlier to enable re-optimization.
kprobes: make kprobes/enabled work correctly for optimized kprobes.
kprobes: x86: mark 2 bytes NOP as boostable.
ftrace: don't update record flags if code modification fails.
ftrace/x86: Ensure rec->flags does not change when failure occurs.
ftrace: sort ftrace entries earlier.
ftrace: allow searching ftrace addr before ftrace is fully inited.
ftrace: enable making ftrace nop before ftrace_init().
ftrace: allow fixing code update failure by notifier chain.
ftrace: x86: try to fix ftrace when ftrace_replace_code() fails.
early kprobes: introduce kprobes_is_early() for further early kprobe use.
early kprobes: Add a KPROBE_FLAG_EARLY for early kprobes.
early kprobes: ARM: directly modify code.
early kprobes: ARM: introduce early kprobes related code area.
early kprobes: x86: directly modify code.
early kprobes: x86: introduce early kprobes related code area.
early kprobes: introduce macros for allocating early kprobe resources.
early kprobes: allows __alloc_insn_slot() from early kprobes slots.
early kprobes: prohibit probing at early kprobe reserved area.
early kprobes: core logic of early kprobes.
early kprobes: add CONFIG_EARLY_KPROBES option.
early kprobes: introduce arch_fix_ftrace_early_kprobe().
early kprobes: x86: arch_restore_optimized_kprobe().
early kprobes: core logic to support early kprobe on ftrace.
early kprobes: introduce kconfig option to support early kprobe on
ftrace.
kprobes: enable 'ekprobe=' cmdline option for early kprobes.

arch/Kconfig | 15 ++
arch/arm/include/asm/kprobes.h | 31 ++-
arch/arm/kernel/vmlinux.lds.S | 2 +
arch/arm/probes/kprobes/opt-arm.c | 12 +-
arch/x86/include/asm/insn.h | 7 +-
arch/x86/include/asm/kprobes.h | 47 +++-
arch/x86/kernel/ftrace.c | 23 +-
arch/x86/kernel/kprobes/core.c | 2 +-
arch/x86/kernel/kprobes/opt.c | 69 +++++-
arch/x86/kernel/vmlinux.lds.S | 2 +
include/linux/ftrace.h | 37 +++
include/linux/kprobes.h | 132 +++++++++++
init/main.c | 1 +
kernel/kprobes.c | 479 +++++++++++++++++++++++++++++++++++++-
kernel/trace/ftrace.c | 157 +++++++++++--
15 files changed, 969 insertions(+), 47 deletions(-)

--
1.8.4


2015-02-13 05:51:53

by Wang Nan

Subject: [RFC PATCH v3 01/26] kprobes: set kprobes_all_disarmed earlier to enable re-optimization.

In the original code, the probed instruction doesn't get re-optimized after

echo 0 > /sys/kernel/debug/kprobes/enabled
echo 1 > /sys/kernel/debug/kprobes/enabled

This is because the original code checks kprobes_all_disarmed in
optimize_kprobe(), but that flag is only turned off after this function
has been called. Therefore, optimize_kprobe() sees
kprobes_all_disarmed == true and skips the optimization.

This patch simply turns off kprobes_all_disarmed earlier to enable
optimization.

Signed-off-by: Wang Nan <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---
kernel/kprobes.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 2ca272f..c397900 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -2320,6 +2320,12 @@ static void arm_all_kprobes(void)
if (!kprobes_all_disarmed)
goto already_enabled;

+ /*
+ * optimize_kprobe() called by arm_kprobe() checks
+ * kprobes_all_disarmed, so set kprobes_all_disarmed before
+ * arm_kprobe.
+ */
+ kprobes_all_disarmed = false;
/* Arming kprobes doesn't optimize kprobe itself */
for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
head = &kprobe_table[i];
@@ -2328,7 +2334,6 @@ static void arm_all_kprobes(void)
arm_kprobe(p);
}

- kprobes_all_disarmed = false;
printk(KERN_INFO "Kprobes globally enabled\n");

already_enabled:
--
1.8.4

2015-02-13 05:52:13

by Wang Nan

Subject: [RFC PATCH v3 02/26] kprobes: make kprobes/enabled work correctly for optimized kprobes.

debugfs/kprobes/enabled doesn't work correctly on optimized kprobes.
Masami Hiramatsu has a test report on the x86_64 platform:

https://lkml.org/lkml/2015/1/19/274

This patch forces unoptimizing the kprobe if kprobes_all_disarmed
is set. It also checks the flag in the unregistering path to skip the
unneeded disarming process when kprobes are globally disarmed.

Signed-off-by: Wang Nan <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---
kernel/kprobes.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index c397900..c90e417 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -869,7 +869,8 @@ static void __disarm_kprobe(struct kprobe *p, bool reopt)
{
struct kprobe *_p;

- unoptimize_kprobe(p, false); /* Try to unoptimize */
+ /* Try to unoptimize */
+ unoptimize_kprobe(p, kprobes_all_disarmed);

if (!kprobe_queued(p)) {
arch_disarm_kprobe(p);
@@ -1571,7 +1572,13 @@ static struct kprobe *__disable_kprobe(struct kprobe *p)

/* Try to disarm and disable this/parent probe */
if (p == orig_p || aggr_kprobe_disabled(orig_p)) {
- disarm_kprobe(orig_p, true);
+ /*
+ * If kprobes_all_disarmed is set, orig_p
+ * should have already been disarmed, so
+ * skip unneed disarming process.
+ */
+ if (!kprobes_all_disarmed)
+ disarm_kprobe(orig_p, true);
orig_p->flags |= KPROBE_FLAG_DISABLED;
}
}
--
1.8.4

2015-02-13 05:48:58

by Wang Nan

Subject: [RFC PATCH v3 03/26] kprobes: x86: mark 2 bytes NOP as boostable.

Currently, x86 kprobes is unable to boost NOPs with a two-byte opcode such as:

nopl 0x0(%rax,%rax,1)

which is 0x0f 0x1f 0x44 0x00 0x00.

Such a NOP is exactly 5 bytes long, which is enough to hold a relative
jmp instruction, so boosting it is obviously safe.

This patch enables boosting such NOPs by simply updating the
twobyte_is_boostable[] array.

Signed-off-by: Wang Nan <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
---
arch/x86/kernel/kprobes/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 98f654d..6a1146e 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -84,7 +84,7 @@ static volatile u32 twobyte_is_boostable[256 / 32] = {
/* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
/* ---------------------------------------------- */
W(0x00, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0) | /* 00 */
- W(0x10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 10 */
+ W(0x10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1) , /* 10 */
W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 20 */
W(0x30, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 30 */
W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 40 */
--
1.8.4

2015-02-13 05:51:51

by Wang Nan

Subject: [RFC PATCH v3 04/26] ftrace: don't update record flags if code modification fails.

The x86 and common ftrace_replace_code() behave differently.

On x86, rec->flags gets updated only when (almost) all work is done. In
the common code, rec->flags is updated before code modification and is
never restored when code modification fails.

This patch ensures rec->flags keeps its original value if
ftrace_replace_code() fails. A later patch will correct that function
for x86.

Signed-off-by: Wang Nan <[email protected]>
---
kernel/trace/ftrace.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 45e5cb1..6c6cbb1 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2254,23 +2254,30 @@ __ftrace_replace_code(struct dyn_ftrace *rec, int enable)
/* This needs to be done before we call ftrace_update_record */
ftrace_old_addr = ftrace_get_addr_curr(rec);

- ret = ftrace_update_record(rec, enable);
+ ret = ftrace_test_record(rec, enable);

switch (ret) {
case FTRACE_UPDATE_IGNORE:
return 0;

case FTRACE_UPDATE_MAKE_CALL:
- return ftrace_make_call(rec, ftrace_addr);
+ ret = ftrace_make_call(rec, ftrace_addr);
+ break;

case FTRACE_UPDATE_MAKE_NOP:
- return ftrace_make_nop(NULL, rec, ftrace_old_addr);
+ ret = ftrace_make_nop(NULL, rec, ftrace_old_addr);
+ break;

case FTRACE_UPDATE_MODIFY_CALL:
- return ftrace_modify_call(rec, ftrace_old_addr, ftrace_addr);
+ ret = ftrace_modify_call(rec, ftrace_old_addr, ftrace_addr);
+ break;
}

- return -1; /* unknow ftrace bug */
+ if (ret)
+ return -1; /* unknow ftrace bug */
+
+ ftrace_update_record(rec, enable);
+ return 0;
}

void __weak ftrace_replace_code(int enable)
--
1.8.4

2015-02-13 05:52:09

by Wang Nan

Subject: [RFC PATCH v3 05/26] ftrace/x86: Ensure rec->flags does not change when failure occurs.

Don't change rec->flags if code modification fails.

Signed-off-by: Wang Nan <[email protected]>
---
arch/x86/kernel/ftrace.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 8b7b0a5..7bdba65 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -497,6 +497,7 @@ static int finish_update(struct dyn_ftrace *rec, int enable)
{
unsigned long ftrace_addr;
int ret;
+ unsigned long old_flags = rec->flags;

ret = ftrace_update_record(rec, enable);

@@ -509,14 +510,18 @@ static int finish_update(struct dyn_ftrace *rec, int enable)
case FTRACE_UPDATE_MODIFY_CALL:
case FTRACE_UPDATE_MAKE_CALL:
/* converting nop to call */
- return finish_update_call(rec, ftrace_addr);
+ ret = finish_update_call(rec, ftrace_addr);
+ break;

case FTRACE_UPDATE_MAKE_NOP:
/* converting a call to a nop */
- return finish_update_nop(rec);
+ ret = finish_update_nop(rec);
+ break;
}

- return 0;
+ if (ret)
+ rec->flags = old_flags;
+ return ret;
}

static void do_sync_core(void *data)
--
1.8.4

2015-02-13 05:48:56

by Wang Nan

Subject: [RFC PATCH v3 06/26] ftrace: sort ftrace entries earlier.

By extracting the mcount sorting code and sorting earlier, further
patches will be able to determine whether an address is an ftrace entry
using bsearch().

ftrace_sort_mcount_area() will be called before, during and after
ftrace_init() (on module insertion). Ensure it sorts the kernel mcount
table only once.

Signed-off-by: Wang Nan <[email protected]>
---
include/linux/ftrace.h | 2 ++
init/main.c | 1 +
kernel/trace/ftrace.c | 38 ++++++++++++++++++++++++++++++++++++--
3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 1da6029..8db315a 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -701,8 +701,10 @@ static inline void __ftrace_enabled_restore(int enabled)

#ifdef CONFIG_FTRACE_MCOUNT_RECORD
extern void ftrace_init(void);
+extern void ftrace_init_early(void);
#else
static inline void ftrace_init(void) { }
+static inline void ftrace_init_early(void) { }
#endif

/*
diff --git a/init/main.c b/init/main.c
index 6f0f1c5f..eaafc3e 100644
--- a/init/main.c
+++ b/init/main.c
@@ -517,6 +517,7 @@ asmlinkage __visible void __init start_kernel(void)
boot_cpu_init();
page_address_init();
pr_notice("%s", linux_banner);
+ ftrace_init_early();
setup_arch(&command_line);
mm_init_cpumask(&init_mm);
setup_command_line(command_line);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 6c6cbb1..a75cfbe 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1169,6 +1169,7 @@ struct ftrace_page {

static struct ftrace_page *ftrace_pages_start;
static struct ftrace_page *ftrace_pages;
+static bool kernel_mcount_sorted = false;

static bool __always_inline ftrace_hash_empty(struct ftrace_hash *hash)
{
@@ -4743,6 +4744,32 @@ static void ftrace_swap_ips(void *a, void *b, int size)
*ipb = t;
}

+static void ftrace_sort_mcount_area(unsigned long *start, unsigned long *end)
+{
+ extern unsigned long __start_mcount_loc[];
+ extern unsigned long __stop_mcount_loc[];
+
+ unsigned long count;
+ bool is_kernel_mcount;
+
+ count = end - start;
+ if (!count)
+ return;
+
+ is_kernel_mcount =
+ (start == __start_mcount_loc) &&
+ (end == __stop_mcount_loc);
+
+ if (is_kernel_mcount && kernel_mcount_sorted)
+ return;
+
+ sort(start, count, sizeof(*start),
+ ftrace_cmp_ips, ftrace_swap_ips);
+
+ if (is_kernel_mcount)
+ kernel_mcount_sorted = true;
+}
+
static int ftrace_process_locs(struct module *mod,
unsigned long *start,
unsigned long *end)
@@ -4761,8 +4788,7 @@ static int ftrace_process_locs(struct module *mod,
if (!count)
return 0;

- sort(start, count, sizeof(*start),
- ftrace_cmp_ips, ftrace_swap_ips);
+ ftrace_sort_mcount_area(start, end);

start_pg = ftrace_allocate_pages(count);
if (!start_pg)
@@ -4965,6 +4991,14 @@ void __init ftrace_init(void)
ftrace_disabled = 1;
}

+void __init ftrace_init_early(void)
+{
+ extern unsigned long __start_mcount_loc[];
+ extern unsigned long __stop_mcount_loc[];
+
+ ftrace_sort_mcount_area(__start_mcount_loc, __stop_mcount_loc);
+}
+
/* Do nothing if arch does not support this */
void __weak arch_ftrace_update_trampoline(struct ftrace_ops *ops)
{
--
1.8.4

2015-02-13 05:48:55

by Wang Nan

Subject: [RFC PATCH v3 07/26] ftrace: allow searching ftrace addr before ftrace is fully inited.

This patch enables ftrace_location() to be used before ftrace_init().
The first user should be early kprobes, which can insert kprobes into
kernel code even before setup_arch() finishes. This patch gives it a
chance to determine whether it is probing ftrace entries and allows it
to apply some special treatment.

ftrace_cmp_ips_insn() is introduced to make the early ftrace_location()
behavior consistent with the normal ftrace_location(). With the
existing ftrace_cmp_ips(), searching for an address in the middle of an
instruction fails, which is inconsistent with ftrace_cmp_recs() used by
the normal ftrace_location().

With this and the previous patch, ftrace_location() can now be called
in and after setup_arch().
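
As an illustration only (not part of this patch), an early caller could
use it roughly as below; the function name and the probe_addr argument
are hypothetical:

#include <linux/ftrace.h>
#include <linux/init.h>
#include <linux/printk.h>

/* Check whether an address is an ftrace entry before ftrace_init(). */
static void __init early_check_ftrace_entry(unsigned long probe_addr)
{
        unsigned long ip = ftrace_location(probe_addr);

        if (ip)
                pr_info("0x%lx is an ftrace entry (ip = 0x%lx)\n",
                        probe_addr, ip);
        else
                pr_info("0x%lx is not an ftrace entry\n", probe_addr);
}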

Signed-off-by: Wang Nan <[email protected]>
---
kernel/trace/ftrace.c | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index a75cfbe..fc0c1aa 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1539,6 +1539,8 @@ static unsigned long ftrace_location_range(unsigned long start, unsigned long en
return 0;
}

+static unsigned long ftrace_search_mcount_ip(unsigned long ip);
+
/**
* ftrace_location - return true if the ip giving is a traced location
* @ip: the instruction pointer to check
@@ -1550,6 +1552,9 @@ static unsigned long ftrace_location_range(unsigned long start, unsigned long en
*/
unsigned long ftrace_location(unsigned long ip)
{
+ if (unlikely(!ftrace_pages_start))
+ return ftrace_search_mcount_ip(ip);
+
return ftrace_location_range(ip, ip);
}

@@ -4733,6 +4738,18 @@ static int ftrace_cmp_ips(const void *a, const void *b)
return 0;
}

+static int ftrace_cmp_ips_insn(const void *a, const void *b)
+{
+ const unsigned long *ipa = a;
+ const unsigned long *ipb = b;
+
+ if (*ipa >= *ipb + MCOUNT_INSN_SIZE)
+ return 1;
+ if (*ipa < *ipb)
+ return -1;
+ return 0;
+}
+
static void ftrace_swap_ips(void *a, void *b, int size)
{
unsigned long *ipa = a;
@@ -4770,6 +4787,27 @@ static void ftrace_sort_mcount_area(unsigned long *start, unsigned long *end)
kernel_mcount_sorted = true;
}

+static unsigned long ftrace_search_mcount_ip(unsigned long ip)
+{
+ extern unsigned long __start_mcount_loc[];
+ extern unsigned long __stop_mcount_loc[];
+
+ unsigned long *mcount_start = __start_mcount_loc;
+ unsigned long *mcount_end = __stop_mcount_loc;
+ unsigned long count = mcount_end - mcount_start;
+ unsigned long *retval;
+
+ if (!kernel_mcount_sorted)
+ return 0;
+
+ retval = bsearch(&ip, mcount_start, count,
+ sizeof(unsigned long), ftrace_cmp_ips_insn);
+ if (!retval)
+ return 0;
+
+ return ftrace_call_adjust(ip);
+}
+
static int ftrace_process_locs(struct module *mod,
unsigned long *start,
unsigned long *end)
--
1.8.4

2015-02-13 05:49:13

by Wang Nan

Subject: [RFC PATCH v3 08/26] ftrace: enable making ftrace nop before ftrace_init().

This patch is for early kprobes.

Ftrace converts ftrace entries to NOPs during initialization, which
conflicts with early kprobes if a probe is placed on an ftrace entry
before that conversion. For x86, an ftrace entry is a 'call'
instruction, which happens to be unboostable.

This patch provides ftrace_process_loc_early() to allow early kprobes
to convert the target instruction before ftrace_init() is called.
ftrace_process_loc_early() may only be called before ftrace_init().

However, for x86 this patch alone is not enough. Because ideal_nops is
updated during setup_arch(), we cannot ensure that
ftrace_process_loc_early() chooses the same NOP as normal ftrace. I'll
use another mechanism to solve this problem.
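
A minimal usage sketch (illustrative only; the real caller appears in a
later patch of this series, and the function name here is hypothetical):

#include <linux/ftrace.h>
#include <linux/init.h>
#include <linux/printk.h>

/* Must run before ftrace_init(): turn a probed ftrace entry into a NOP. */
static int __init early_make_probe_site_nop(unsigned long probe_addr)
{
        int err = ftrace_process_loc_early(probe_addr);

        if (err)
                pr_err("cannot convert 0x%lx to nop: %d\n", probe_addr, err);
        return err;
}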

Signed-off-by: Wang Nan <[email protected]>
---
include/linux/ftrace.h | 5 +++++
kernel/trace/ftrace.c | 18 ++++++++++++++++++
2 files changed, 23 insertions(+)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 8db315a..d37ccd8a 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -702,9 +702,14 @@ static inline void __ftrace_enabled_restore(int enabled)
#ifdef CONFIG_FTRACE_MCOUNT_RECORD
extern void ftrace_init(void);
extern void ftrace_init_early(void);
+extern int ftrace_process_loc_early(unsigned long ip);
#else
static inline void ftrace_init(void) { }
static inline void ftrace_init_early(void) { }
+static inline int ftrace_process_loc_early(unsigned long __unused)
+{
+ return 0;
+}
#endif

/*
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index fc0c1aa..e39e72a 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -5037,6 +5037,24 @@ void __init ftrace_init_early(void)
ftrace_sort_mcount_area(__start_mcount_loc, __stop_mcount_loc);
}

+int __init ftrace_process_loc_early(unsigned long addr)
+{
+ unsigned long ip;
+ struct dyn_ftrace fake_rec;
+ int ret;
+
+ BUG_ON(ftrace_pages_start);
+
+ ip = ftrace_location(addr);
+ if (ip != addr)
+ return -EINVAL;
+
+ memset(&fake_rec, '\0', sizeof(fake_rec));
+ fake_rec.ip = ip;
+ ret = ftrace_make_nop(NULL, &fake_rec, MCOUNT_ADDR);
+ return ret;
+}
+
/* Do nothing if arch does not support this */
void __weak arch_ftrace_update_trampoline(struct ftrace_ops *ops)
{
--
1.8.4

2015-02-13 05:49:14

by Wang Nan

Subject: [RFC PATCH v3 09/26] ftrace: allow fixing code update failure by notifier chain.

This patch introduces a notifier chain (ftrace_update_notifier_list)
and ftrace_tryfix_bug(). The goal of this patch is to give other
subsystems a chance to fix code if they altered ftrace entries before
ftrace_init().

Such subsystems should register a callback with
register_ftrace_update_notifier(). Ftrace will trigger the callback
through ftrace_tryfix_bug() when it fails to alter an ftrace entry,
instead of directly firing ftrace_bug(). Failure information is wrapped
in a struct ftrace_update_notifier_info, so a subscriber can determine
what ftrace was trying to do.

A subscriber of the notifier chain should return NOTIFY_STOP if it can
deal with the problem, or NOTIFY_DONE to pass it to others. By setting
info->retry it can ask ftrace to retry the failed operation.
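
A minimal sketch of a subscriber following the convention above; the
my_*() helpers are hypothetical placeholders, not part of this series:

#include <linux/ftrace.h>
#include <linux/notifier.h>

static int example_ftrace_update_cb(struct notifier_block *nb,
                                    unsigned long action, void *data)
{
        struct ftrace_update_notifier_info *info = data;

        if (!my_subsystem_owns(info->rec))      /* hypothetical ownership check */
                return NOTIFY_DONE;             /* let other subscribers try */

        /* hypothetical fixup based on the failed record and error code */
        my_subsystem_fix(info->rec, info->errno, info->enable);

        info->retry = true;     /* ask ftrace to retry the failed operation */
        return NOTIFY_STOP;
}

static struct notifier_block example_ftrace_update_nb = {
        .notifier_call = example_ftrace_update_cb,
};

/* During early init: register_ftrace_update_notifier(&example_ftrace_update_nb); */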

Signed-off-by: Wang Nan <[email protected]>
---
include/linux/ftrace.h | 30 ++++++++++++++++++++++++++++++
kernel/trace/ftrace.c | 46 ++++++++++++++++++++++++++++++++++++++++------
2 files changed, 70 insertions(+), 6 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index d37ccd8a..98da86d 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -283,6 +283,21 @@ int ftrace_arch_code_modify_post_process(void);
struct dyn_ftrace;

void ftrace_bug(int err, struct dyn_ftrace *rec);
+int ftrace_tryfix(int failed, int enable, struct dyn_ftrace *rec);
+
+#define __ftrace_tryfix_bug(__failed, __enable, __rec, __retry, __trigger)\
+ ({ \
+ int __fix_ret = ftrace_tryfix((__failed), (__enable), (__rec));\
+ __fix_ret = (__fix_ret == -EAGAIN) ? \
+ ({ __retry; }) : \
+ __fix_ret; \
+ if (__fix_ret && (__trigger)) \
+ ftrace_bug(__failed, __rec); \
+ __fix_ret; \
+ })
+
+#define ftrace_tryfix_bug(__failed, __enable, __rec, __retry) \
+ __ftrace_tryfix_bug(__failed, __enable, __rec, __retry, true)

struct seq_file;

@@ -699,10 +714,20 @@ static inline void __ftrace_enabled_restore(int enabled)
# define trace_preempt_off(a0, a1) do { } while (0)
#endif

+struct ftrace_update_notifier_info {
+ struct dyn_ftrace *rec;
+ int errno;
+ int enable;
+
+ /* Filled by subscriber */
+ bool retry;
+};
+
#ifdef CONFIG_FTRACE_MCOUNT_RECORD
extern void ftrace_init(void);
extern void ftrace_init_early(void);
extern int ftrace_process_loc_early(unsigned long ip);
+extern int register_ftrace_update_notifier(struct notifier_block *nb);
#else
static inline void ftrace_init(void) { }
static inline void ftrace_init_early(void) { }
@@ -710,6 +735,11 @@ static inline int ftrace_process_loc_early(unsigned long __unused)
{
return 0;
}
+
+static inline int register_ftrace_update_notifier(struct notifier_block *__unused)
+{
+ return 0;
+}
#endif

/*
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index e39e72a..d75b823 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -112,6 +112,7 @@ ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
ftrace_func_t ftrace_pid_function __read_mostly = ftrace_stub;
static struct ftrace_ops global_ops;
static struct ftrace_ops control_ops;
+static ATOMIC_NOTIFIER_HEAD(ftrace_update_notifier_list);

static void ftrace_ops_recurs_func(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *op, struct pt_regs *regs);
@@ -1971,6 +1972,28 @@ void ftrace_bug(int failed, struct dyn_ftrace *rec)
}
}

+int ftrace_tryfix(int failed, int enable, struct dyn_ftrace *rec)
+{
+ int notify_result = NOTIFY_DONE;
+ struct ftrace_update_notifier_info info = {
+ .rec = rec,
+ .errno = failed,
+ .enable = enable,
+ .retry = false,
+ };
+
+ notify_result = atomic_notifier_call_chain(
+ &ftrace_update_notifier_list,
+ 0, &info);
+
+ if (notify_result != NOTIFY_STOP)
+ return failed;
+
+ if (info.retry)
+ return -EAGAIN;
+ return 0;
+}
+
static int ftrace_check_record(struct dyn_ftrace *rec, int enable, int update)
{
unsigned long flag = 0UL;
@@ -2298,9 +2321,12 @@ void __weak ftrace_replace_code(int enable)
do_for_each_ftrace_rec(pg, rec) {
failed = __ftrace_replace_code(rec, enable);
if (failed) {
- ftrace_bug(failed, rec);
- /* Stop processing */
- return;
+ failed = ftrace_tryfix_bug(failed, enable, rec,
+ __ftrace_replace_code(rec, enable));
+
+ /* Stop processing if still fail */
+ if (failed)
+ return;
}
} while_for_each_ftrace_rec();
}
@@ -2387,8 +2413,10 @@ ftrace_code_disable(struct module *mod, struct dyn_ftrace *rec)

ret = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
if (ret) {
- ftrace_bug(ret, rec);
- return 0;
+ ret = ftrace_tryfix_bug(ret, 0, rec,
+ ftrace_make_nop(mod, rec, MCOUNT_ADDR));
+ if (ret)
+ return 0;
}
return 1;
}
@@ -2844,7 +2872,8 @@ static int ftrace_update_code(struct module *mod, struct ftrace_page *new_pgs)
if (ftrace_start_up && cnt) {
int failed = __ftrace_replace_code(p, 1);
if (failed)
- ftrace_bug(failed, p);
+ failed = ftrace_tryfix_bug(failed, 1, p,
+ __ftrace_replace_code(p, 1));
}
}
}
@@ -5673,6 +5702,11 @@ ftrace_enable_sysctl(struct ctl_table *table, int write,
return ret;
}

+int register_ftrace_update_notifier(struct notifier_block *nb)
+{
+ return atomic_notifier_chain_register(&ftrace_update_notifier_list, nb);
+}
+
#ifdef CONFIG_FUNCTION_GRAPH_TRACER

static struct ftrace_ops graph_ops = {
--
1.8.4

2015-02-13 05:52:36

by Wang Nan

Subject: [RFC PATCH v3 10/26] ftrace: x86: try to fix ftrace when ftrace_replace_code() fails.

For x86 ftrace, when ftrace_replace_code() fails to add a breakpoint,
trigger a fix attempt instead of calling ftrace_bug().

Only one chance for fixing is given, at add_breakpoints(). If it fails
at any other stage, trigger the bug directly.

Signed-off-by: Wang Nan <[email protected]>
---
arch/x86/kernel/ftrace.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 7bdba65..c869138 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -553,8 +553,16 @@ void ftrace_replace_code(int enable)
rec = ftrace_rec_iter_record(iter);

ret = add_breakpoints(rec, enable);
- if (ret)
- goto remove_breakpoints;
+ if (ret) {
+ /*
+ * Don't trigger ftrace_bug here. Let it done by
+ * remove_breakpoints procedure.
+ */
+ ret = __ftrace_tryfix_bug(ret, enable, rec,
+ add_breakpoints(rec, enable), false);
+ if (ret)
+ goto remove_breakpoints;
+ }
count++;
}

--
1.8.4

2015-02-13 05:50:17

by Wang Nan

Subject: [RFC PATCH v3 11/26] early kprobes: introduce kprobes_is_early() for further early kprobe use.

The following early kprobe patches will enable kprobe registration very
early, even before the kprobe system is initialized. kprobes_is_early()
can be used to check whether we are in the early kprobe stage.

Signed-off-by: Wang Nan <[email protected]>
---
include/linux/kprobes.h | 2 ++
kernel/kprobes.c | 6 ++++++
2 files changed, 8 insertions(+)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 1ab5475..e1c8307 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -50,6 +50,8 @@
#define KPROBE_REENTER 0x00000004
#define KPROBE_HIT_SSDONE 0x00000008

+extern int kprobes_is_early(void);
+
#else /* CONFIG_KPROBES */
typedef int kprobe_opcode_t;
struct arch_specific_insn {
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index c90e417..647c95a 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -68,6 +68,12 @@
#endif

static int kprobes_initialized;
+
+int kprobes_is_early(void)
+{
+ return !kprobes_initialized;
+}
+
static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];

--
1.8.4

2015-02-13 05:52:11

by Wang Nan

Subject: [RFC PATCH v3 12/26] early kprobes: Add a KPROBE_FLAG_EARLY for early kprobes.

Introduce KPROBE_FLAG_EARLY for further use. KPROBE_FLAG_EARLY
indicates that a kprobe is installed at a very early stage, so its
resources should be allocated statically.

Signed-off-by: Wang Nan <[email protected]>
---
include/linux/kprobes.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index e1c8307..8d2e754 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -130,6 +130,7 @@ struct kprobe {
* this flag is only for optimized_kprobe.
*/
#define KPROBE_FLAG_FTRACE 8 /* probe is using ftrace */
+#define KPROBE_FLAG_EARLY 16 /* early kprobe */

/* Has this kprobe gone ? */
static inline int kprobe_gone(struct kprobe *p)
--
1.8.4

2015-02-13 05:49:11

by Wang Nan

Subject: [RFC PATCH v3 13/26] early kprobes: ARM: directly modify code.

For early kprobes, we can simply patch the text because we are in a
relatively simple environment.

Signed-off-by: Wang Nan <[email protected]>
---
arch/arm/probes/kprobes/opt-arm.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/arm/probes/kprobes/opt-arm.c b/arch/arm/probes/kprobes/opt-arm.c
index bcdecc2..43446df 100644
--- a/arch/arm/probes/kprobes/opt-arm.c
+++ b/arch/arm/probes/kprobes/opt-arm.c
@@ -330,8 +330,18 @@ void __kprobes arch_optimize_kprobes(struct list_head *oplist)
* Similar to __arch_disarm_kprobe, operations which
* removing breakpoints must be wrapped by stop_machine
* to avoid racing.
+ *
+ * If this function is called before kprobes initialized,
+ * the kprobe should be an early kprobe, the instruction
+ * is not armed with breakpoint. There should be only
+ * one core now, so directly __patch_text is enough.
*/
- kprobes_remove_breakpoint(op->kp.addr, insn);
+ if (unlikely(kprobes_is_early())) {
+ BUG_ON(!(op->kp.flags & KPROBE_FLAG_EARLY));
+ __patch_text(op->kp.addr, insn);
+ } else {
+ kprobes_remove_breakpoint(op->kp.addr, insn);
+ }

list_del_init(&op->list);
}
--
1.8.4

2015-02-13 05:49:10

by Wang Nan

Subject: [RFC PATCH v3 14/26] early kprobes: ARM: introduce early kprobes related code area.

In ARM's vmlinux.lds, introduce a code area inside the text section.
The executable area used by early kprobes will be allocated from it.

Signed-off-by: Wang Nan <[email protected]>
---
arch/arm/include/asm/kprobes.h | 31 +++++++++++++++++++++++++++++--
arch/arm/kernel/vmlinux.lds.S | 2 ++
2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kprobes.h b/arch/arm/include/asm/kprobes.h
index 3ea9be5..0a4421e 100644
--- a/arch/arm/include/asm/kprobes.h
+++ b/arch/arm/include/asm/kprobes.h
@@ -17,16 +17,42 @@
#define _ARM_KPROBES_H

#include <linux/types.h>
-#include <linux/ptrace.h>
-#include <linux/notifier.h>

#define __ARCH_WANT_KPROBES_INSN_SLOT
#define MAX_INSN_SIZE 2

+#ifdef __ASSEMBLY__
+
+#define KPROBE_OPCODE_SIZE 4
+#define MAX_OPTINSN_SIZE (optprobe_template_end - optprobe_template_entry)
+
+#ifdef CONFIG_EARLY_KPROBES
+#define EARLY_KPROBES_CODES_AREA \
+ . = ALIGN(8); \
+ VMLINUX_SYMBOL(__early_kprobes_start) = .; \
+ VMLINUX_SYMBOL(__early_kprobes_code_area_start) = .; \
+ . = . + MAX_OPTINSN_SIZE * CONFIG_NR_EARLY_KPROBES_SLOTS; \
+ VMLINUX_SYMBOL(__early_kprobes_code_area_end) = .; \
+ . = ALIGN(8); \
+ VMLINUX_SYMBOL(__early_kprobes_insn_slot_start) = .; \
+ . = . + MAX_INSN_SIZE * KPROBE_OPCODE_SIZE * CONFIG_NR_EARLY_KPROBES_SLOTS;\
+ VMLINUX_SYMBOL(__early_kprobes_insn_slot_end) = .; \
+ VMLINUX_SYMBOL(__early_kprobes_end) = .;
+
+#else
+#define EARLY_KPROBES_CODES_AREA
+#endif
+
+#else
+
+#include <linux/ptrace.h>
+#include <linux/notifier.h>
+
#define flush_insn_slot(p) do { } while (0)
#define kretprobe_blacklist_size 0

typedef u32 kprobe_opcode_t;
+#define KPROBE_OPCODE_SIZE sizeof(kprobe_opcode_t)
struct kprobe;
#include <asm/probes.h>

@@ -83,4 +109,5 @@ struct arch_optimized_insn {
*/
};

+#endif /* __ASSEMBLY__ */
#endif /* _ARM_KPROBES_H */
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 9351f7f..6fa2b85 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -11,6 +11,7 @@
#ifdef CONFIG_ARM_KERNMEM_PERMS
#include <asm/pgtable.h>
#endif
+#include <asm/kprobes.h>

#define PROC_INFO \
. = ALIGN(4); \
@@ -108,6 +109,7 @@ SECTIONS
SCHED_TEXT
LOCK_TEXT
KPROBES_TEXT
+ EARLY_KPROBES_CODES_AREA
IDMAP_TEXT
#ifdef CONFIG_MMU
*(.fixup)
--
1.8.4

2015-02-13 05:52:50

by Wang Nan

Subject: [RFC PATCH v3 15/26] early kprobes: x86: directly modify code.

When registering early kprobes, SMP has not yet been enabled, so the
synchronization done by text_poke_bp() is not required. A simple
memcpy is enough.

Signed-off-by: Wang Nan <[email protected]>
---
arch/x86/kernel/kprobes/opt.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index 0dd8d08..21847ab 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -36,6 +36,7 @@
#include <asm/alternative.h>
#include <asm/insn.h>
#include <asm/debugreg.h>
+#include <asm/tlbflush.h>

#include "common.h"

@@ -397,8 +398,15 @@ void arch_optimize_kprobes(struct list_head *oplist)
insn_buf[0] = RELATIVEJUMP_OPCODE;
*(s32 *)(&insn_buf[1]) = rel;

- text_poke_bp(op->kp.addr, insn_buf, RELATIVEJUMP_SIZE,
- op->optinsn.insn);
+ if (unlikely(kprobes_is_early())) {
+ BUG_ON(!(op->kp.flags & KPROBE_FLAG_EARLY));
+ memcpy(op->kp.addr, insn_buf, RELATIVEJUMP_SIZE);
+ local_flush_tlb();
+ sync_core();
+ } else {
+ text_poke_bp(op->kp.addr, insn_buf, RELATIVEJUMP_SIZE,
+ op->optinsn.insn);
+ }

list_del_init(&op->list);
}
--
1.8.4

2015-02-13 05:54:22

by Wang Nan

Subject: [RFC PATCH v3 16/26] early kprobes: x86: introduce early kprobes related code area.

This patch introduces EARLY_KPROBES_CODES_AREA into x86 vmlinux for
early kprobes.

Signed-off-by: Wang Nan <[email protected]>
---
arch/x86/include/asm/insn.h | 7 ++++---
arch/x86/include/asm/kprobes.h | 47 +++++++++++++++++++++++++++++++++++-------
arch/x86/kernel/vmlinux.lds.S | 2 ++
3 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 47f29b1..ea6f318 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -20,6 +20,9 @@
* Copyright (C) IBM Corporation, 2009
*/

+#define MAX_INSN_SIZE 16
+
+#ifndef __ASSEMBLY__
/* insn_attr_t is defined in inat.h */
#include <asm/inat.h>

@@ -69,8 +72,6 @@ struct insn {
const insn_byte_t *next_byte;
};

-#define MAX_INSN_SIZE 16
-
#define X86_MODRM_MOD(modrm) (((modrm) & 0xc0) >> 6)
#define X86_MODRM_REG(modrm) (((modrm) & 0x38) >> 3)
#define X86_MODRM_RM(modrm) ((modrm) & 0x07)
@@ -197,5 +198,5 @@ static inline int insn_offset_immediate(struct insn *insn)
{
return insn_offset_displacement(insn) + insn->displacement.nbytes;
}
-
+#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_INSN_H */
diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index 4421b5d..6a6066a 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -21,23 +21,54 @@
*
* See arch/x86/kernel/kprobes.c for x86 kprobes history.
*/
-#include <linux/types.h>
-#include <linux/ptrace.h>
-#include <linux/percpu.h>
-#include <asm/insn.h>

#define __ARCH_WANT_KPROBES_INSN_SLOT

-struct pt_regs;
-struct kprobe;
+#include <linux/types.h>
+#include <asm/insn.h>

-typedef u8 kprobe_opcode_t;
#define BREAKPOINT_INSTRUCTION 0xcc
#define RELATIVEJUMP_OPCODE 0xe9
#define RELATIVEJUMP_SIZE 5
#define RELATIVECALL_OPCODE 0xe8
#define RELATIVE_ADDR_SIZE 4
#define MAX_STACK_SIZE 64
+#define MAX_OPTIMIZED_LENGTH (MAX_INSN_SIZE + RELATIVE_ADDR_SIZE)
+
+#ifdef __ASSEMBLY__
+
+#define KPROBE_OPCODE_SIZE 1
+#define MAX_OPTINSN_SIZE ((optprobe_template_end - optprobe_template_entry) + \
+ MAX_OPTIMIZED_LENGTH + RELATIVEJUMP_SIZE)
+
+#ifdef CONFIG_EARLY_KPROBES
+# define EARLY_KPROBES_CODES_AREA \
+ . = ALIGN(8); \
+ VMLINUX_SYMBOL(__early_kprobes_start) = .; \
+ VMLINUX_SYMBOL(__early_kprobes_code_area_start) = .; \
+ . = . + MAX_OPTINSN_SIZE * CONFIG_NR_EARLY_KPROBES_SLOTS; \
+ VMLINUX_SYMBOL(__early_kprobes_code_area_end) = .; \
+ . = ALIGN(8); \
+ VMLINUX_SYMBOL(__early_kprobes_insn_slot_start) = .; \
+ . = . + MAX_INSN_SIZE * KPROBE_OPCODE_SIZE * \
+ CONFIG_NR_EARLY_KPROBES_SLOTS; \
+ VMLINUX_SYMBOL(__early_kprobes_insn_slot_end) = .; \
+ VMLINUX_SYMBOL(__early_kprobes_end) = .;
+#else
+# define EARLY_KPROBES_CODES_AREA
+#endif
+
+#else
+
+#include <linux/ptrace.h>
+#include <linux/percpu.h>
+
+
+struct pt_regs;
+struct kprobe;
+
+typedef u8 kprobe_opcode_t;
+#define KPROBE_OPCODE_SIZE sizeof(kprobe_opcode_t)
#define MIN_STACK_SIZE(ADDR) \
(((MAX_STACK_SIZE) < (((unsigned long)current_thread_info()) + \
THREAD_SIZE - (unsigned long)(ADDR))) \
@@ -52,7 +83,6 @@ extern __visible kprobe_opcode_t optprobe_template_entry;
extern __visible kprobe_opcode_t optprobe_template_val;
extern __visible kprobe_opcode_t optprobe_template_call;
extern __visible kprobe_opcode_t optprobe_template_end;
-#define MAX_OPTIMIZED_LENGTH (MAX_INSN_SIZE + RELATIVE_ADDR_SIZE)
#define MAX_OPTINSN_SIZE \
(((unsigned long)&optprobe_template_end - \
(unsigned long)&optprobe_template_entry) + \
@@ -117,4 +147,5 @@ extern int kprobe_exceptions_notify(struct notifier_block *self,
unsigned long val, void *data);
extern int kprobe_int3_handler(struct pt_regs *regs);
extern int kprobe_debug_handler(struct pt_regs *regs);
+#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_KPROBES_H */
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 00bf300..69f3f0e 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -26,6 +26,7 @@
#include <asm/page_types.h>
#include <asm/cache.h>
#include <asm/boot.h>
+#include <asm/kprobes.h>

#undef i386 /* in case the preprocessor is a 32bit one */

@@ -100,6 +101,7 @@ SECTIONS
SCHED_TEXT
LOCK_TEXT
KPROBES_TEXT
+ EARLY_KPROBES_CODES_AREA
ENTRY_TEXT
IRQENTRY_TEXT
*(.fixup)
--
1.8.4

2015-02-13 05:54:52

by Wang Nan

Subject: [RFC PATCH v3 17/26] early kprobes: introduce macros for allocating early kprobe resources.

Introduce macros to generate common allocators for early-kprobe-related
resources.

All early-kprobe-related resources are statically allocated at link
time, one per early kprobe slot. For each type of resource, a bitmap is
used to track allocation. __DEFINE_EKPROBE_ALLOC_OPS defines the alloc
and free handlers; the range of the resource and the bitmap must be
provided for allocating and freeing. DEFINE_EKPROBE_ALLOC_OPS
additionally defines the bitmap and the backing array they use.
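
A usage sketch, assuming CONFIG_EARLY_KPROBES is enabled; 'struct foo'
and the function names are hypothetical (the real user is the
early_kprobe slot allocator added in a later patch):

#include <linux/kprobes.h>

struct foo {
        unsigned long data;
};

/* Generates __ek_foo_slots[], __ek_foo_bitmap[], ek_alloc_foo() and ek_free_foo(). */
DEFINE_EKPROBE_ALLOC_OPS(struct foo, foo, static)

static void example_usage(void)
{
        struct foo *f = ek_alloc_foo();  /* NULL once all slots are taken */

        if (!f)
                return;
        f->data = 1;
        ek_free_foo(f);                  /* returns 1: f came from the static pool */
}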

Signed-off-by: Wang Nan <[email protected]>
---
include/linux/kprobes.h | 78 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 78 insertions(+)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 8d2e754..cd7a2a5 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -270,6 +270,84 @@ extern void show_registers(struct pt_regs *regs);
extern void kprobes_inc_nmissed_count(struct kprobe *p);
extern bool arch_within_kprobe_blacklist(unsigned long addr);

+#ifdef CONFIG_EARLY_KPROBES
+
+#define NR_EARLY_KPROBES_SLOTS CONFIG_NR_EARLY_KPROBES_SLOTS
+#define ALIGN_UP(v, a) (((v) + ((a) - 1)) & ~((a) - 1))
+#define EARLY_KPROBES_BITMAP_SZ ALIGN_UP(NR_EARLY_KPROBES_SLOTS, BITS_PER_LONG)
+
+#define __ek_in_range(v, s, e) (((v) >= (s)) && ((v) < (e)))
+#define __ek_buf_sz(s, e) ((void *)(e) - (void *)(s))
+#define __ek_elem_sz_b(s, e) (__ek_buf_sz(s, e) / NR_EARLY_KPROBES_SLOTS)
+#define __ek_elem_sz(s, e) (__ek_elem_sz_b(s, e) / sizeof(s[0]))
+#define __ek_elem_idx(v, s, e) (__ek_buf_sz(s, v) / __ek_elem_sz_b(s, e))
+#define __ek_get_elem(i, s, e) (&((s)[__ek_elem_sz(s, e) * (i)]))
+#define __DEFINE_EKPROBE_ALLOC_OPS(__t, __name) \
+static inline __t *__ek_alloc_##__name(__t *__s, __t *__e, unsigned long *__b)\
+{ \
+ int __i = find_next_zero_bit(__b, NR_EARLY_KPROBES_SLOTS, 0); \
+ if (__i >= NR_EARLY_KPROBES_SLOTS) \
+ return NULL; \
+ set_bit(__i, __b); \
+ return __ek_get_elem(__i, __s, __e); \
+} \
+static inline int __ek_free_##__name(__t *__v, __t *__s, __t *__e, unsigned long *__b) \
+{ \
+ if (!__ek_in_range(__v, __s, __e)) \
+ return 0; \
+ clear_bit(__ek_elem_idx(__v, __s, __e), __b); \
+ return 1; \
+}
+
+#define __DEFINE_EKPROBE_AREA(__t, __name, __static) \
+__static __t __ek_##__name##_slots[NR_EARLY_KPROBES_SLOTS]; \
+__static unsigned long __ek_##__name##_bitmap[EARLY_KPROBES_BITMAP_SZ];
+
+#define DEFINE_EKPROBE_ALLOC_OPS(__t, __name, __static) \
+__DEFINE_EKPROBE_AREA(__t, __name, __static) \
+__DEFINE_EKPROBE_ALLOC_OPS(__t, __name) \
+static inline __t *ek_alloc_##__name(void) \
+{ \
+ return __ek_alloc_##__name(&((__ek_##__name##_slots)[0]), \
+ &((__ek_##__name##_slots)[NR_EARLY_KPROBES_SLOTS]),\
+ (__ek_##__name##_bitmap)); \
+} \
+static inline int ek_free_##__name(__t *__s) \
+{ \
+ return __ek_free_##__name(__s, &((__ek_##__name##_slots)[0]), \
+ &((__ek_##__name##_slots)[NR_EARLY_KPROBES_SLOTS]),\
+ (__ek_##__name##_bitmap)); \
+}
+
+
+#else
+#define __DEFINE_EKPROBE_ALLOC_OPS(__t, __name) \
+static inline __t *__ek_alloc_##__name(__t *__s, __t *__e, unsigned long *__b)\
+{ \
+ return NULL; \
+} \
+static inline int __ek_free_##__name(__t *__v, __t *__s, __t *__e, unsigned long *__b)\
+{ \
+ return 0; \
+}
+
+#define __DEFINE_EKPROBE_AREA(__t, __name, __static) \
+__static __t __ek_##__name##_slots[0]; \
+__static unsigned long __ek_##__name##_bitmap[0];
+
+#define DEFINE_EKPROBE_ALLOC_OPS(__t, __name, __static) \
+__DEFINE_EKPROBE_ALLOC_OPS(__t, __name) \
+static inline __t *ek_alloc_##__name(void) \
+{ \
+ return NULL; \
+} \
+static inline int ek_free_##__name(__t *__s) \
+{ \
+ return 0; \
+}
+
+#endif
+
struct kprobe_insn_cache {
struct mutex mutex;
void *(*alloc)(void); /* allocate insn page */
--
1.8.4

2015-02-13 05:54:23

by Wang Nan

Subject: [RFC PATCH v3 18/26] early kprobes: allows __alloc_insn_slot() from early kprobes slots.

Introduce early_slots_start/end and a bitmap in struct
kprobe_insn_cache, then use the previously introduced macros to
generate the allocator. This patch makes get/free_insn_slot() and
get/free_optinsn_slot() transparent to early kprobes.

Signed-off-by: Wang Nan <[email protected]>
---
include/linux/kprobes.h | 40 ++++++++++++++++++++++++++++++++++++++++
kernel/kprobes.c | 14 ++++++++++++++
2 files changed, 54 insertions(+)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index cd7a2a5..6100678 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -319,6 +319,17 @@ static inline int ek_free_##__name(__t *__s) \
(__ek_##__name##_bitmap)); \
}

+/*
+ * Start and end of early kprobes area, including code area and
+ * insn_slot area.
+ */
+extern char __early_kprobes_start[];
+extern char __early_kprobes_end[];
+
+extern kprobe_opcode_t __early_kprobes_code_area_start[];
+extern kprobe_opcode_t __early_kprobes_code_area_end[];
+extern kprobe_opcode_t __early_kprobes_insn_slot_start[];
+extern kprobe_opcode_t __early_kprobes_insn_slot_end[];

#else
#define __DEFINE_EKPROBE_ALLOC_OPS(__t, __name) \
@@ -348,6 +359,8 @@ static inline int ek_free_##__name(__t *__s) \

#endif

+__DEFINE_EKPROBE_ALLOC_OPS(kprobe_opcode_t, opcode)
+
struct kprobe_insn_cache {
struct mutex mutex;
void *(*alloc)(void); /* allocate insn page */
@@ -355,8 +368,35 @@ struct kprobe_insn_cache {
struct list_head pages; /* list of kprobe_insn_page */
size_t insn_size; /* size of instruction slot */
int nr_garbage;
+#ifdef CONFIG_EARLY_KPROBES
+# define slots_start(c) ((c)->early_slots_start)
+# define slots_end(c) ((c)->early_slots_end)
+# define slots_bitmap(c) ((c)->early_slots_bitmap)
+ kprobe_opcode_t *early_slots_start;
+ kprobe_opcode_t *early_slots_end;
+ unsigned long early_slots_bitmap[EARLY_KPROBES_BITMAP_SZ];
+#else
+# define slots_start(c) NULL
+# define slots_end(c) NULL
+# define slots_bitmap(c) NULL
+#endif
};

+static inline kprobe_opcode_t *
+__get_insn_slot_early(struct kprobe_insn_cache *c)
+{
+ return __ek_alloc_opcode(slots_start(c),
+ slots_end(c), slots_bitmap(c));
+}
+
+static inline int
+__free_insn_slot_early(struct kprobe_insn_cache *c,
+ kprobe_opcode_t *slot)
+{
+ return __ek_free_opcode(slot, slots_start(c),
+ slots_end(c), slots_bitmap(c));
+}
+
extern kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c);
extern void __free_insn_slot(struct kprobe_insn_cache *c,
kprobe_opcode_t *slot, int dirty);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 647c95a..fa1e422 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -143,6 +143,10 @@ struct kprobe_insn_cache kprobe_insn_slots = {
.pages = LIST_HEAD_INIT(kprobe_insn_slots.pages),
.insn_size = MAX_INSN_SIZE,
.nr_garbage = 0,
+#ifdef CONFIG_EARLY_KPROBES
+ .early_slots_start = __early_kprobes_insn_slot_start,
+ .early_slots_end = __early_kprobes_insn_slot_end,
+#endif
};
static int collect_garbage_slots(struct kprobe_insn_cache *c);

@@ -155,6 +159,9 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c)
struct kprobe_insn_page *kip;
kprobe_opcode_t *slot = NULL;

+ if (kprobes_is_early())
+ return __get_insn_slot_early(c);
+
mutex_lock(&c->mutex);
retry:
list_for_each_entry(kip, &c->pages, list) {
@@ -255,6 +262,9 @@ void __free_insn_slot(struct kprobe_insn_cache *c,
{
struct kprobe_insn_page *kip;

+ if (unlikely(__free_insn_slot_early(c, slot)))
+ return;
+
mutex_lock(&c->mutex);
list_for_each_entry(kip, &c->pages, list) {
long idx = ((long)slot - (long)kip->insns) /
@@ -286,6 +296,10 @@ struct kprobe_insn_cache kprobe_optinsn_slots = {
.pages = LIST_HEAD_INIT(kprobe_optinsn_slots.pages),
/* .insn_size is initialized later */
.nr_garbage = 0,
+#ifdef CONFIG_EARLY_KPROBES
+ .early_slots_start = __early_kprobes_code_area_start,
+ .early_slots_end = __early_kprobes_code_area_end,
+#endif
};
#endif
#endif
--
1.8.4

2015-02-13 05:53:59

by Wang Nan

Subject: [RFC PATCH v3 19/26] early kprobes: prohibit probing at early kprobe reserved area.

Put the early kprobe reserved area into the kprobe blacklist.

Signed-off-by: Wang Nan <[email protected]>
---
kernel/kprobes.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index fa1e422..b83c406 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1358,6 +1358,13 @@ static bool within_kprobe_blacklist(unsigned long addr)

if (arch_within_kprobe_blacklist(addr))
return true;
+
+#ifdef CONFIG_EARLY_KPROBES
+ if (addr >= (unsigned long)__early_kprobes_start &&
+ addr < (unsigned long)__early_kprobes_end)
+ return true;
+#endif
+
/*
* If there exists a kprobe_blacklist, verify and
* fail any probe registration in the prohibited area
--
1.8.4

2015-02-13 05:52:58

by Wang Nan

Subject: [RFC PATCH v3 20/26] early kprobes: core logic of early kprobes.

This patch contains the main logic of early kprobes.

If register_kprobe() is called before kprobes_initialized is set, an
early kprobe is allocated. We try to utilize the existing OPTPROBE
mechanism to replace the target instruction with a branch instead of a
breakpoint, because interrupt handlers may not have been initialized
yet.

All resources required by early kprobes are allocated statically.
CONFIG_NR_EARLY_KPROBES_SLOTS controls the number of possible early
kprobes.
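
For illustration only, an early user could register a probe from early
boot code roughly as follows; the target symbol and handler are
hypothetical, and break_handler/post_handler must stay unset because
early kprobes rely on optimization:

#include <linux/init.h>
#include <linux/kprobes.h>
#include <linux/printk.h>

static int early_pre_handler(struct kprobe *p, struct pt_regs *regs)
{
        /* Keep this minimal: most kernel facilities are not up yet. */
        return 0;
}

static struct kprobe early_kp = {
        .symbol_name = "do_fork",       /* hypothetical probe target */
        .pre_handler = early_pre_handler,
};

/* Called before init_kprobes(): register_kprobe() takes the early path. */
static void __init register_example_early_kprobe(void)
{
        int err = register_kprobe(&early_kp);

        if (err)
                pr_err("early kprobe registration failed: %d\n", err);
}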

Signed-off-by: Wang Nan <[email protected]>
---
include/linux/kprobes.h | 4 ++
kernel/kprobes.c | 150 ++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 148 insertions(+), 6 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 6100678..0c64df8 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -450,6 +450,10 @@ extern int proc_kprobes_optimization_handler(struct ctl_table *table,
size_t *length, loff_t *ppos);
#endif

+struct early_kprobe_slot {
+ struct optimized_kprobe op;
+};
+
#endif /* CONFIG_OPTPROBES */
#ifdef CONFIG_KPROBES_ON_FTRACE
extern void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index b83c406..131a71a 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -77,6 +77,10 @@ int kprobes_is_early(void)
static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];

+#ifdef CONFIG_EARLY_KPROBES
+static HLIST_HEAD(early_kprobe_hlist);
+#endif
+
/* NOTE: change this value only with kprobe_mutex held */
static bool kprobes_all_disarmed;

@@ -87,6 +91,8 @@ static struct {
raw_spinlock_t lock ____cacheline_aligned_in_smp;
} kretprobe_table_locks[KPROBE_TABLE_SIZE];

+DEFINE_EKPROBE_ALLOC_OPS(struct early_kprobe_slot, early_kprobe, static)
+
static raw_spinlock_t *kretprobe_table_lock_ptr(unsigned long hash)
{
return &(kretprobe_table_locks[hash].lock);
@@ -326,7 +332,12 @@ struct kprobe *get_kprobe(void *addr)
struct hlist_head *head;
struct kprobe *p;

- head = &kprobe_table[hash_ptr(addr, KPROBE_HASH_BITS)];
+#ifdef CONFIG_EARLY_KPROBES
+ if (kprobes_is_early())
+ head = &early_kprobe_hlist;
+ else
+#endif
+ head = &kprobe_table[hash_ptr(addr, KPROBE_HASH_BITS)];
hlist_for_each_entry_rcu(p, head, hlist) {
if (p->addr == addr)
return p;
@@ -386,11 +397,14 @@ NOKPROBE_SYMBOL(opt_pre_handler);
static void free_aggr_kprobe(struct kprobe *p)
{
struct optimized_kprobe *op;
+ struct early_kprobe_slot *ep;

op = container_of(p, struct optimized_kprobe, kp);
arch_remove_optimized_kprobe(op);
arch_remove_kprobe(p);
- kfree(op);
+ ep = container_of(op, struct early_kprobe_slot, op);
+ if (likely(!ek_free_early_kprobe(ep)))
+ kfree(op);
}

/* Return true(!0) if the kprobe is ready for optimization. */
@@ -607,9 +621,15 @@ static void optimize_kprobe(struct kprobe *p)
struct optimized_kprobe *op;

/* Check if the kprobe is disabled or not ready for optimization. */
- if (!kprobe_optready(p) || !kprobes_allow_optimization ||
- (kprobe_disabled(p) || kprobes_all_disarmed))
- return;
+ if (unlikely(kprobes_is_early())) {
+ BUG_ON(!(p->flags & KPROBE_FLAG_EARLY));
+ if (!kprobe_optready(p) || kprobe_disabled(p))
+ return;
+ } else {
+ if (!kprobe_optready(p) || !kprobes_allow_optimization ||
+ (kprobe_disabled(p) || kprobes_all_disarmed))
+ return;
+ }

/* Both of break_handler and post_handler are not supported. */
if (p->break_handler || p->post_handler)
@@ -631,7 +651,10 @@ static void optimize_kprobe(struct kprobe *p)
list_del_init(&op->list);
else {
list_add(&op->list, &optimizing_list);
- kick_kprobe_optimizer();
+ if (kprobes_is_early())
+ arch_optimize_kprobes(&optimizing_list);
+ else
+ kick_kprobe_optimizer();
}
}

@@ -1505,6 +1528,8 @@ out:
return ret;
}

+static int register_early_kprobe(struct kprobe *p);
+
int register_kprobe(struct kprobe *p)
{
int ret;
@@ -1518,6 +1543,14 @@ int register_kprobe(struct kprobe *p)
return PTR_ERR(addr);
p->addr = addr;

+ if (unlikely(kprobes_is_early())) {
+ p->flags |= KPROBE_FLAG_EARLY;
+ return register_early_kprobe(p);
+ }
+
+ WARN(p->flags & KPROBE_FLAG_EARLY,
+ "register early kprobe after kprobes initialized\n");
+
ret = check_kprobe_rereg(p);
if (ret)
return ret;
@@ -2156,6 +2189,8 @@ static struct notifier_block kprobe_module_nb = {
extern unsigned long __start_kprobe_blacklist[];
extern unsigned long __stop_kprobe_blacklist[];

+static void convert_early_kprobes(void);
+
static int __init init_kprobes(void)
{
int i, err = 0;
@@ -2204,6 +2239,7 @@ static int __init init_kprobes(void)
if (!err)
err = register_module_notifier(&kprobe_module_nb);

+ convert_early_kprobes();
kprobes_initialized = (err == 0);

if (!err)
@@ -2497,3 +2533,105 @@ module_init(init_kprobes);

/* defined in arch/.../kernel/kprobes.c */
EXPORT_SYMBOL_GPL(jprobe_return);
+
+#ifdef CONFIG_EARLY_KPROBES
+
+static int register_early_kprobe(struct kprobe *p)
+{
+ struct early_kprobe_slot *slot;
+ int err;
+
+ if (p->break_handler || p->post_handler)
+ return -EINVAL;
+ if (p->flags & KPROBE_FLAG_DISABLED)
+ return -EINVAL;
+
+ slot = ek_alloc_early_kprobe();
+ if (!slot) {
+ pr_err("No enough early kprobe slots.\n");
+ return -ENOMEM;
+ }
+
+ p->flags &= KPROBE_FLAG_DISABLED;
+ p->flags |= KPROBE_FLAG_EARLY;
+ p->nmissed = 0;
+
+ err = arch_prepare_kprobe(p);
+ if (err) {
+ pr_err("arch_prepare_kprobe failed\n");
+ goto free_slot;
+ }
+
+ INIT_LIST_HEAD(&p->list);
+ INIT_HLIST_NODE(&p->hlist);
+ INIT_LIST_HEAD(&slot->op.list);
+ slot->op.kp.addr = p->addr;
+ slot->op.kp.flags = p->flags | KPROBE_FLAG_EARLY;
+
+ err = arch_prepare_optimized_kprobe(&slot->op, p);
+ if (err) {
+ pr_err("Failed to prepare optimized kprobe.\n");
+ goto remove_optimized;
+ }
+
+ if (!arch_prepared_optinsn(&slot->op.optinsn)) {
+ pr_err("Failed to prepare optinsn.\n");
+ err = -ENOMEM;
+ goto remove_optimized;
+ }
+
+ hlist_add_head_rcu(&p->hlist, &early_kprobe_hlist);
+ init_aggr_kprobe(&slot->op.kp, p);
+ optimize_kprobe(&slot->op.kp);
+ return 0;
+
+remove_optimized:
+ arch_remove_optimized_kprobe(&slot->op);
+free_slot:
+ ek_free_early_kprobe(slot);
+ return err;
+}
+
+static void
+convert_early_kprobe(struct kprobe *kp)
+{
+ struct module *probed_mod;
+ int err;
+
+ BUG_ON(!kprobe_aggrprobe(kp));
+
+ err = check_kprobe_address_safe(kp, &probed_mod);
+ if (err)
+ panic("Insert kprobe at %p is not safe!", kp->addr);
+
+ /*
+ * FIXME:
+ * convert kprobe to ftrace if CONFIG_KPROBES_ON_FTRACE is on
+ * and kp is on ftrace location.
+ */
+
+ mutex_lock(&kprobe_mutex);
+ hlist_del_rcu(&kp->hlist);
+
+ INIT_HLIST_NODE(&kp->hlist);
+ hlist_add_head_rcu(&kp->hlist,
+ &kprobe_table[hash_ptr(kp->addr, KPROBE_HASH_BITS)]);
+ mutex_unlock(&kprobe_mutex);
+
+ if (probed_mod)
+ module_put(probed_mod);
+}
+
+static void
+convert_early_kprobes(void)
+{
+ struct kprobe *p;
+ struct hlist_node *tmp;
+
+ hlist_for_each_entry_safe(p, tmp, &early_kprobe_hlist, hlist)
+ convert_early_kprobe(p);
+};
+#else
+static int register_early_kprobe(struct kprobe *p) { return -ENOSYS; }
+static void convert_early_kprobes(void) {};
+#endif
--
1.8.4

2015-02-13 05:52:52

by Wang Nan

Subject: [RFC PATCH v3 21/26] early kprobes: add CONFIG_EARLY_KPROBES option.

Enable early kprobes in Kconfig.
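
For example, a configuration with early kprobes enabled would contain
something like the fragment below (the slot count is illustrative and
may be anywhere in the 1-64 range):

CONFIG_KPROBES=y
CONFIG_OPTPROBES=y
CONFIG_EARLY_KPROBES=y
CONFIG_NR_EARLY_KPROBES_SLOTS=16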

Signed-off-by: Wang Nan <[email protected]>
---
arch/Kconfig | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index 05d7a8a..32e9f4a 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -46,6 +46,21 @@ config KPROBES
for kernel debugging, non-intrusive instrumentation and testing.
If in doubt, say "N".

+config EARLY_KPROBES
+ bool "Enable kprobes at very early booting stage"
+ depends on KPROBES && OPTPROBES
+ def_bool y
+ help
+ Enable kprobe at very early booting stage.
+
+config NR_EARLY_KPROBES_SLOTS
+ int "Number of possible early kprobes"
+ range 1 64
+ default 16
+ depends on EARLY_KPROBES
+ help
+ Number of early kprobes slots.
+
config JUMP_LABEL
bool "Optimize very unlikely/likely branches"
depends on HAVE_ARCH_JUMP_LABEL
--
1.8.4

2015-02-13 05:53:18

by Wang Nan

Subject: [RFC PATCH v3 22/26] early kprobes: introduce arch_fix_ftrace_early_kprobe().

This patch is for further use. arch_fix_ftrace_early_kprobe() will be
called when ftrace tries to convert an ftrace entry to a NOP and fails.
For x86 it adjusts the saved NOP instruction, because at early probing
time it does not know which NOP ftrace will choose.

Signed-off-by: Wang Nan <[email protected]>
---
arch/x86/kernel/kprobes/opt.c | 31 +++++++++++++++++++++++++++++++
include/linux/kprobes.h | 5 +++++
kernel/kprobes.c | 6 ++++++
3 files changed, 42 insertions(+)

diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index 21847ab..f3ea954 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -456,3 +456,34 @@ int setup_detour_execution(struct kprobe *p, struct pt_regs *regs, int reenter)
return 0;
}
NOKPROBE_SYMBOL(setup_detour_execution);
+
+#ifdef CONFIG_EARLY_KPROBES
+void arch_fix_ftrace_early_kprobe(struct optimized_kprobe *op)
+{
+ const unsigned char *correct_nop5 = ideal_nops[NOP_ATOMIC5];
+ struct kprobe *list_p;
+
+ u32 mask = KPROBE_FLAG_EARLY |
+ KPROBE_FLAG_OPTIMIZED |
+ KPROBE_FLAG_FTRACE;
+
+ if ((op->kp.flags & mask) != mask)
+ return;
+
+ /*
+ * For early kprobe on ftrace, use right nop instruction.
+ * See x86 ftrace_make_nop and ftrace_nop_replace. Note that
+ * ideal_nops used by ftrace_nop_replace is setupt after early
+ * kprobe registration.
+ */
+
+ memcpy(&op->kp.opcode, correct_nop5, sizeof(kprobe_opcode_t));
+ memcpy(op->optinsn.copied_insn, correct_nop5 + INT3_SIZE,
+ RELATIVE_ADDR_SIZE);
+
+ /* Fix all kprobes connected to it */
+ list_for_each_entry_rcu(list_p, &op->kp.list, list)
+ memcpy(&list_p->opcode, correct_nop5, sizeof(kprobe_opcode_t));
+
+}
+#endif
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 0c64df8..e483f1b 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -459,6 +459,11 @@ struct early_kprobe_slot {
extern void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *ops, struct pt_regs *regs);
extern int arch_prepare_kprobe_ftrace(struct kprobe *p);
+
+#ifdef CONFIG_EARLY_KPROBES
+extern void arch_fix_ftrace_early_kprobe(struct optimized_kprobe *p);
+#endif
+
#endif

int arch_check_ftrace_location(struct kprobe *p);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 131a71a..0bbb510 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -2536,6 +2536,12 @@ EXPORT_SYMBOL_GPL(jprobe_return);

#ifdef CONFIG_EARLY_KPROBES

+#ifdef CONFIG_KPROBES_ON_FTRACE
+void __weak arch_fix_ftrace_early_kprobe(struct optimized_kprobe *p)
+{
+}
+#endif
+
static int register_early_kprobe(struct kprobe *p)
{
struct early_kprobe_slot *slot;
--
1.8.4

2015-02-13 05:49:37

by Wang Nan

Subject: [RFC PATCH v3 23/26] early kprobes: x86: arch_restore_optimized_kprobe().

arch_restore_optimized_kprobe() can be used to temporarily restore the
probed instruction. It actually disables the optimized kprobe but keeps
the related data structures. It uses stop_machine() to enforce
atomicity.

Signed-off-by: Wang Nan <[email protected]>
---
arch/x86/kernel/kprobes/opt.c | 26 ++++++++++++++++++++++++++
include/linux/kprobes.h | 1 +
2 files changed, 27 insertions(+)

diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index f3ea954..12332c2 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -28,6 +28,7 @@
#include <linux/kdebug.h>
#include <linux/kallsyms.h>
#include <linux/ftrace.h>
+#include <linux/stop_machine.h>

#include <asm/cacheflush.h>
#include <asm/desc.h>
@@ -486,4 +487,29 @@ void arch_fix_ftrace_early_kprobe(struct optimized_kprobe *op)
memcpy(&list_p->opcode, correct_nop5, sizeof(kprobe_opcode_t));

}
+
+static int do_restore_kprobe(void *p)
+{
+ struct optimized_kprobe *op = p;
+ u8 insn_buf[RELATIVEJUMP_SIZE];
+
+ memcpy(insn_buf, &op->kp.opcode, sizeof(kprobe_opcode_t));
+ memcpy(insn_buf + INT3_SIZE,
+ op->optinsn.copied_insn,
+ RELATIVE_ADDR_SIZE);
+ text_poke(op->kp.addr, insn_buf, RELATIVEJUMP_SIZE);
+ return 0;
+}
+
+void arch_restore_optimized_kprobe(struct optimized_kprobe *op)
+{
+ u32 mask = KPROBE_FLAG_EARLY |
+ KPROBE_FLAG_OPTIMIZED |
+ KPROBE_FLAG_FTRACE;
+
+ if ((op->kp.flags & mask) != mask)
+ return;
+
+ stop_machine(do_restore_kprobe, op, NULL);
+}
#endif
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index e483f1b..e615402 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -462,6 +462,7 @@ extern int arch_prepare_kprobe_ftrace(struct kprobe *p);

#ifdef CONFIG_EARLY_KPROBES
extern void arch_fix_ftrace_early_kprobe(struct optimized_kprobe *p);
+extern void arch_restore_optimized_kprobe(struct optimized_kprobe *p);
#endif

#endif
--
1.8.4

2015-02-13 05:56:11

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 24/26] early kprobes: core logic to support early kprobe on ftrace.

This is the main patch to support early kprobes on ftrace.

It utilizes the previously introduced ftrace update notification chain
to fix possible ftrace code modification failures.

For early kprobes on ftrace, ftrace_notifier_call() is registered with
the ftrace update notifier to receive ftrace code conversion failures.

When registering an early kprobe, check_kprobe_address_safe() is used
to check whether the address is an ftrace entry, and
ftrace_process_loc_early() is used to convert such an instruction to a
nop before ftrace is initialized. The previous ftrace patches make this
checking and modification possible.

When ftrace does the NOP conversion, x86 gets a chance to adjust the
probed nop instruction via arch_fix_ftrace_early_kprobe().

When ftrace tries to enable the probed ftrace entry, the NOP
instruction is restored. There are two different situations. Case 1:
ftrace is enabled by the ftrace_filter= option; in this case the early
kprobe stops working until kprobes are fully initialized. Case 2:
ftrace events are registered while converting early kprobes to normal
kprobes; losing events is possible, but in case 2 the window should be
small enough.

After kprobes are fully initialized, early kprobes on ftrace are
converted to normal kprobes on ftrace by first restoring the ftrace
entry and then registering the ftrace event on them. The conversion is
split into two parts. The first part does some checking and converts
the kprobes on ftrace. The second part is wrapped in stop_machine() to
avoid losing events during list manipulation. kprobes_initialized is
also set in stop_machine() context to avoid losing events.
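
The decision the notifier makes for a probed ftrace entry, stripped
down from the patch below (locking and error handling omitted):

	if (!(rec->flags & FTRACE_FL_ENABLED) && !enable) {
		/* ftrace is installing its nop: fix our saved copy (x86) */
		arch_fix_ftrace_early_kprobe(op);
	} else if (!(rec->flags & FTRACE_FL_ENABLED) && enable) {
		/*
		 * ftrace wants the call back: restore the original bytes and
		 * let ftrace retry; events may be lost until the kprobe is
		 * converted to a normal ftrace-based kprobe.
		 */
		if (restore_optimized_kprobe(op) == NOTIFY_STOP)
			info->retry = true;
	}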

Signed-off-by: Wang Nan <[email protected]>
---
include/linux/kprobes.h | 1 +
kernel/kprobes.c | 247 +++++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 225 insertions(+), 23 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index e615402..8f4d344 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -131,6 +131,7 @@ struct kprobe {
*/
#define KPROBE_FLAG_FTRACE 8 /* probe is using ftrace */
#define KPROBE_FLAG_EARLY 16 /* early kprobe */
+#define KPROBE_FLAG_RESTORED 32 /* temporarily restored to its original insn */

/* Has this kprobe gone ? */
static inline int kprobe_gone(struct kprobe *p)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 0bbb510..edac74b 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -48,6 +48,7 @@
#include <linux/ftrace.h>
#include <linux/cpu.h>
#include <linux/jump_label.h>
+#include <linux/stop_machine.h>

#include <asm-generic/sections.h>
#include <asm/cacheflush.h>
@@ -2239,11 +2240,24 @@ static int __init init_kprobes(void)
if (!err)
err = register_module_notifier(&kprobe_module_nb);

- convert_early_kprobes();
- kprobes_initialized = (err == 0);
-
- if (!err)
+ if (!err) {
+ /*
+ * Let convert_early_kprobes() set kprobes_initialized
+ * to 1 in stop_machine() context. If not, we may lose
+ * events from kprobes on ftrace that fire in the gap.
+ *
+ * kprobe_ftrace_handler() uses get_kprobe() to retrieve
+ * the kprobe being triggered, which depends on
+ * kprobes_is_early() to determine the hlist used for
+ * searching. convert_early_kprobes() relinks early
+ * kprobes to the normal hlist. If an event fires after
+ * that but before kprobes_initialized is set,
+ * get_kprobe() will search the wrong list.
+ */
+ convert_early_kprobes();
init_test_probes();
+ }
+
return err;
}

@@ -2540,11 +2554,127 @@ EXPORT_SYMBOL_GPL(jprobe_return);
void __weak arch_fix_ftrace_early_kprobe(struct optimized_kprobe *p)
{
}
+
+static int restore_optimized_kprobe(struct optimized_kprobe *op)
+{
+ /* If it is already restored, let other notifiers handle it. */
+ if (op->kp.flags & KPROBE_FLAG_RESTORED)
+ return NOTIFY_DONE;
+
+ get_online_cpus();
+ mutex_lock(&text_mutex);
+ arch_restore_optimized_kprobe(op);
+ mutex_unlock(&text_mutex);
+ put_online_cpus();
+
+ op->kp.flags |= KPROBE_FLAG_RESTORED;
+ return NOTIFY_STOP;
+}
+
+static int ftrace_notifier_call(struct notifier_block *nb,
+ unsigned long val, void *param)
+{
+ struct ftrace_update_notifier_info *info = param;
+ struct optimized_kprobe *op;
+ struct dyn_ftrace *rec;
+ struct kprobe *kp;
+ int enable;
+ void *addr;
+ int ret = NOTIFY_DONE;
+
+ if (!info || !info->rec || !info->rec->ip)
+ return NOTIFY_DONE;
+
+ rec = info->rec;
+ enable = info->enable;
+ addr = (void *)rec->ip;
+
+ mutex_lock(&kprobe_mutex);
+ kp = get_kprobe(addr);
+ mutex_unlock(&kprobe_mutex);
+
+ if (!kp || !kprobe_aggrprobe(kp))
+ return NOTIFY_DONE;
+
+ op = container_of(kp, struct optimized_kprobe, kp);
+ /*
+ * Ftrace is trying to convert the ftrace entry to a nop
+ * instruction. This conversion should have already been done
+ * at register_early_kprobe(). x86 needs fixing here.
+ */
+ if (!(rec->flags & FTRACE_FL_ENABLED) && (!enable)) {
+ arch_fix_ftrace_early_kprobe(op);
+ return NOTIFY_STOP;
+ }
+
+ /*
+ * Ftrace is trying to enable a trace entry. We temporarily
+ * restore the probed instruction.
+ * We can continue using this kprobe as an ftrace-based kprobe,
+ * but events between this restoration and the early kprobe
+ * conversion will get lost.
+ */
+ if (!(rec->flags & FTRACE_FL_ENABLED) && enable) {
+ ret = restore_optimized_kprobe(op);
+
+ /* Let ftrace retry if restore is successful. */
+ if (ret == NOTIFY_STOP)
+ info->retry = true;
+ return ret;
+ }
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block ftrace_notifier_block = {
+ .notifier_call = ftrace_notifier_call,
+};
+static bool ftrace_notifier_registred = false;
+
+static int enable_early_kprobe_on_ftrace(struct kprobe *p)
+{
+ int err;
+
+ if (!ftrace_notifier_registred) {
+ err = register_ftrace_update_notifier(&ftrace_notifier_block);
+ if (err) {
+ pr_err("Failed to register ftrace update notifier\n");
+ return err;
+ }
+ ftrace_notifier_registred = true;
+ }
+
+ err = ftrace_process_loc_early((unsigned long)p->addr);
+ if (err)
+ pr_err("Failed to process ftrace entry at %p\n", p->addr);
+ return err;
+}
+
+/* Caller must ensure kprobe_aggrprobe(kp). */
+static void convert_early_ftrace_kprobe_top(struct optimized_kprobe *op)
+{
+ restore_optimized_kprobe(op);
+ arm_kprobe_ftrace(&op->kp);
+}
+
+#else
+static inline int enable_early_kprobe_on_ftrace(struct kprobe *__unused)
+{ return 0; }
+
+/*
+ * If CONFIG_KPROBES_ON_FTRACE is off this function should never get called,
+ * so let it trigger a warning.
+ */
+static inline void convert_early_ftrace_kprobe_top(struct optimized_kprobe *__unused)
+{
+ WARN_ON(1);
+}
#endif

static int register_early_kprobe(struct kprobe *p)
{
struct early_kprobe_slot *slot;
+ struct module *probed_mod;
int err;

if (p->break_handler || p->post_handler)
@@ -2552,13 +2682,25 @@ static int register_early_kprobe(struct kprobe *p)
if (p->flags & KPROBE_FLAG_DISABLED)
return -EINVAL;

+ err = check_kprobe_address_safe(p, &probed_mod);
+ if (err)
+ return err;
+
+ BUG_ON(probed_mod);
+
+ if (kprobe_ftrace(p)) {
+ err = enable_early_kprobe_on_ftrace(p);
+ if (err)
+ return err;
+ }
+
slot = ek_alloc_early_kprobe();
if (!slot) {
pr_err("No enough early kprobe slots.\n");
return -ENOMEM;
}

- p->flags &= KPROBE_FLAG_DISABLED;
+ p->flags &= KPROBE_FLAG_DISABLED | KPROBE_FLAG_FTRACE;
p->flags |= KPROBE_FLAG_EARLY;
p->nmissed = 0;

@@ -2599,45 +2741,104 @@ free_slot:
}

static void
-convert_early_kprobe(struct kprobe *kp)
+convert_early_kprobe_top(struct kprobe *kp)
{
struct module *probed_mod;
+ struct optimized_kprobe *op;
int err;

BUG_ON(!kprobe_aggrprobe(kp));
+ op = container_of(kp, struct optimized_kprobe, kp);

err = check_kprobe_address_safe(kp, &probed_mod);
if (err)
panic("Insert kprobe at %p is not safe!", kp->addr);
+ BUG_ON(probed_mod);

- /*
- * FIXME:
- * convert kprobe to ftrace if CONFIG_KPROBES_ON_FTRACE is on
- * and kp is on ftrace location.
- */
+ if (kprobe_ftrace(kp))
+ convert_early_ftrace_kprobe_top(op);
+}

- mutex_lock(&kprobe_mutex);
- hlist_del_rcu(&kp->hlist);
+static void
+convert_early_kprobes_top(void)
+{
+ struct kprobe *p;
+
+ hlist_for_each_entry(p, &early_kprobe_hlist, hlist)
+ convert_early_kprobe_top(p);
+}
+
+static LIST_HEAD(early_freeing_list);
+
+static void
+convert_early_kprobe_stop_machine(struct kprobe *kp)
+{
+ struct optimized_kprobe *op;

+ BUG_ON(!kprobe_aggrprobe(kp));
+ op = container_of(kp, struct optimized_kprobe, kp);
+
+ if ((kprobe_ftrace(kp)) && (list_is_singular(&op->kp.list))) {
+ /* Update kp */
+ kp = list_entry(op->kp.list.next, struct kprobe, list);
+
+ hlist_replace_rcu(&op->kp.hlist, &kp->hlist);
+ list_del_init(&kp->list);
+
+ op->kp.flags |= KPROBE_FLAG_DISABLED;
+ list_add(&op->list, &early_freeing_list);
+ }
+
+ hlist_del_rcu(&kp->hlist);
INIT_HLIST_NODE(&kp->hlist);
hlist_add_head_rcu(&kp->hlist,
- &kprobe_table[hash_ptr(kp->addr, KPROBE_HASH_BITS)]);
- mutex_unlock(&kprobe_mutex);
-
- if (probed_mod)
- module_put(probed_mod);
+ &kprobe_table[hash_ptr(kp->addr, KPROBE_HASH_BITS)]);
}

-static void
-convert_early_kprobes(void)
+static int
+convert_early_kprobes_stop_machine(void *__unused)
{
struct kprobe *p;
struct hlist_node *tmp;

hlist_for_each_entry_safe(p, tmp, &early_kprobe_hlist, hlist)
- convert_early_kprobe(p);
+ convert_early_kprobe_stop_machine(p);
+
+ /*
+ * See comment in init_kprobes(). We must set
+ * kprobes_initialized in stop_machine() context.
+ */
+ kprobes_initialized = 1;
+ return 0;
+}
+
+static void
+convert_early_kprobes(void)
+{
+ struct optimized_kprobe *op, *tmp;
+
+ mutex_lock(&kprobe_mutex);
+
+ convert_early_kprobes_top();
+
+ get_online_cpus();
+ mutex_lock(&text_mutex);
+
+ stop_machine(convert_early_kprobes_stop_machine, NULL, NULL);
+
+ mutex_unlock(&text_mutex);
+ put_online_cpus();
+ mutex_unlock(&kprobe_mutex);
+
+ list_for_each_entry_safe(op, tmp, &early_freeing_list, list) {
+ list_del_init(&op->list);
+ free_aggr_kprobe(&op->kp);
+ }
};
#else
-static int register_early_kprobe(struct kprobe *p) { return -ENOSYS; }
-static void convert_early_kprobes(void) {};
+static inline int register_early_kprobe(struct kprobe *p) { return -ENOSYS; }
+static inline void convert_early_kprobes(void)
+{
+ kprobes_initialized = 1;
+}
#endif
--
1.8.4

2015-02-13 05:49:51

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 25/26] early kprobes: introduce kconfig option to support early kprobe on ftrace.

On platforms (like x86) which support CONFIG_KPROBES_ON_FTRACE, make
early kprobes depend on it so we are able to probe function entries.

Signed-off-by: Wang Nan <[email protected]>
---
arch/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 32e9f4a..79f809d 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -48,7 +48,7 @@ config KPROBES

config EARLY_KPROBES
bool "Enable kprobes at very early booting stage"
- depends on KPROBES && OPTPROBES
+ depends on KPROBES && OPTPROBES && (KPROBES_ON_FTRACE || !HAVE_KPROBES_ON_FTRACE)
def_bool y
help
Enable kprobe at very early booting stage.
--
1.8.4

2015-02-13 05:53:41

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 26/26] kprobes: enable 'ekprobe=' cmdline option for early kprobes.

This patch shows a very rough usage of early kprobes. By adding
kernel cmdline options such as 'ekprobe=__alloc_pages_nodemask' or
'ekprobe=0xc00f3c2c', early kprobes are installed. When the probed
instructions get hit, a message is printed.
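
For example (the symbol is only an illustration; the exact offset in
the message depends on where the probe lands):

	# kernel command line
	ekprobe=__alloc_pages_nodemask

	# later, in the kernel log, each hit prints something like
	Hit early kprobe at __alloc_pages_nodemask+0x0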

This patch is only a sample; I'll drop it in the future.

Signed-off-by: Wang Nan <[email protected]>
---
kernel/kprobes.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 71 insertions(+)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index edac74b..278b2511 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -2835,10 +2835,81 @@ convert_early_kprobes(void)
free_aggr_kprobe(&op->kp);
}
};
+
+static int early_kprobe_pre_handler(struct kprobe *p, struct pt_regs *regs)
+{
+ const char *sym = NULL;
+ char *modname, namebuf[KSYM_NAME_LEN];
+ unsigned long offset = 0;
+
+ sym = kallsyms_lookup((unsigned long)p->addr, NULL,
+ &offset, &modname, namebuf);
+ if (sym)
+ pr_info("Hit early kprobe at %s+0x%lx%s%s\n",
+ sym, offset,
+ (modname ? " " : ""),
+ (modname ? modname : ""));
+ else
+ pr_info("Hit early kprobe at %p\n", p->addr);
+ return 0;
+}
+
+DEFINE_EKPROBE_ALLOC_OPS(struct kprobe, early_kprobe_setup, static);
+static int __init early_kprobe_setup(char *p)
+{
+ unsigned long long addr;
+ struct kprobe *kp;
+ int len = strlen(p);
+ int err;
+
+ if (len <= 0) {
+ pr_err("early kprobe: wrong param: %s\n", p);
+ return 0;
+ }
+
+ if ((p[0] == '0') && (p[1] == 'x')) {
+ err = kstrtoull(p, 16, &addr);
+ if (err) {
+ pr_err("early kprobe: wrong address: %p\n", p);
+ return 0;
+ }
+ } else {
+ addr = kallsyms_lookup_name(p);
+ if (!addr) {
+ pr_err("early kprobe: wrong symbol: %s\n", p);
+ return 0;
+ }
+ }
+
+ if ((addr < (unsigned long)_text) ||
+ (addr >= (unsigned long)_etext))
+ pr_err("early kprobe: address of %p out of range\n", p);
+
+ kp = ek_alloc_early_kprobe_setup();
+ if (kp == NULL) {
+ pr_err("early kprobe: no enough early kprobe slot\n");
+ return 0;
+ }
+ kp->addr = (void *)(unsigned long)(addr);
+ kp->pre_handler = early_kprobe_pre_handler;
+ err = register_kprobe(kp);
+ if (err) {
+ pr_err("early kprobe: register early kprobe %s failed\n", p);
+ ek_free_early_kprobe_setup(kp);
+ }
+ return 0;
+}
#else
static inline int register_early_kprobe(struct kprobe *p) { return -ENOSYS; }
static inline void convert_early_kprobes(void)
{
kprobes_initialized = 1;
}
+
+static int __init early_kprobe_setup(char *p)
+{
+ return 0;
+}
#endif
+
+early_param("ekprobe", early_kprobe_setup);
--
1.8.4

Subject: Re: [RFC PATCH v3 00/26] Early kprobe: enable kprobes at very early booting stage.

Hi,

Sorry for replying late.

(2015/02/13 14:39), Wang Nan wrote:
> I fell very sorry for people who reviewed my v2 patch series yesterday
> at https://lkml.org/lkml/2015/2/12/234 because I didn't provide enough
> information in commit log. This v3 patch series add those missing
> commit messages. There are also 2 small fix based on v2:
>
> 1. Fixes ftrace_sort_mcount_area. Original patch doesn't work for module.
> 2. Wraps setting of kprobes_initialized in stop_machine() context.

From the viewpoint of maintenance, it seems over-engineered and not a
general implementation. Please reconsider just initializing the breakpoint
handler at an earlier stage. Since those exceptions may happen anywhere,
the trap handlers are set up at a very early stage. E.g. on x86, setup_arch()
calls early_trap_init() at the beginning. So we just need to initialize
kprobes earlier.
I think this is almost enough for debugging, and very general because
we don't need optprobe for porting to other arch.

And for ftrace-based kprobes, we can just put a breakpoint on the mcount
call at the beginning. ftrace will need to check and keep it when replacing
the mcount call with a nop. Afterward, we can cleanly update those kprobes
to ftrace-based kprobes.
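
A rough sketch of that idea on x86 (illustrative only, not part of this
series; the helper name and the exact hook point are assumptions):

	static int example_make_nop_keep_kprobe(struct dyn_ftrace *rec)
	{
		unsigned char byte;

		if (probe_kernel_read(&byte, (void *)rec->ip, 1))
			return -EFAULT;

		/* An early kprobe already put int3 (0xcc) here: keep it. */
		if (byte == BREAKPOINT_INSTRUCTION)
			return 0;

		return ftrace_make_nop(NULL, rec, MCOUNT_ADDR);
	}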

So, please start with smaller changes.

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: [RFC PATCH v3 15/26] early kprobes: x86: directly modify code.

(2015/02/13 14:40), Wang Nan wrote:
> When registering early kprobes, SMP should has not been enabled, so
> doesn't require synchronization in text_poke_bp(). Simply memcpy is
> enough.

BTW, we already have text_poke_early() for this purpose.
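
For reference, the early branch in the hunk quoted below could then look
roughly like this (assuming text_poke_early()'s usual x86 signature,
void *text_poke_early(void *addr, const void *opcode, size_t len)):

	if (unlikely(kprobes_is_early())) {
		BUG_ON(!(op->kp.flags & KPROBE_FLAG_EARLY));
		text_poke_early(op->kp.addr, insn_buf, RELATIVEJUMP_SIZE);
	} else {
		text_poke_bp(op->kp.addr, insn_buf, RELATIVEJUMP_SIZE,
			     op->optinsn.insn);
	}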

Thank you,

>
> Signed-off-by: Wang Nan <[email protected]>
> ---
> arch/x86/kernel/kprobes/opt.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
> index 0dd8d08..21847ab 100644
> --- a/arch/x86/kernel/kprobes/opt.c
> +++ b/arch/x86/kernel/kprobes/opt.c
> @@ -36,6 +36,7 @@
> #include <asm/alternative.h>
> #include <asm/insn.h>
> #include <asm/debugreg.h>
> +#include <asm/tlbflush.h>
>
> #include "common.h"
>
> @@ -397,8 +398,15 @@ void arch_optimize_kprobes(struct list_head *oplist)
> insn_buf[0] = RELATIVEJUMP_OPCODE;
> *(s32 *)(&insn_buf[1]) = rel;
>
> - text_poke_bp(op->kp.addr, insn_buf, RELATIVEJUMP_SIZE,
> - op->optinsn.insn);
> + if (unlikely(kprobes_is_early())) {
> + BUG_ON(!(op->kp.flags & KPROBE_FLAG_EARLY));
> + memcpy(op->kp.addr, insn_buf, RELATIVEJUMP_SIZE);
> + local_flush_tlb();
> + sync_core();
> + } else {
> + text_poke_bp(op->kp.addr, insn_buf, RELATIVEJUMP_SIZE,
> + op->optinsn.insn);
> + }
>
> list_del_init(&op->list);
> }
>


--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2015-02-25 11:18:11

by Wang Nan

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/26] Early kprobe: enable kprobes at very early booting stage.

On 2015/2/20 11:59, Masami Hiramatsu wrote:
> Hi,
>
> Sorry for replying late.
>
> (2015/02/13 14:39), Wang Nan wrote:
>> I fell very sorry for people who reviewed my v2 patch series yesterday
>> at https://lkml.org/lkml/2015/2/12/234 because I didn't provide enough
>> information in commit log. This v3 patch series add those missing
>> commit messages. There are also 2 small fix based on v2:
>>
>> 1. Fixes ftrace_sort_mcount_area. Original patch doesn't work for module.
>> 2. Wraps setting of kprobes_initialized in stop_machine() context.
>
> From the viewpoint of maintenance, it seems over-engineered and not a
> general implementation. Please reconsider just initializing the breakpoint
> handler at an earlier stage. Since those exceptions may happen anywhere,
> the trap handlers are set up at a very early stage. E.g. on x86, setup_arch()
> calls early_trap_init() at the beginning. So we just need to initialize
> kprobes earlier.

I tried your suggestion. For x86, the int3 handler doesn't work correctly until
trap_init(). I don't have enough time to look into this problem today (and I'm
not familiar with the x86 architecture). Could you please have a look at it?

Thank you.

> I think this is almost enough for debugging, and very general because
> we don't need optprobe for porting to other arch.
>
> And for ftrace-based kprobes, we can just put a breakpoint on the mcount
> call at the beginning. ftrace will need to check and keep it when replacing
> the mcount call with a nop. Afterward, we can cleanly update those kprobes
> to ftrace-based kprobes.
>
> So, please start with smaller changes.
>
> Thank you,
>

2015-02-25 11:49:07

by Wang Nan

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/26] Early kprobe: enable kprobes at very early booting stage.

On 2015/2/25 19:11, Wang Nan wrote:
> On 2015/2/20 11:59, Masami Hiramatsu wrote:
>> Hi,
>>
>> Sorry for replying late.
>>
>> (2015/02/13 14:39), Wang Nan wrote:
>>> I fell very sorry for people who reviewed my v2 patch series yesterday
>>> at https://lkml.org/lkml/2015/2/12/234 because I didn't provide enough
>>> information in commit log. This v3 patch series add those missing
>>> commit messages. There are also 2 small fix based on v2:
>>>
>>> 1. Fixes ftrace_sort_mcount_area. Original patch doesn't work for module.
>>> 2. Wraps setting of kprobes_initialized in stop_machine() context.
>>
>> From the viewpoint of maintenance, it seems over-engineered and not a
>> general implementation. Please reconsider just initializing the breakpoint
>> handler at an earlier stage. Since those exceptions may happen anywhere,
>> the trap handlers are set up at a very early stage. E.g. on x86, setup_arch()
>> calls early_trap_init() at the beginning. So we just need to initialize
>> kprobes earlier.
>
> I tried your suggestion. For x86, the int3 handler doesn't work correctly until
> trap_init(). I don't have enough time to look into this problem today (and I'm
> not familiar with the x86 architecture). Could you please have a look at it?
>
> Thank you.
>

Hi Masami,

Sorry for the noise. I have further information that may be useful.

I initialized a kprobe and probed an instruction with int3 between setup_arch() and
trap_init(). It didn't work at first. By dumping __log_buf[] I found it reports a NULL
pointer panic. With some random testing I found the following patch, which seems to
make it work correctly.

However, I think there must be some reason to set the dpl to '3' instead of '0'
(set_nmi_gate uses 0 as the dpl). Do you have any suggestion on it?

Thank you.

------------- The patch ---------------
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9d2073e..ac29277 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -925,9 +925,9 @@ dotraplinkage void do_iret_error(struct pt_regs *regs, long error_code)
/* Set of traps needed for early debugging. */
void __init early_trap_init(void)
{
- set_intr_gate_ist(X86_TRAP_DB, &debug, DEBUG_STACK);
+ set_intr_gate_ist(X86_TRAP_DB, &debug, 0);
/* int3 can be called from all */
- set_system_intr_gate_ist(X86_TRAP_BP, &int3, DEBUG_STACK);
+ set_intr_gate_ist(X86_TRAP_BP, &int3, 0);
#ifdef CONFIG_X86_32
set_intr_gate(X86_TRAP_PF, page_fault);
#endif