2019-06-27 11:24:39

by Naveen N. Rao

Subject: [PATCH v2 0/7] powerpc/ftrace: Patch out -mprofile-kernel instructions

Changes since v1
(https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=114556):
- Patches 1,2,3 and 6: No changes
- Patch 4: Add smp_call_function() to flush icache on all cpus after
patching in the 'mflr r0' instruction.
- Patch 5: Changes as per Steven Rostedt's suggestions.
- Patch 7: Changes as per Masami and Joe Perches.

--
On powerpc64, -mprofile-kernel results in two instructions being
emitted: 'mflr r0' and 'bl _mcount'. So far, we were only nop'ing out
the branch to _mcount(). This series implements an approach to also nop
out the preceding mflr.
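
To illustrate (just a sketch of the sequences involved, not taken from the
patches), a -mprofile-kernel call site looks like this with tracing enabled:

<func>:
	mflr	r0
	bl	_mcount
	...

Today, disabling tracing leaves:

<func>:
	mflr	r0
	nop
	...

and with this series, both instructions become nops:

<func>:
	nop
	nop
	...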


- Naveen


Naveen N. Rao (7):
ftrace: Expose flags used for ftrace_replace_code()
x86/ftrace: Fix use of flags in ftrace_replace_code()
ftrace: Expose __ftrace_replace_code()
powerpc/ftrace: Additionally nop out the preceding mflr with
-mprofile-kernel
ftrace: Update ftrace_location() for powerpc -mprofile-kernel
kprobes/ftrace: Use ftrace_location() when [dis]arming probes
powerpc/kprobes: Allow probing on any ftrace address

arch/powerpc/include/asm/ftrace.h | 8 +
arch/powerpc/kernel/kprobes-ftrace.c | 32 +++-
arch/powerpc/kernel/trace/ftrace.c | 258 ++++++++++++++++++++++++---
arch/x86/kernel/ftrace.c | 3 +-
include/linux/ftrace.h | 15 ++
kernel/kprobes.c | 10 +-
kernel/trace/ftrace.c | 15 +-
7 files changed, 302 insertions(+), 39 deletions(-)

--
2.22.0


2019-06-27 11:24:50

by Naveen N. Rao

Subject: [PATCH v2 1/7] ftrace: Expose flags used for ftrace_replace_code()

Since ftrace_replace_code() is a __weak function and can be overridden,
we need to expose the flags that can be set. So, move the flags enum to
the header file.
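
As a rough sketch (mirroring how the powerpc override later in this series
uses these flags; not part of this patch), an architecture providing its own
ftrace_replace_code() can now decode the flags it is passed:

void ftrace_replace_code(int mod_flags)
{
	int enable = mod_flags & FTRACE_MODIFY_ENABLE_FL;
	int schedulable = mod_flags & FTRACE_MODIFY_MAY_SLEEP_FL;

	/* ... walk the ftrace records, cond_resched()ing if schedulable ... */
}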

Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Naveen N. Rao <[email protected]>
---
include/linux/ftrace.h | 5 +++++
kernel/trace/ftrace.c | 5 -----
2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 25e2995d4a4c..e97789c95c4e 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -162,6 +162,11 @@ enum {
FTRACE_OPS_FL_TRACE_ARRAY = 1 << 15,
};

+enum {
+ FTRACE_MODIFY_ENABLE_FL = (1 << 0),
+ FTRACE_MODIFY_MAY_SLEEP_FL = (1 << 1),
+};
+
#ifdef CONFIG_DYNAMIC_FTRACE
/* The hash used to know what functions callbacks trace */
struct ftrace_ops_hash {
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 38277af44f5c..5710a6b3edc1 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -75,11 +75,6 @@
#define INIT_OPS_HASH(opsname)
#endif

-enum {
- FTRACE_MODIFY_ENABLE_FL = (1 << 0),
- FTRACE_MODIFY_MAY_SLEEP_FL = (1 << 1),
-};
-
struct ftrace_ops ftrace_list_end __read_mostly = {
.func = ftrace_stub,
.flags = FTRACE_OPS_FL_RECURSION_SAFE | FTRACE_OPS_FL_STUB,
--
2.22.0

2019-06-27 11:24:52

by Naveen N. Rao

Subject: [PATCH v2 2/7] x86/ftrace: Fix use of flags in ftrace_replace_code()

In commit a0572f687fb3c ("ftrace: Allow ftrace_replace_code() to be
schedulable"), the generic ftrace_replace_code() function was modified to
accept a flags argument in place of a single 'enable' flag. However, the
x86 version of this function was not updated. Fix the same.

Fixes: a0572f687fb3c ("ftrace: Allow ftrace_replace_code() to be schedulable")
Signed-off-by: Naveen N. Rao <[email protected]>
---
arch/x86/kernel/ftrace.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 0927bb158ffc..f34005a17051 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -573,8 +573,9 @@ static void run_sync(void)
local_irq_disable();
}

-void ftrace_replace_code(int enable)
+void ftrace_replace_code(int mod_flags)
{
+ int enable = mod_flags & FTRACE_MODIFY_ENABLE_FL;
struct ftrace_rec_iter *iter;
struct dyn_ftrace *rec;
const char *report = "adding breakpoints";
--
2.22.0

2019-06-27 11:24:57

by Naveen N. Rao

Subject: [PATCH v2 3/7] ftrace: Expose __ftrace_replace_code()

While over-riding ftrace_replace_code(), we still want to reuse the
existing __ftrace_replace_code() function. Rename the function and
make it available for other kernel code.

Signed-off-by: Naveen N. Rao <[email protected]>
---
include/linux/ftrace.h | 1 +
kernel/trace/ftrace.c | 8 ++++----
2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index e97789c95c4e..fa653a561da5 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -456,6 +456,7 @@ ftrace_set_early_filter(struct ftrace_ops *ops, char *buf, int enable);
/* defined in arch */
extern int ftrace_ip_converted(unsigned long ip);
extern int ftrace_dyn_arch_init(void);
+extern int ftrace_replace_code_rec(struct dyn_ftrace *rec, int enable);
extern void ftrace_replace_code(int enable);
extern int ftrace_update_ftrace_func(ftrace_func_t func);
extern void ftrace_caller(void);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 5710a6b3edc1..21d8e201ee80 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2351,8 +2351,8 @@ unsigned long ftrace_get_addr_curr(struct dyn_ftrace *rec)
return (unsigned long)FTRACE_ADDR;
}

-static int
-__ftrace_replace_code(struct dyn_ftrace *rec, int enable)
+int
+ftrace_replace_code_rec(struct dyn_ftrace *rec, int enable)
{
unsigned long ftrace_old_addr;
unsigned long ftrace_addr;
@@ -2403,7 +2403,7 @@ void __weak ftrace_replace_code(int mod_flags)
if (rec->flags & FTRACE_FL_DISABLED)
continue;

- failed = __ftrace_replace_code(rec, enable);
+ failed = ftrace_replace_code_rec(rec, enable);
if (failed) {
ftrace_bug(failed, rec);
/* Stop processing */
@@ -5827,7 +5827,7 @@ void ftrace_module_enable(struct module *mod)
rec->flags = cnt;

if (ftrace_start_up && cnt) {
- int failed = __ftrace_replace_code(rec, 1);
+ int failed = ftrace_replace_code_rec(rec, 1);
if (failed) {
ftrace_bug(failed, rec);
goto out_loop;
--
2.22.0

2019-06-27 11:24:58

by Naveen N. Rao

Subject: [PATCH v2 4/7] powerpc/ftrace: Additionally nop out the preceding mflr with -mprofile-kernel

With -mprofile-kernel, gcc emits 'mflr r0', followed by 'bl _mcount' to
enable function tracing and profiling. So far, with dynamic ftrace, we
used to only patch out the branch to _mcount(). However, mflr is
executed by the branch unit that can only execute one per cycle on
POWER9 and shared with branches, so it would be nice to avoid it where
possible.

We cannot simply nop out the mflr either. When enabling function
tracing, there can be a race if tracing is enabled when some thread was
interrupted after executing a nop'ed out mflr. In this case, the thread
would execute the now-patched-in branch to _mcount() without having
executed the preceding mflr.

To solve this, we now enable function tracing in 2 steps: patch in the
mflr instruction, use 'smp_call_function(isync);
synchronize_rcu_tasks()' to ensure all existing threads make progress,
and then patch in the branch to _mcount(). We override
ftrace_replace_code() with a powerpc64 variant for this purpose.
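
To spell out the states a call site goes through when enabling tracing (a
sketch of the steps described above):

Initially (tracing off):

<func>:
	nop
	nop

Step 1 -- patch in the mflr:

<func>:
	mflr	r0
	nop

(smp_call_function(isync), then synchronize_rcu_tasks())

Step 2 -- patch in the branch:

<func>:
	mflr	r0
	bl	_mcount

This way, no cpu can execute the newly patched-in branch to _mcount()
without having first executed the preceding mflr.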

Suggested-by: Nicholas Piggin <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Naveen N. Rao <[email protected]>
---
arch/powerpc/kernel/trace/ftrace.c | 258 ++++++++++++++++++++++++++---
1 file changed, 236 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 517662a56bdc..86c2b50dcaa9 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -125,7 +125,7 @@ __ftrace_make_nop(struct module *mod,
{
unsigned long entry, ptr, tramp;
unsigned long ip = rec->ip;
- unsigned int op, pop;
+ unsigned int op;

/* read where this goes */
if (probe_kernel_read(&op, (void *)ip, sizeof(int))) {
@@ -160,8 +160,6 @@ __ftrace_make_nop(struct module *mod,

#ifdef CONFIG_MPROFILE_KERNEL
/* When using -mkernel_profile there is no load to jump over */
- pop = PPC_INST_NOP;
-
if (probe_kernel_read(&op, (void *)(ip - 4), 4)) {
pr_err("Fetching instruction at %lx failed.\n", ip - 4);
return -EFAULT;
@@ -169,26 +167,23 @@ __ftrace_make_nop(struct module *mod,

/* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
if (op != PPC_INST_MFLR && op != PPC_INST_STD_LR) {
- pr_err("Unexpected instruction %08x around bl _mcount\n", op);
+ pr_err("Unexpected instruction %08x before bl _mcount\n", op);
return -EINVAL;
}
-#else
- /*
- * Our original call site looks like:
- *
- * bl <tramp>
- * ld r2,XX(r1)
- *
- * Milton Miller pointed out that we can not simply nop the branch.
- * If a task was preempted when calling a trace function, the nops
- * will remove the way to restore the TOC in r2 and the r2 TOC will
- * get corrupted.
- *
- * Use a b +8 to jump over the load.
- */

- pop = PPC_INST_BRANCH | 8; /* b +8 */
+ /* We should patch out the bl to _mcount first */
+ if (patch_instruction((unsigned int *)ip, PPC_INST_NOP)) {
+ pr_err("Patching NOP failed.\n");
+ return -EPERM;
+ }

+ /* then, nop out the preceding 'mflr r0' as an optimization */
+ if (op == PPC_INST_MFLR &&
+ patch_instruction((unsigned int *)(ip - 4), PPC_INST_NOP)) {
+ pr_err("Patching NOP failed.\n");
+ return -EPERM;
+ }
+#else
/*
* Check what is in the next instruction. We can see ld r2,40(r1), but
* on first pass after boot we will see mflr r0.
@@ -202,12 +197,25 @@ __ftrace_make_nop(struct module *mod,
pr_err("Expected %08x found %08x\n", PPC_INST_LD_TOC, op);
return -EINVAL;
}
-#endif /* CONFIG_MPROFILE_KERNEL */

- if (patch_instruction((unsigned int *)ip, pop)) {
+ /*
+ * Our original call site looks like:
+ *
+ * bl <tramp>
+ * ld r2,XX(r1)
+ *
+ * Milton Miller pointed out that we can not simply nop the branch.
+ * If a task was preempted when calling a trace function, the nops
+ * will remove the way to restore the TOC in r2 and the r2 TOC will
+ * get corrupted.
+ *
+ * Use a b +8 to jump over the load.
+ */
+ if (patch_instruction((unsigned int *)ip, PPC_INST_BRANCH | 8)) {
pr_err("Patching NOP failed.\n");
return -EPERM;
}
+#endif /* CONFIG_MPROFILE_KERNEL */

return 0;
}
@@ -421,6 +429,26 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace *rec, unsigned long addr)
return -EPERM;
}

+#ifdef CONFIG_MPROFILE_KERNEL
+ /* Nop out the preceding 'mflr r0' as an optimization */
+ if (probe_kernel_read(&op, (void *)(ip - 4), 4)) {
+ pr_err("Fetching instruction at %lx failed.\n", ip - 4);
+ return -EFAULT;
+ }
+
+ /* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
+ if (op != PPC_INST_MFLR && op != PPC_INST_STD_LR) {
+ pr_err("Unexpected instruction %08x before bl _mcount\n", op);
+ return -EINVAL;
+ }
+
+ if (op == PPC_INST_MFLR &&
+ patch_instruction((unsigned int *)(ip - 4), PPC_INST_NOP)) {
+ pr_err("Patching NOP failed.\n");
+ return -EPERM;
+ }
+#endif
+
return 0;
}

@@ -429,6 +457,7 @@ int ftrace_make_nop(struct module *mod,
{
unsigned long ip = rec->ip;
unsigned int old, new;
+ int rc;

/*
* If the calling address is more that 24 bits away,
@@ -439,7 +468,34 @@ int ftrace_make_nop(struct module *mod,
/* within range */
old = ftrace_call_replace(ip, addr, 1);
new = PPC_INST_NOP;
- return ftrace_modify_code(ip, old, new);
+ rc = ftrace_modify_code(ip, old, new);
+#ifdef CONFIG_MPROFILE_KERNEL
+ if (rc)
+ return rc;
+
+ /*
+ * For -mprofile-kernel, we patch out the preceding 'mflr r0'
+ * instruction, as an optimization. It is important to nop out
+ * the branch to _mcount() first, as a lone 'mflr r0' is
+ * harmless.
+ */
+ if (probe_kernel_read(&old, (void *)(ip - 4), 4)) {
+ pr_err("Fetching instruction at %lx failed.\n", ip - 4);
+ return -EFAULT;
+ }
+
+ /* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
+ if (old != PPC_INST_MFLR && old != PPC_INST_STD_LR) {
+ pr_err("Unexpected instruction %08x before bl _mcount\n",
+ old);
+ return -EINVAL;
+ }
+
+ if (old == PPC_INST_MFLR)
+ rc = patch_instruction((unsigned int *)(ip - 4),
+ PPC_INST_NOP);
+#endif
+ return rc;
} else if (core_kernel_text(ip))
return __ftrace_make_nop_kernel(rec, addr);

@@ -567,6 +623,37 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
return -EINVAL;
}

+#ifdef CONFIG_MPROFILE_KERNEL
+ /*
+ * We could end up here without having called __ftrace_make_call_prep()
+ * if function tracing is enabled before a module is loaded.
+ *
+ * ftrace_module_enable() --> ftrace_replace_code_rec() -->
+ * ftrace_make_call() --> __ftrace_make_call()
+ *
+ * In this scenario, the previous instruction will be a NOP. It is
+ * safe to patch it with a 'mflr r0' since we know for a fact that
+ * this code is not yet being run.
+ */
+ ip -= MCOUNT_INSN_SIZE;
+
+ /* read where this goes */
+ if (probe_kernel_read(op, ip, MCOUNT_INSN_SIZE))
+ return -EFAULT;
+
+ /*
+ * nothing to do if this is using the older -mprofile-kernel
+ * instruction sequence
+ */
+ if (op[0] != PPC_INST_NOP)
+ return 0;
+
+ if (patch_instruction((unsigned int *)ip, PPC_INST_MFLR)) {
+ pr_err("Patching MFLR failed.\n");
+ return -EPERM;
+ }
+#endif
+
return 0;
}

@@ -863,6 +950,133 @@ void arch_ftrace_update_code(int command)
ftrace_modify_all_code(command);
}

+#ifdef CONFIG_MPROFILE_KERNEL
+/* Returns 1 if we patched in the mflr */
+static int __ftrace_make_call_prep(struct dyn_ftrace *rec)
+{
+ void *ip = (void *)rec->ip - MCOUNT_INSN_SIZE;
+ unsigned int op[2];
+
+ /* read where this goes */
+ if (probe_kernel_read(op, ip, sizeof(op)))
+ return -EFAULT;
+
+ if (op[1] != PPC_INST_NOP) {
+ pr_err("Unexpected call sequence at %p: %x %x\n",
+ ip, op[0], op[1]);
+ return -EINVAL;
+ }
+
+ /*
+ * nothing to do if this is using the older -mprofile-kernel
+ * instruction sequence
+ */
+ if (op[0] != PPC_INST_NOP)
+ return 0;
+
+ if (patch_instruction((unsigned int *)ip, PPC_INST_MFLR)) {
+ pr_err("Patching MFLR failed.\n");
+ return -EPERM;
+ }
+
+ return 1;
+}
+
+static void do_isync(void *info __maybe_unused)
+{
+ isync();
+}
+
+/*
+ * When enabling function tracing for -mprofile-kernel that uses a
+ * 2-instruction sequence of 'mflr r0, bl _mcount()', we use a 2 step process:
+ * 1. Patch in the 'mflr r0' instruction
+ * 1a. flush icache on all cpus, so that the updated instruction is seen
+ * 1b. synchronize_rcu_tasks() to ensure that any cpus that had executed
+ * the earlier nop there make progress (and hence do not branch into
+ * ftrace without executing the preceding mflr)
+ * 2. Patch in the branch to ftrace
+ */
+void ftrace_replace_code(int mod_flags)
+{
+ int enable = mod_flags & FTRACE_MODIFY_ENABLE_FL;
+ int schedulable = mod_flags & FTRACE_MODIFY_MAY_SLEEP_FL;
+ int ret, failed, make_call = 0;
+ struct ftrace_rec_iter *iter;
+ struct dyn_ftrace *rec;
+
+ if (unlikely(!ftrace_enabled))
+ return;
+
+ for_ftrace_rec_iter(iter) {
+ rec = ftrace_rec_iter_record(iter);
+
+ if (rec->flags & FTRACE_FL_DISABLED)
+ continue;
+
+ failed = 0;
+ ret = ftrace_test_record(rec, enable);
+ if (ret == FTRACE_UPDATE_MAKE_CALL) {
+ failed = __ftrace_make_call_prep(rec);
+ if (failed < 0) {
+ ftrace_bug(failed, rec);
+ return;
+ } else if (failed == 1)
+ make_call++;
+ }
+
+ if (!failed) {
+ /* We can patch the record directly */
+ failed = ftrace_replace_code_rec(rec, enable);
+ if (failed) {
+ ftrace_bug(failed, rec);
+ return;
+ }
+ }
+
+ if (schedulable)
+ cond_resched();
+ }
+
+ if (!make_call)
+ /* No records needed patching a preceding mflr */
+ return;
+
+ /* Make sure all cpus see the new instruction */
+ smp_call_function(do_isync, NULL, 1);
+
+ /*
+ * We also need to ensure that all cpus make progress:
+ * - With !CONFIG_PREEMPT, we want to be sure that cpus return from
+ * any interrupts they may be handling, and make progress.
+ * - With CONFIG_PREEMPT, we want to be additionally sure that there
+ * are no pre-empted tasks that have executed the earlier nop, and
+ * might end up executing the subsequently patched branch to ftrace.
+ */
+ synchronize_rcu_tasks();
+
+ for_ftrace_rec_iter(iter) {
+ rec = ftrace_rec_iter_record(iter);
+
+ if (rec->flags & FTRACE_FL_DISABLED)
+ continue;
+
+ ret = ftrace_test_record(rec, enable);
+ if (ret == FTRACE_UPDATE_MAKE_CALL)
+ failed = ftrace_replace_code_rec(rec, enable);
+
+ if (failed) {
+ ftrace_bug(failed, rec);
+ return;
+ }
+
+ if (schedulable)
+ cond_resched();
+ }
+
+}
+#endif
+
#ifdef CONFIG_PPC64
#define PACATOC offsetof(struct paca_struct, kernel_toc)

--
2.22.0

2019-06-27 11:25:05

by Naveen N. Rao

Subject: [PATCH v2 6/7] kprobes/ftrace: Use ftrace_location() when [dis]arming probes

An ftrace location can include more than a single instruction on some
architectures (powerpc64, for now). In this case, a kprobe is permitted
on any of those instructions, and it uses the ftrace infrastructure to
function.

However, [dis]arm_kprobe_ftrace() uses the kprobe address when setting
up the ftrace filter IP. This won't work if the address points to any
instruction apart from the one that has a branch to _mcount(). To
resolve this, have [dis]arm_kprobe_ftrace() use ftrace_location() to
identify the filter IP.

Acked-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Naveen N. Rao <[email protected]>
---
kernel/kprobes.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 445337c107e0..282ee704e2d8 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -978,10 +978,10 @@ static int prepare_kprobe(struct kprobe *p)
/* Caller must lock kprobe_mutex */
static int arm_kprobe_ftrace(struct kprobe *p)
{
+ unsigned long ftrace_ip = ftrace_location((unsigned long)p->addr);
int ret = 0;

- ret = ftrace_set_filter_ip(&kprobe_ftrace_ops,
- (unsigned long)p->addr, 0, 0);
+ ret = ftrace_set_filter_ip(&kprobe_ftrace_ops, ftrace_ip, 0, 0);
if (ret) {
pr_debug("Failed to arm kprobe-ftrace at %pS (%d)\n",
p->addr, ret);
@@ -1005,13 +1005,14 @@ static int arm_kprobe_ftrace(struct kprobe *p)
* non-empty filter_hash for IPMODIFY ops, we're safe from an accidental
* empty filter_hash which would undesirably trace all functions.
*/
- ftrace_set_filter_ip(&kprobe_ftrace_ops, (unsigned long)p->addr, 1, 0);
+ ftrace_set_filter_ip(&kprobe_ftrace_ops, ftrace_ip, 1, 0);
return ret;
}

/* Caller must lock kprobe_mutex */
static int disarm_kprobe_ftrace(struct kprobe *p)
{
+ unsigned long ftrace_ip = ftrace_location((unsigned long)p->addr);
int ret = 0;

if (kprobe_ftrace_enabled == 1) {
@@ -1022,8 +1023,7 @@ static int disarm_kprobe_ftrace(struct kprobe *p)

kprobe_ftrace_enabled--;

- ret = ftrace_set_filter_ip(&kprobe_ftrace_ops,
- (unsigned long)p->addr, 1, 0);
+ ret = ftrace_set_filter_ip(&kprobe_ftrace_ops, ftrace_ip, 1, 0);
WARN_ONCE(ret < 0, "Failed to disarm kprobe-ftrace at %pS (%d)\n",
p->addr, ret);
return ret;
--
2.22.0

2019-06-27 11:25:19

by Naveen N. Rao

Subject: [PATCH v2 7/7] powerpc/kprobes: Allow probing on any ftrace address

With KPROBES_ON_FTRACE, kprobe is allowed to be inserted on instructions
that branch to _mcount (referred to as ftrace location). With
-mprofile-kernel, we now include the preceding 'mflr r0' as being part
of the ftrace location.

However, by default, probing on an instruction that is not actually the
branch to _mcount() is prohibited, as that is considered to not be at an
instruction boundary. This is not the case on powerpc, so allow this by
overriding arch_check_ftrace_location().

In addition, we update kprobe_ftrace_handler() to detect this scenario
and to pass the proper nip to the pre and post probe handlers.

Signed-off-by: Naveen N. Rao <[email protected]>
---
arch/powerpc/kernel/kprobes-ftrace.c | 32 +++++++++++++++++++++++++++-
1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 972cb28174b2..23c840748183 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -12,15 +12,35 @@
#include <linux/preempt.h>
#include <linux/ftrace.h>

+/*
+ * With -mprofile-kernel, we patch two instructions -- the branch to _mcount
+ * as well as the preceding 'mflr r0'. Both these instructions are claimed
+ * by ftrace and we should allow probing on either instruction.
+ */
+int arch_check_ftrace_location(struct kprobe *p)
+{
+ if (ftrace_location((unsigned long)p->addr))
+ p->flags |= KPROBE_FLAG_FTRACE;
+ return 0;
+}
+
/* Ftrace callback handler for kprobes */
void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
struct ftrace_ops *ops, struct pt_regs *regs)
{
struct kprobe *p;
+ int mflr_kprobe = 0;
struct kprobe_ctlblk *kcb;

p = get_kprobe((kprobe_opcode_t *)nip);
- if (unlikely(!p) || kprobe_disabled(p))
+ if (!p) {
+ p = get_kprobe((kprobe_opcode_t *)(nip - MCOUNT_INSN_SIZE));
+ if (unlikely(!p))
+ return;
+ mflr_kprobe = 1;
+ }
+
+ if (kprobe_disabled(p))
return;

kcb = get_kprobe_ctlblk();
@@ -33,6 +53,9 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
*/
regs->nip -= MCOUNT_INSN_SIZE;

+ if (mflr_kprobe)
+ regs->nip -= MCOUNT_INSN_SIZE;
+
__this_cpu_write(current_kprobe, p);
kcb->kprobe_status = KPROBE_HIT_ACTIVE;
if (!p->pre_handler || !p->pre_handler(p, regs)) {
@@ -45,6 +68,8 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
kcb->kprobe_status = KPROBE_HIT_SSDONE;
p->post_handler(p, regs, 0);
}
+ if (mflr_kprobe)
+ regs->nip += MCOUNT_INSN_SIZE;
}
/*
* If pre_handler returns !0, it changes regs->nip. We have to
@@ -57,6 +82,11 @@ NOKPROBE_SYMBOL(kprobe_ftrace_handler);

int arch_prepare_kprobe_ftrace(struct kprobe *p)
{
+ if ((unsigned long)p->addr & 0x03) {
+ pr_err("Attempt to register kprobe at an unaligned address\n");
+ return -EILSEQ;
+ }
+
p->ainsn.insn = NULL;
p->ainsn.boostable = -1;
return 0;
--
2.22.0

2019-06-27 11:25:23

by Naveen N. Rao

Subject: [PATCH v2 5/7] ftrace: Update ftrace_location() for powerpc -mprofile-kernel

Now that we are patching the preceding 'mflr r0' instruction with
-mprofile-kernel, we need to update ftrace_location() to recognise that
as being part of ftrace.

To do this, we introduce FTRACE_IP_EXTENSION to denote the length (in
bytes) of the mcount caller. By default, this is set to 0. For powerpc
with CONFIG_MPROFILE_KERNEL, we set this to MCOUNT_INSN_SIZE so that
ftrace_location() works whether it is called with the address of the
actual ftrace entry call or with the address of the preceding 'mflr r0'
instruction.
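
As an illustration (a sketch, not part of the patch), with -mprofile-kernel
rec->ip points at the 'bl _mcount' and FTRACE_IP_EXTENSION is
MCOUNT_INSN_SIZE (4 bytes):

<func>:
	...
	mflr	r0	<-- rec->ip - 4: ftrace_location() now matches here too
	bl	_mcount	<-- rec->ip:     matches as before
	...

since ftrace_location(ip) now searches the range [ip, ip + FTRACE_IP_EXTENSION],
which includes rec->ip in both cases.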

Note that we do not check if the preceding instruction is indeed the
'mflr r0'. The earlier -mprofile-kernel ABI included a 'std r0,stack'
instruction between the 'mflr r0' and the 'bl _mcount'. This is harmless
as the 'std r0,stack' instruction is inconsequential and is not relied
upon.

Suggested-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Naveen N. Rao <[email protected]>
---
arch/powerpc/include/asm/ftrace.h | 8 ++++++++
include/linux/ftrace.h | 9 +++++++++
kernel/trace/ftrace.c | 2 +-
3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index 3dfb80b86561..510a8ac8ac8d 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -61,6 +61,14 @@ struct dyn_arch_ftrace {

#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
#define ARCH_SUPPORTS_FTRACE_OPS 1
+
+#ifdef CONFIG_MPROFILE_KERNEL
+/*
+ * We consider two instructions -- 'mflr r0', 'bl _mcount' -- to be part
+ * of ftrace with -mprofile-kernel
+ */
+#define FTRACE_IP_EXTENSION MCOUNT_INSN_SIZE
+#endif
#endif
#endif /* CONFIG_FUNCTION_TRACER */

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index fa653a561da5..310b514438cb 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -29,6 +29,15 @@
#define ARCH_SUPPORTS_FTRACE_OPS 0
#endif

+/*
+ * This denotes the number of instructions (in bytes) that is used by the
+ * arch mcount caller. All instructions in this range will be owned by
+ * ftrace.
+ */
+#ifndef FTRACE_IP_EXTENSION
+#define FTRACE_IP_EXTENSION 0
+#endif
+
/*
* If the arch's mcount caller does not support all of ftrace's
* features, then it must call an indirect function that
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 21d8e201ee80..308555925b81 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1575,7 +1575,7 @@ unsigned long ftrace_location_range(unsigned long start, unsigned long end)
*/
unsigned long ftrace_location(unsigned long ip)
{
- return ftrace_location_range(ip, ip);
+ return ftrace_location_range(ip, ip + FTRACE_IP_EXTENSION);
}

/**
--
2.22.0

2019-06-27 11:28:09

by Naveen N. Rao

Subject: Re: [PATCH v2 2/7] x86/ftrace: Fix use of flags in ftrace_replace_code()

Naveen N. Rao wrote:
> In commit a0572f687fb3c ("ftrace: Allow ftrace_replace_code() to be
> schedulable"), the generic ftrace_replace_code() function was modified to
> accept a flags argument in place of a single 'enable' flag. However, the
> x86 version of this function was not updated. Fix the same.
>
> Fixes: a0572f687fb3c ("ftrace: Allow ftrace_replace_code() to be schedulable")
> Signed-off-by: Naveen N. Rao <[email protected]>
> ---
> arch/x86/kernel/ftrace.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)

If the rest of this series is ok, and if Ingo and Steven are ok to have
this series go through the powerpc tree, then I can re-post this
particular patch for x86 after -rc1.

Thanks,
Naveen


2019-06-27 13:29:36

by Steven Rostedt

Subject: Re: [PATCH v2 2/7] x86/ftrace: Fix use of flags in ftrace_replace_code()

On Thu, 27 Jun 2019 16:53:50 +0530
"Naveen N. Rao" <[email protected]> wrote:

> In commit a0572f687fb3c ("ftrace: Allow ftrace_replace_code() to be
> schedulable"), the generic ftrace_replace_code() function was modified to
> accept a flags argument in place of a single 'enable' flag. However, the
> x86 version of this function was not updated. Fix the same.
>
> Fixes: a0572f687fb3c ("ftrace: Allow ftrace_replace_code() to be schedulable")

I don't mind this change, but it's not a bug, and I'm not sure it
should have the fixes tag. The reason being, the
FTRACE_MODIFY_ENABLE_FL is only set when ftrace is called with the
command flag FTRACE_MAY_SLEEP, which is never done on x86.

That said, I'm fine with the change as it makes it more robust, but by
adding the fixes tag, you're going to get this into all the stable
code, and I'm not sure that's really necessary.

-- Steve


> Signed-off-by: Naveen N. Rao <[email protected]>
> ---
> arch/x86/kernel/ftrace.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> index 0927bb158ffc..f34005a17051 100644
> --- a/arch/x86/kernel/ftrace.c
> +++ b/arch/x86/kernel/ftrace.c
> @@ -573,8 +573,9 @@ static void run_sync(void)
> local_irq_disable();
> }
>
> -void ftrace_replace_code(int enable)
> +void ftrace_replace_code(int mod_flags)
> {
> + int enable = mod_flags & FTRACE_MODIFY_ENABLE_FL;
> struct ftrace_rec_iter *iter;
> struct dyn_ftrace *rec;
> const char *report = "adding breakpoints";

2019-06-27 14:20:54

by Masami Hiramatsu

Subject: Re: [PATCH v2 7/7] powerpc/kprobes: Allow probing on any ftrace address

On Thu, 27 Jun 2019 16:53:55 +0530
"Naveen N. Rao" <[email protected]> wrote:

> With KPROBES_ON_FTRACE, kprobe is allowed to be inserted on instructions
> that branch to _mcount (referred to as ftrace location). With
> -mprofile-kernel, we now include the preceding 'mflr r0' as being part
> of the ftrace location.
>
> However, by default, probing on an instruction that is not actually the
> branch to _mcount() is prohibited, as that is considered to not be at an
> instruction boundary. This is not the case on powerpc, so allow this by
> overriding arch_check_ftrace_location().
>
> In addition, we update kprobe_ftrace_handler() to detect this scenario
> and to pass the proper nip to the pre and post probe handlers.
>
> Signed-off-by: Naveen N. Rao <[email protected]>

Looks good to me.

Reviewed-by: Masami Hiramatsu <[email protected]>

Thank you!

> ---
> arch/powerpc/kernel/kprobes-ftrace.c | 32 +++++++++++++++++++++++++++-
> 1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
> index 972cb28174b2..23c840748183 100644
> --- a/arch/powerpc/kernel/kprobes-ftrace.c
> +++ b/arch/powerpc/kernel/kprobes-ftrace.c
> @@ -12,15 +12,35 @@
> #include <linux/preempt.h>
> #include <linux/ftrace.h>
>
> +/*
> + * With -mprofile-kernel, we patch two instructions -- the branch to _mcount
> + * as well as the preceding 'mflr r0'. Both these instructions are claimed
> + * by ftrace and we should allow probing on either instruction.
> + */
> +int arch_check_ftrace_location(struct kprobe *p)
> +{
> + if (ftrace_location((unsigned long)p->addr))
> + p->flags |= KPROBE_FLAG_FTRACE;
> + return 0;
> +}
> +
> /* Ftrace callback handler for kprobes */
> void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
> struct ftrace_ops *ops, struct pt_regs *regs)
> {
> struct kprobe *p;
> + int mflr_kprobe = 0;
> struct kprobe_ctlblk *kcb;
>
> p = get_kprobe((kprobe_opcode_t *)nip);
> - if (unlikely(!p) || kprobe_disabled(p))
> + if (!p) {
> + p = get_kprobe((kprobe_opcode_t *)(nip - MCOUNT_INSN_SIZE));
> + if (unlikely(!p))
> + return;
> + mflr_kprobe = 1;
> + }
> +
> + if (kprobe_disabled(p))
> return;
>
> kcb = get_kprobe_ctlblk();
> @@ -33,6 +53,9 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
> */
> regs->nip -= MCOUNT_INSN_SIZE;
>
> + if (mflr_kprobe)
> + regs->nip -= MCOUNT_INSN_SIZE;
> +
> __this_cpu_write(current_kprobe, p);
> kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> if (!p->pre_handler || !p->pre_handler(p, regs)) {
> @@ -45,6 +68,8 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
> kcb->kprobe_status = KPROBE_HIT_SSDONE;
> p->post_handler(p, regs, 0);
> }
> + if (mflr_kprobe)
> + regs->nip += MCOUNT_INSN_SIZE;
> }
> /*
> * If pre_handler returns !0, it changes regs->nip. We have to
> @@ -57,6 +82,11 @@ NOKPROBE_SYMBOL(kprobe_ftrace_handler);
>
> int arch_prepare_kprobe_ftrace(struct kprobe *p)
> {
> + if ((unsigned long)p->addr & 0x03) {
> + pr_err("Attempt to register kprobe at an unaligned address\n");
> + return -EILSEQ;
> + }
> +
> p->ainsn.insn = NULL;
> p->ainsn.boostable = -1;
> return 0;
> --
> 2.22.0
>


--
Masami Hiramatsu <[email protected]>

2019-06-27 14:37:11

by Steven Rostedt

Subject: Re: [PATCH v2 3/7] ftrace: Expose __ftrace_replace_code()

On Thu, 27 Jun 2019 16:53:51 +0530
"Naveen N. Rao" <[email protected]> wrote:

> While over-riding ftrace_replace_code(), we still want to reuse the
> existing __ftrace_replace_code() function. Rename the function and
> make it available for other kernel code.
>
> Signed-off-by: Naveen N. Rao <[email protected]>
> ---
> include/linux/ftrace.h | 1 +
> kernel/trace/ftrace.c | 8 ++++----
> 2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index e97789c95c4e..fa653a561da5 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -456,6 +456,7 @@ ftrace_set_early_filter(struct ftrace_ops *ops, char *buf, int enable);
> /* defined in arch */
> extern int ftrace_ip_converted(unsigned long ip);
> extern int ftrace_dyn_arch_init(void);
> +extern int ftrace_replace_code_rec(struct dyn_ftrace *rec, int enable);
> extern void ftrace_replace_code(int enable);
> extern int ftrace_update_ftrace_func(ftrace_func_t func);
> extern void ftrace_caller(void);
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 5710a6b3edc1..21d8e201ee80 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -2351,8 +2351,8 @@ unsigned long ftrace_get_addr_curr(struct dyn_ftrace *rec)
> return (unsigned long)FTRACE_ADDR;
> }
>
> -static int
> -__ftrace_replace_code(struct dyn_ftrace *rec, int enable)
> +int
> +ftrace_replace_code_rec(struct dyn_ftrace *rec, int enable)

Make this a single line, as it removes static and "__" which should
keep it normal.

Other than that,

Reviewed-by: Steven Rostedt (VMware) <[email protected]>

-- Steve


> {
> unsigned long ftrace_old_addr;
> unsigned long ftrace_addr;
> @@ -2403,7 +2403,7 @@ void __weak ftrace_replace_code(int mod_flags)
> if (rec->flags & FTRACE_FL_DISABLED)
> continue;
>
> - failed = __ftrace_replace_code(rec, enable);
> + failed = ftrace_replace_code_rec(rec, enable);
> if (failed) {
> ftrace_bug(failed, rec);
> /* Stop processing */
> @@ -5827,7 +5827,7 @@ void ftrace_module_enable(struct module *mod)
> rec->flags = cnt;
>
> if (ftrace_start_up && cnt) {
> - int failed = __ftrace_replace_code(rec, 1);
> + int failed = ftrace_replace_code_rec(rec, 1);
> if (failed) {
> ftrace_bug(failed, rec);
> goto out_loop;

2019-06-27 14:49:38

by Naveen N. Rao

Subject: Re: [PATCH v2 2/7] x86/ftrace: Fix use of flags in ftrace_replace_code()

Steven Rostedt wrote:
> On Thu, 27 Jun 2019 16:53:50 +0530
> "Naveen N. Rao" <[email protected]> wrote:
>
>> In commit a0572f687fb3c ("ftrace: Allow ftrace_replace_code() to be
>> schedulable"), the generic ftrace_replace_code() function was modified to
>> accept a flags argument in place of a single 'enable' flag. However, the
>> x86 version of this function was not updated. Fix the same.
>>
>> Fixes: a0572f687fb3c ("ftrace: Allow ftrace_replace_code() to be schedulable")
>
> I don't mind this change, but it's not a bug, and I'm not sure it
> should have the fixes tag. The reason being, the
> FTRACE_MODIFY_ENABLE_FL is only set when ftrace is called with the
> command flag FTRACE_MAY_SLEEP, which is never done on x86.

I guess you meant to say that *FTRACE_MODIFY_MAY_SLEEP_FL* is only set
with FTRACE_MAY_SLEEP.

>
> That said, I'm fine with the change as it makes it more robust, but by
> adding the fixes tag, you're going to get this into all the stable
> code, and I'm not sure that's really necessary.

Agreed. Thanks for pointing this out. We can drop this patch from this
series and I will re-post this as a simpler cleanup later on.


Thanks,
Naveen


2019-06-27 14:51:29

by Naveen N. Rao

Subject: Re: [PATCH v2 4/7] powerpc/ftrace: Additionally nop out the preceding mflr with -mprofile-kernel

Naveen N. Rao wrote:
> With -mprofile-kernel, gcc emits 'mflr r0', followed by 'bl _mcount' to
> enable function tracing and profiling. So far, with dynamic ftrace, we
> used to only patch out the branch to _mcount(). However, mflr is
> executed by the branch unit that can only execute one per cycle on
> POWER9 and shared with branches, so it would be nice to avoid it where
> possible.
>
> We cannot simply nop out the mflr either. When enabling function
> tracing, there can be a race if tracing is enabled when some thread was
> interrupted after executing a nop'ed out mflr. In this case, the thread
> would execute the now-patched-in branch to _mcount() without having
> executed the preceding mflr.
>
> To solve this, we now enable function tracing in 2 steps: patch in the
> mflr instruction, use 'smp_call_function(isync);
> synchronize_rcu_tasks()' to ensure all existing threads make progress,
> and then patch in the branch to _mcount(). We override
> ftrace_replace_code() with a powerpc64 variant for this purpose.
>
> Suggested-by: Nicholas Piggin <[email protected]>
> Reviewed-by: Nicholas Piggin <[email protected]>
> Signed-off-by: Naveen N. Rao <[email protected]>
> ---
> arch/powerpc/kernel/trace/ftrace.c | 258 ++++++++++++++++++++++++++---
> 1 file changed, 236 insertions(+), 22 deletions(-)
>

I missed adding a comment here to explain the changes. As discussed in
the previous series, I think we are ok with this patch from a CMODX
perspective. For smp_call_function(), I decided to have it included in
this patch since we know that we need it here for sure. I am not
entirely sure we want to do that in patch_instruction() since ftrace
doesn't seem to need it elsewhere. As Nick Piggin pointed out, we may
want to have users of patch_instruction() (kprobes) add the necessary
synchronization.


Thanks,
Naveen


2019-06-27 15:10:59

by Steven Rostedt

Subject: Re: [PATCH v2 4/7] powerpc/ftrace: Additionally nop out the preceding mflr with -mprofile-kernel

On Thu, 27 Jun 2019 16:53:52 +0530
"Naveen N. Rao" <[email protected]> wrote:

> With -mprofile-kernel, gcc emits 'mflr r0', followed by 'bl _mcount' to
> enable function tracing and profiling. So far, with dynamic ftrace, we
> used to only patch out the branch to _mcount(). However, mflr is
> executed by the branch unit that can only execute one per cycle on
> POWER9 and shared with branches, so it would be nice to avoid it where
> possible.
>
> We cannot simply nop out the mflr either. When enabling function
> tracing, there can be a race if tracing is enabled when some thread was
> interrupted after executing a nop'ed out mflr. In this case, the thread
> would execute the now-patched-in branch to _mcount() without having
> executed the preceding mflr.
>
> To solve this, we now enable function tracing in 2 steps: patch in the
> mflr instruction, use 'smp_call_function(isync);
> synchronize_rcu_tasks()' to ensure all existing threads make progress,
> and then patch in the branch to _mcount(). We override
> ftrace_replace_code() with a powerpc64 variant for this purpose.

You may want to explain that you do the reverse when patching it out.
That is, patch out the "bl _mcount" into a nop and then patch out the
"mflr r0". But interesting, I don't see a synchronize_rcu_tasks() call
there.


>
> Suggested-by: Nicholas Piggin <[email protected]>
> Reviewed-by: Nicholas Piggin <[email protected]>
> Signed-off-by: Naveen N. Rao <[email protected]>
> ---
> arch/powerpc/kernel/trace/ftrace.c | 258 ++++++++++++++++++++++++++---
> 1 file changed, 236 insertions(+), 22 deletions(-)
>
> diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
> index 517662a56bdc..86c2b50dcaa9 100644
> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -125,7 +125,7 @@ __ftrace_make_nop(struct module *mod,
> {
> unsigned long entry, ptr, tramp;
> unsigned long ip = rec->ip;
> - unsigned int op, pop;
> + unsigned int op;
>
> /* read where this goes */
> if (probe_kernel_read(&op, (void *)ip, sizeof(int))) {
> @@ -160,8 +160,6 @@ __ftrace_make_nop(struct module *mod,
>
> #ifdef CONFIG_MPROFILE_KERNEL
> /* When using -mkernel_profile there is no load to jump over */
> - pop = PPC_INST_NOP;
> -
> if (probe_kernel_read(&op, (void *)(ip - 4), 4)) {
> pr_err("Fetching instruction at %lx failed.\n", ip - 4);
> return -EFAULT;
> @@ -169,26 +167,23 @@ __ftrace_make_nop(struct module *mod,
>
> /* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
> if (op != PPC_INST_MFLR && op != PPC_INST_STD_LR) {
> - pr_err("Unexpected instruction %08x around bl _mcount\n", op);
> + pr_err("Unexpected instruction %08x before bl _mcount\n", op);
> return -EINVAL;
> }
> -#else
> - /*
> - * Our original call site looks like:
> - *
> - * bl <tramp>
> - * ld r2,XX(r1)
> - *
> - * Milton Miller pointed out that we can not simply nop the branch.
> - * If a task was preempted when calling a trace function, the nops
> - * will remove the way to restore the TOC in r2 and the r2 TOC will
> - * get corrupted.
> - *
> - * Use a b +8 to jump over the load.
> - */
>
> - pop = PPC_INST_BRANCH | 8; /* b +8 */
> + /* We should patch out the bl to _mcount first */
> + if (patch_instruction((unsigned int *)ip, PPC_INST_NOP)) {
> + pr_err("Patching NOP failed.\n");
> + return -EPERM;
> + }
>
> + /* then, nop out the preceding 'mflr r0' as an optimization */
> + if (op == PPC_INST_MFLR &&
> + patch_instruction((unsigned int *)(ip - 4), PPC_INST_NOP)) {
> + pr_err("Patching NOP failed.\n");
> + return -EPERM;
> + }
> +#else
> /*
> * Check what is in the next instruction. We can see ld r2,40(r1), but
> * on first pass after boot we will see mflr r0.
> @@ -202,12 +197,25 @@ __ftrace_make_nop(struct module *mod,
> pr_err("Expected %08x found %08x\n", PPC_INST_LD_TOC, op);
> return -EINVAL;
> }
> -#endif /* CONFIG_MPROFILE_KERNEL */
>
> - if (patch_instruction((unsigned int *)ip, pop)) {
> + /*
> + * Our original call site looks like:
> + *
> + * bl <tramp>
> + * ld r2,XX(r1)
> + *
> + * Milton Miller pointed out that we can not simply nop the branch.
> + * If a task was preempted when calling a trace function, the nops
> + * will remove the way to restore the TOC in r2 and the r2 TOC will
> + * get corrupted.
> + *
> + * Use a b +8 to jump over the load.
> + */
> + if (patch_instruction((unsigned int *)ip, PPC_INST_BRANCH | 8)) {
> pr_err("Patching NOP failed.\n");
> return -EPERM;
> }
> +#endif /* CONFIG_MPROFILE_KERNEL */
>
> return 0;
> }
> @@ -421,6 +429,26 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace *rec, unsigned long addr)
> return -EPERM;
> }
>
> +#ifdef CONFIG_MPROFILE_KERNEL

I would think you need to break this up into two parts as well, with a
synchronize_rcu_tasks() in between.

Imagine this scenario:

<func>:
nop <-- interrupt comes here, and preempts the task
nop

First change.

<func>:
mflr r0
nop

Second change.

<func>:
mflr r0
bl _mcount

Task returns from interrupt

<func>:
mflr r0
bl _mcount <-- executes here

It never did the mflr r0, because the last command that was executed
was a nop before it was interrupted. And yes, it can be interrupted
on a nop!

-- Steve


> + /* Nop out the preceding 'mflr r0' as an optimization */
> + if (probe_kernel_read(&op, (void *)(ip - 4), 4)) {
> + pr_err("Fetching instruction at %lx failed.\n", ip - 4);
> + return -EFAULT;
> + }
> +
> + /* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
> + if (op != PPC_INST_MFLR && op != PPC_INST_STD_LR) {
> + pr_err("Unexpected instruction %08x before bl _mcount\n", op);
> + return -EINVAL;
> + }
> +
> + if (op == PPC_INST_MFLR &&
> + patch_instruction((unsigned int *)(ip - 4), PPC_INST_NOP)) {
> + pr_err("Patching NOP failed.\n");
> + return -EPERM;
> + }
> +#endif
> +
> return 0;
> }
>
> @@ -429,6 +457,7 @@ int ftrace_make_nop(struct module *mod,
> {
> unsigned long ip = rec->ip;
> unsigned int old, new;
> + int rc;
>
> /*
> * If the calling address is more that 24 bits away,
> @@ -439,7 +468,34 @@ int ftrace_make_nop(struct module *mod,
> /* within range */
> old = ftrace_call_replace(ip, addr, 1);
> new = PPC_INST_NOP;
> - return ftrace_modify_code(ip, old, new);
> + rc = ftrace_modify_code(ip, old, new);
> +#ifdef CONFIG_MPROFILE_KERNEL
> + if (rc)
> + return rc;
> +
> + /*
> + * For -mprofile-kernel, we patch out the preceding 'mflr r0'
> + * instruction, as an optimization. It is important to nop out
> + * the branch to _mcount() first, as a lone 'mflr r0' is
> + * harmless.
> + */
> + if (probe_kernel_read(&old, (void *)(ip - 4), 4)) {
> + pr_err("Fetching instruction at %lx failed.\n", ip - 4);
> + return -EFAULT;
> + }
> +
> + /* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
> + if (old != PPC_INST_MFLR && old != PPC_INST_STD_LR) {
> + pr_err("Unexpected instruction %08x before bl _mcount\n",
> + old);
> + return -EINVAL;
> + }
> +
> + if (old == PPC_INST_MFLR)
> + rc = patch_instruction((unsigned int *)(ip - 4),
> + PPC_INST_NOP);
> +#endif
> + return rc;
> } else if (core_kernel_text(ip))
> return __ftrace_make_nop_kernel(rec, addr);
>
> @@ -567,6 +623,37 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
> return -EINVAL;
> }
>
> +#ifdef CONFIG_MPROFILE_KERNEL
> + /*
> + * We could end up here without having called __ftrace_make_call_prep()
> + * if function tracing is enabled before a module is loaded.
> + *
> + * ftrace_module_enable() --> ftrace_replace_code_rec() -->
> + * ftrace_make_call() --> __ftrace_make_call()
> + *
> + * In this scenario, the previous instruction will be a NOP. It is
> + * safe to patch it with a 'mflr r0' since we know for a fact that
> + * this code is not yet being run.
> + */
> + ip -= MCOUNT_INSN_SIZE;
> +
> + /* read where this goes */
> + if (probe_kernel_read(op, ip, MCOUNT_INSN_SIZE))
> + return -EFAULT;
> +
> + /*
> + * nothing to do if this is using the older -mprofile-kernel
> + * instruction sequence
> + */
> + if (op[0] != PPC_INST_NOP)
> + return 0;
> +
> + if (patch_instruction((unsigned int *)ip, PPC_INST_MFLR)) {
> + pr_err("Patching MFLR failed.\n");
> + return -EPERM;
> + }
> +#endif
> +
> return 0;
> }
>
> @@ -863,6 +950,133 @@ void arch_ftrace_update_code(int command)
> ftrace_modify_all_code(command);
> }
>
> +#ifdef CONFIG_MPROFILE_KERNEL
> +/* Returns 1 if we patched in the mflr */
> +static int __ftrace_make_call_prep(struct dyn_ftrace *rec)
> +{
> + void *ip = (void *)rec->ip - MCOUNT_INSN_SIZE;
> + unsigned int op[2];
> +
> + /* read where this goes */
> + if (probe_kernel_read(op, ip, sizeof(op)))
> + return -EFAULT;
> +
> + if (op[1] != PPC_INST_NOP) {
> + pr_err("Unexpected call sequence at %p: %x %x\n",
> + ip, op[0], op[1]);
> + return -EINVAL;
> + }
> +
> + /*
> + * nothing to do if this is using the older -mprofile-kernel
> + * instruction sequence
> + */
> + if (op[0] != PPC_INST_NOP)
> + return 0;
> +
> + if (patch_instruction((unsigned int *)ip, PPC_INST_MFLR)) {
> + pr_err("Patching MFLR failed.\n");
> + return -EPERM;
> + }
> +
> + return 1;
> +}
> +
> +static void do_isync(void *info __maybe_unused)
> +{
> + isync();
> +}
> +
> +/*
> + * When enabling function tracing for -mprofile-kernel that uses a
> + * 2-instruction sequence of 'mflr r0, bl _mcount()', we use a 2 step process:
> + * 1. Patch in the 'mflr r0' instruction
> + * 1a. flush icache on all cpus, so that the updated instruction is seen
> + * 1b. synchronize_rcu_tasks() to ensure that any cpus that had executed
> + * the earlier nop there make progress (and hence do not branch into
> + * ftrace without executing the preceding mflr)
> + * 2. Patch in the branch to ftrace
> + */
> +void ftrace_replace_code(int mod_flags)
> +{
> + int enable = mod_flags & FTRACE_MODIFY_ENABLE_FL;
> + int schedulable = mod_flags & FTRACE_MODIFY_MAY_SLEEP_FL;
> + int ret, failed, make_call = 0;
> + struct ftrace_rec_iter *iter;
> + struct dyn_ftrace *rec;
> +
> + if (unlikely(!ftrace_enabled))
> + return;
> +
> + for_ftrace_rec_iter(iter) {
> + rec = ftrace_rec_iter_record(iter);
> +
> + if (rec->flags & FTRACE_FL_DISABLED)
> + continue;
> +
> + failed = 0;
> + ret = ftrace_test_record(rec, enable);
> + if (ret == FTRACE_UPDATE_MAKE_CALL) {
> + failed = __ftrace_make_call_prep(rec);
> + if (failed < 0) {
> + ftrace_bug(failed, rec);
> + return;
> + } else if (failed == 1)
> + make_call++;
> + }
> +
> + if (!failed) {
> + /* We can patch the record directly */
> + failed = ftrace_replace_code_rec(rec, enable);
> + if (failed) {
> + ftrace_bug(failed, rec);
> + return;
> + }
> + }
> +
> + if (schedulable)
> + cond_resched();
> + }
> +
> + if (!make_call)
> + /* No records needed patching a preceding mflr */
> + return;
> +
> + /* Make sure all cpus see the new instruction */
> + smp_call_function(do_isync, NULL, 1);
> +
> + /*
> + * We also need to ensure that all cpus make progress:
> + * - With !CONFIG_PREEMPT, we want to be sure that cpus return from
> + * any interrupts they may be handling, and make progress.
> + * - With CONFIG_PREEMPT, we want to be additionally sure that there
> + * are no pre-empted tasks that have executed the earlier nop, and
> + * might end up executing the subsequently patched branch to ftrace.
> + */
> + synchronize_rcu_tasks();
> +
> + for_ftrace_rec_iter(iter) {
> + rec = ftrace_rec_iter_record(iter);
> +
> + if (rec->flags & FTRACE_FL_DISABLED)
> + continue;
> +
> + ret = ftrace_test_record(rec, enable);
> + if (ret == FTRACE_UPDATE_MAKE_CALL)
> + failed = ftrace_replace_code_rec(rec, enable);
> +
> + if (failed) {
> + ftrace_bug(failed, rec);
> + return;
> + }
> +
> + if (schedulable)
> + cond_resched();
> + }
> +
> +}
> +#endif
> +
> #ifdef CONFIG_PPC64
> #define PACATOC offsetof(struct paca_struct, kernel_toc)
>

2019-06-27 15:29:52

by Naveen N. Rao

Subject: Re: [PATCH v2 4/7] powerpc/ftrace: Additionally nop out the preceding mflr with -mprofile-kernel

Hi Steven,
Thanks for the review!

Steven Rostedt wrote:
> On Thu, 27 Jun 2019 16:53:52 +0530
> "Naveen N. Rao" <[email protected]> wrote:
>
>> With -mprofile-kernel, gcc emits 'mflr r0', followed by 'bl _mcount' to
>> enable function tracing and profiling. So far, with dynamic ftrace, we
>> used to only patch out the branch to _mcount(). However, mflr is
>> executed by the branch unit that can only execute one per cycle on
>> POWER9 and shared with branches, so it would be nice to avoid it where
>> possible.
>>
>> We cannot simply nop out the mflr either. When enabling function
>> tracing, there can be a race if tracing is enabled when some thread was
>> interrupted after executing a nop'ed out mflr. In this case, the thread
>> would execute the now-patched-in branch to _mcount() without having
>> executed the preceding mflr.
>>
>> To solve this, we now enable function tracing in 2 steps: patch in the
>> mflr instruction, use 'smp_call_function(isync);
>> synchronize_rcu_tasks()' to ensure all existing threads make progress,
>> and then patch in the branch to _mcount(). We override
>> ftrace_replace_code() with a powerpc64 variant for this purpose.
>
> You may want to explain that you do the reverse when patching it out.
> That is, patch out the "bl _mcount" into a nop and then patch out the
> "mflr r0".

Sure. I think we can add:
"When disabling function tracing, we can nop out the two instructions
without need for any synchronization in between, as long as we nop out
the branch to ftrace first. The lone 'mflr r0' is harmless. Finally,
with FTRACE_UPDATE_MODIFY_CALL, no changes are needed since we are
simply changing where the branch to ftrace goes."
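
For illustration (again, just a sketch), the disable path takes the call
site through:

<func>:
	mflr	r0
	bl	_mcount

First change (nop out the branch):

<func>:
	mflr	r0
	nop

Second change (nop out the mflr):

<func>:
	nop
	nop

Since the branch into ftrace is removed first, a task can never enter
ftrace without having executed the preceding mflr; at worst it executes a
lone 'mflr r0', which is harmless.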

> But interesting, I don't see a synchronize_rcu_tasks() call
> there.

We felt we don't need it in this case. We patch the branch to ftrace
with a nop first. Other cpus should see that first. But, now that I
think about it, should we add a memory barrier to ensure the writes get
ordered properly?

>
>
>>
>> Suggested-by: Nicholas Piggin <[email protected]>
>> Reviewed-by: Nicholas Piggin <[email protected]>
>> Signed-off-by: Naveen N. Rao <[email protected]>
>> ---
>> arch/powerpc/kernel/trace/ftrace.c | 258 ++++++++++++++++++++++++++---
>> 1 file changed, 236 insertions(+), 22 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
>> index 517662a56bdc..86c2b50dcaa9 100644
>> --- a/arch/powerpc/kernel/trace/ftrace.c
>> +++ b/arch/powerpc/kernel/trace/ftrace.c
>> @@ -125,7 +125,7 @@ __ftrace_make_nop(struct module *mod,
>> {
>> unsigned long entry, ptr, tramp;
>> unsigned long ip = rec->ip;
>> - unsigned int op, pop;
>> + unsigned int op;
>>
>> /* read where this goes */
>> if (probe_kernel_read(&op, (void *)ip, sizeof(int))) {
>> @@ -160,8 +160,6 @@ __ftrace_make_nop(struct module *mod,
>>
>> #ifdef CONFIG_MPROFILE_KERNEL
>> /* When using -mkernel_profile there is no load to jump over */
>> - pop = PPC_INST_NOP;
>> -
>> if (probe_kernel_read(&op, (void *)(ip - 4), 4)) {
>> pr_err("Fetching instruction at %lx failed.\n", ip - 4);
>> return -EFAULT;
>> @@ -169,26 +167,23 @@ __ftrace_make_nop(struct module *mod,
>>
>> /* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
>> if (op != PPC_INST_MFLR && op != PPC_INST_STD_LR) {
>> - pr_err("Unexpected instruction %08x around bl _mcount\n", op);
>> + pr_err("Unexpected instruction %08x before bl _mcount\n", op);
>> return -EINVAL;
>> }
>> -#else
>> - /*
>> - * Our original call site looks like:
>> - *
>> - * bl <tramp>
>> - * ld r2,XX(r1)
>> - *
>> - * Milton Miller pointed out that we can not simply nop the branch.
>> - * If a task was preempted when calling a trace function, the nops
>> - * will remove the way to restore the TOC in r2 and the r2 TOC will
>> - * get corrupted.
>> - *
>> - * Use a b +8 to jump over the load.
>> - */
>>
>> - pop = PPC_INST_BRANCH | 8; /* b +8 */
>> + /* We should patch out the bl to _mcount first */
>> + if (patch_instruction((unsigned int *)ip, PPC_INST_NOP)) {
>> + pr_err("Patching NOP failed.\n");
>> + return -EPERM;
>> + }
>>
>> + /* then, nop out the preceding 'mflr r0' as an optimization */
>> + if (op == PPC_INST_MFLR &&
>> + patch_instruction((unsigned int *)(ip - 4), PPC_INST_NOP)) {
>> + pr_err("Patching NOP failed.\n");
>> + return -EPERM;
>> + }
>> +#else
>> /*
>> * Check what is in the next instruction. We can see ld r2,40(r1), but
>> * on first pass after boot we will see mflr r0.
>> @@ -202,12 +197,25 @@ __ftrace_make_nop(struct module *mod,
>> pr_err("Expected %08x found %08x\n", PPC_INST_LD_TOC, op);
>> return -EINVAL;
>> }
>> -#endif /* CONFIG_MPROFILE_KERNEL */
>>
>> - if (patch_instruction((unsigned int *)ip, pop)) {
>> + /*
>> + * Our original call site looks like:
>> + *
>> + * bl <tramp>
>> + * ld r2,XX(r1)
>> + *
>> + * Milton Miller pointed out that we can not simply nop the branch.
>> + * If a task was preempted when calling a trace function, the nops
>> + * will remove the way to restore the TOC in r2 and the r2 TOC will
>> + * get corrupted.
>> + *
>> + * Use a b +8 to jump over the load.
>> + */
>> + if (patch_instruction((unsigned int *)ip, PPC_INST_BRANCH | 8)) {
>> pr_err("Patching NOP failed.\n");
>> return -EPERM;
>> }
>> +#endif /* CONFIG_MPROFILE_KERNEL */
>>
>> return 0;
>> }
>> @@ -421,6 +429,26 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace *rec, unsigned long addr)
>> return -EPERM;
>> }
>>
>> +#ifdef CONFIG_MPROFILE_KERNEL
>
> I would think you need to break this up into two parts as well, with a
> synchronize_rcu_tasks() in between.
>
> Imagine this scenario:
>
> <func>:
> nop <-- interrupt comes here, and preempts the task
> nop
>
> First change.
>
> <func>:
> mflr r0
> nop
>
> Second change.
>
> <func>:
> mflr r0
> bl _mcount
>
> Task returns from interrupt
>
> <func>:
> mflr r0
> bl _mcount <-- executes here
>
> It never did the mflr r0, because the last command that was executed
> was a nop before it was interrupted. And yes, it can be interrupted
> on a nop!

We are handling this through ftrace_replace_code() and
__ftrace_make_call_prep() below. For FTRACE_UPDATE_MAKE_CALL, we patch
in the mflr, followed by smp_call_function(isync) and
synchronize_rcu_tasks() before we proceed to patch the branch to ftrace.

I don't see any other scenario where we end up in
__ftrace_make_nop_kernel() without going through ftrace_replace_code().
For kernel modules, this can happen during module load/init and so, I
patch out both instructions in __ftrace_make_call() above without any
synchronization.

Am I missing anything?


Thanks,
Naveen

>
> -- Steve
>
>
>> + /* Nop out the preceding 'mflr r0' as an optimization */
>> + if (probe_kernel_read(&op, (void *)(ip - 4), 4)) {
>> + pr_err("Fetching instruction at %lx failed.\n", ip - 4);
>> + return -EFAULT;
>> + }
>> +
>> + /* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
>> + if (op != PPC_INST_MFLR && op != PPC_INST_STD_LR) {
>> + pr_err("Unexpected instruction %08x before bl _mcount\n", op);
>> + return -EINVAL;
>> + }
>> +
>> + if (op == PPC_INST_MFLR &&
>> + patch_instruction((unsigned int *)(ip - 4), PPC_INST_NOP)) {
>> + pr_err("Patching NOP failed.\n");
>> + return -EPERM;
>> + }
>> +#endif
>> +
>> return 0;
>> }
>>
>> @@ -429,6 +457,7 @@ int ftrace_make_nop(struct module *mod,
>> {
>> unsigned long ip = rec->ip;
>> unsigned int old, new;
>> + int rc;
>>
>> /*
>> * If the calling address is more that 24 bits away,
>> @@ -439,7 +468,34 @@ int ftrace_make_nop(struct module *mod,
>> /* within range */
>> old = ftrace_call_replace(ip, addr, 1);
>> new = PPC_INST_NOP;
>> - return ftrace_modify_code(ip, old, new);
>> + rc = ftrace_modify_code(ip, old, new);
>> +#ifdef CONFIG_MPROFILE_KERNEL
>> + if (rc)
>> + return rc;
>> +
>> + /*
>> + * For -mprofile-kernel, we patch out the preceding 'mflr r0'
>> + * instruction, as an optimization. It is important to nop out
>> + * the branch to _mcount() first, as a lone 'mflr r0' is
>> + * harmless.
>> + */
>> + if (probe_kernel_read(&old, (void *)(ip - 4), 4)) {
>> + pr_err("Fetching instruction at %lx failed.\n", ip - 4);
>> + return -EFAULT;
>> + }
>> +
>> + /* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
>> + if (old != PPC_INST_MFLR && old != PPC_INST_STD_LR) {
>> + pr_err("Unexpected instruction %08x before bl _mcount\n",
>> + old);
>> + return -EINVAL;
>> + }
>> +
>> + if (old == PPC_INST_MFLR)
>> + rc = patch_instruction((unsigned int *)(ip - 4),
>> + PPC_INST_NOP);
>> +#endif
>> + return rc;
>> } else if (core_kernel_text(ip))
>> return __ftrace_make_nop_kernel(rec, addr);
>>
>> @@ -567,6 +623,37 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
>> return -EINVAL;
>> }
>>
>> +#ifdef CONFIG_MPROFILE_KERNEL
>> + /*
>> + * We could end up here without having called __ftrace_make_call_prep()
>> + * if function tracing is enabled before a module is loaded.
>> + *
>> + * ftrace_module_enable() --> ftrace_replace_code_rec() -->
>> + * ftrace_make_call() --> __ftrace_make_call()
>> + *
>> + * In this scenario, the previous instruction will be a NOP. It is
>> + * safe to patch it with a 'mflr r0' since we know for a fact that
>> + * this code is not yet being run.
>> + */
>> + ip -= MCOUNT_INSN_SIZE;
>> +
>> + /* read where this goes */
>> + if (probe_kernel_read(op, ip, MCOUNT_INSN_SIZE))
>> + return -EFAULT;
>> +
>> + /*
>> + * nothing to do if this is using the older -mprofile-kernel
>> + * instruction sequence
>> + */
>> + if (op[0] != PPC_INST_NOP)
>> + return 0;
>> +
>> + if (patch_instruction((unsigned int *)ip, PPC_INST_MFLR)) {
>> + pr_err("Patching MFLR failed.\n");
>> + return -EPERM;
>> + }
>> +#endif
>> +
>> return 0;
>> }
>>
>> @@ -863,6 +950,133 @@ void arch_ftrace_update_code(int command)
>> ftrace_modify_all_code(command);
>> }
>>
>> +#ifdef CONFIG_MPROFILE_KERNEL
>> +/* Returns 1 if we patched in the mflr */
>> +static int __ftrace_make_call_prep(struct dyn_ftrace *rec)
>> +{
>> + void *ip = (void *)rec->ip - MCOUNT_INSN_SIZE;
>> + unsigned int op[2];
>> +
>> + /* read where this goes */
>> + if (probe_kernel_read(op, ip, sizeof(op)))
>> + return -EFAULT;
>> +
>> + if (op[1] != PPC_INST_NOP) {
>> + pr_err("Unexpected call sequence at %p: %x %x\n",
>> + ip, op[0], op[1]);
>> + return -EINVAL;
>> + }
>> +
>> + /*
>> + * nothing to do if this is using the older -mprofile-kernel
>> + * instruction sequence
>> + */
>> + if (op[0] != PPC_INST_NOP)
>> + return 0;
>> +
>> + if (patch_instruction((unsigned int *)ip, PPC_INST_MFLR)) {
>> + pr_err("Patching MFLR failed.\n");
>> + return -EPERM;
>> + }
>> +
>> + return 1;
>> +}
>> +
>> +static void do_isync(void *info __maybe_unused)
>> +{
>> + isync();
>> +}
>> +
>> +/*
>> + * When enabling function tracing for -mprofile-kernel that uses a
>> + * 2-instruction sequence of 'mflr r0, bl _mcount()', we use a 2 step process:
>> + * 1. Patch in the 'mflr r0' instruction
>> + * 1a. flush icache on all cpus, so that the updated instruction is seen
>> + * 1b. synchronize_rcu_tasks() to ensure that any cpus that had executed
>> + * the earlier nop there make progress (and hence do not branch into
>> + * ftrace without executing the preceding mflr)
>> + * 2. Patch in the branch to ftrace
>> + */
>> +void ftrace_replace_code(int mod_flags)
>> +{
>> + int enable = mod_flags & FTRACE_MODIFY_ENABLE_FL;
>> + int schedulable = mod_flags & FTRACE_MODIFY_MAY_SLEEP_FL;
>> + int ret, failed, make_call = 0;
>> + struct ftrace_rec_iter *iter;
>> + struct dyn_ftrace *rec;
>> +
>> + if (unlikely(!ftrace_enabled))
>> + return;
>> +
>> + for_ftrace_rec_iter(iter) {
>> + rec = ftrace_rec_iter_record(iter);
>> +
>> + if (rec->flags & FTRACE_FL_DISABLED)
>> + continue;
>> +
>> + failed = 0;
>> + ret = ftrace_test_record(rec, enable);
>> + if (ret == FTRACE_UPDATE_MAKE_CALL) {
>> + failed = __ftrace_make_call_prep(rec);
>> + if (failed < 0) {
>> + ftrace_bug(failed, rec);
>> + return;
>> + } else if (failed == 1)
>> + make_call++;
>> + }
>> +
>> + if (!failed) {
>> + /* We can patch the record directly */
>> + failed = ftrace_replace_code_rec(rec, enable);
>> + if (failed) {
>> + ftrace_bug(failed, rec);
>> + return;
>> + }
>> + }
>> +
>> + if (schedulable)
>> + cond_resched();
>> + }
>> +
>> + if (!make_call)
>> + /* No records needed patching a preceding mflr */
>> + return;
>> +
>> + /* Make sure all cpus see the new instruction */
>> + smp_call_function(do_isync, NULL, 1);
>> +
>> + /*
>> + * We also need to ensure that all cpus make progress:
>> + * - With !CONFIG_PREEMPT, we want to be sure that cpus return from
>> + * any interrupts they may be handling, and make progress.
>> + * - With CONFIG_PREEMPT, we want to be additionally sure that there
>> + * are no pre-empted tasks that have executed the earlier nop, and
>> + * might end up executing the subsequently patched branch to ftrace.
>> + */
>> + synchronize_rcu_tasks();
>> +
>> + for_ftrace_rec_iter(iter) {
>> + rec = ftrace_rec_iter_record(iter);
>> +
>> + if (rec->flags & FTRACE_FL_DISABLED)
>> + continue;
>> +
>> + ret = ftrace_test_record(rec, enable);
>> + if (ret == FTRACE_UPDATE_MAKE_CALL)
>> + failed = ftrace_replace_code_rec(rec, enable);
>> +
>> + if (failed) {
>> + ftrace_bug(failed, rec);
>> + return;
>> + }
>> +
>> + if (schedulable)
>> + cond_resched();
>> + }
>> +
>> +}
>> +#endif
>> +
>> #ifdef CONFIG_PPC64
>> #define PACATOC offsetof(struct paca_struct, kernel_toc)
>>
>
>

2019-06-27 16:14:17

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 4/7] powerpc/ftrace: Additionally nop out the preceding mflr with -mprofile-kernel

On Thu, 27 Jun 2019 20:58:20 +0530
"Naveen N. Rao" <[email protected]> wrote:

>
> > But interesting, I don't see a synchronize_rcu_tasks() call
> > there.
>
> We felt we don't need it in this case. We patch the branch to ftrace
> with a nop first. Other cpus should see that first. But, now that I
> think about it, should we add a memory barrier to ensure the writes get
> ordered properly?

Do you send an IPI to the other CPUs? I would, just to be safe.

> >> - if (patch_instruction((unsigned int *)ip, pop)) {
> >> + /*
> >> + * Our original call site looks like:
> >> + *
> >> + * bl <tramp>
> >> + * ld r2,XX(r1)
> >> + *
> >> + * Milton Miller pointed out that we can not simply nop the branch.
> >> + * If a task was preempted when calling a trace function, the nops
> >> + * will remove the way to restore the TOC in r2 and the r2 TOC will
> >> + * get corrupted.
> >> + *
> >> + * Use a b +8 to jump over the load.
> >> + */
> >> + if (patch_instruction((unsigned int *)ip, PPC_INST_BRANCH | 8)) {
> >> pr_err("Patching NOP failed.\n");
> >> return -EPERM;
> >> }
> >> +#endif /* CONFIG_MPROFILE_KERNEL */
> >>
> >> return 0;
> >> }
> >> @@ -421,6 +429,26 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace *rec, unsigned long addr)
> >> return -EPERM;
> >> }
> >>
> >> +#ifdef CONFIG_MPROFILE_KERNEL
> >
> > I would think you need to break this up into two parts as well, with a
> > synchronize_rcu_tasks() in between.
> >
> > Imagine this scenario:
> >
> > <func>:
> > nop <-- interrupt comes here, and preempts the task
> > nop
> >
> > First change.
> >
> > <func>:
> > mflr r0
> > nop
> >
> > Second change.
> >
> > <func>:
> > mflr r0
> > bl _mcount
> >
> > Task returns from interrupt
> >
> > <func>:
> > mflr r0
> > bl _mcount <-- executes here
> >
> > It never did the mflr r0, because the last command that was executed
> > was a nop before it was interrupted. And yes, it can be interrupted
> > on a nop!
>
> We are handling this through ftrace_replace_code() and
> __ftrace_make_call_prep() below. For FTRACE_UPDATE_MAKE_CALL, we patch
> in the mflr, followed by smp_call_function(isync) and
> synchronize_rcu_tasks() before we proceed to patch the branch to ftrace.
>
> I don't see any other scenario where we end up in
> __ftrace_make_nop_kernel() without going through ftrace_replace_code().
> For kernel modules, this can happen during module load/init and so, I
> patch out both instructions in __ftrace_make_call() above without any
> synchronization.
>
> Am I missing anything?
>

No, I think I got confused ;-), it's the patch out that I was worried
about, but when I was going through the scenario, I somehow turned it
into the patching in (which I already audited :-p). I was going to
reply with just the top part of this email, but then the confusion
started :-/

OK, yes, patching out should be fine, and you already covered the
patching in. Sorry for the noise.

Just to confirm and totally remove the confusion, the patch does:

<func>:
mflr r0 <-- preempt here
bl _mcount

<func>:
mflr r0
nop

And this is fine regardless.

OK, Reviewed-by: Steven Rostedt (VMware) <[email protected]>

-- Steve

2019-06-28 07:02:17

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH v2 4/7] powerpc/ftrace: Additionally nop out the preceding mflr with -mprofile-kernel

Naveen N. Rao's on June 27, 2019 9:23 pm:
> With -mprofile-kernel, gcc emits 'mflr r0', followed by 'bl _mcount' to
> enable function tracing and profiling. So far, with dynamic ftrace, we
> used to only patch out the branch to _mcount(). However, mflr is
> executed by the branch unit that can only execute one per cycle on
> POWER9 and shared with branches, so it would be nice to avoid it where
> possible.
>
> We cannot simply nop out the mflr either. When enabling function
> tracing, there can be a race if tracing is enabled when some thread was
> interrupted after executing a nop'ed out mflr. In this case, the thread
> would execute the now-patched-in branch to _mcount() without having
> executed the preceding mflr.
>
> To solve this, we now enable function tracing in 2 steps: patch in the
> mflr instruction, use 'smp_call_function(isync);
> synchronize_rcu_tasks()' to ensure all existing threads make progress,
> and then patch in the branch to _mcount(). We override
> ftrace_replace_code() with a powerpc64 variant for this purpose.

I think this seems like a reasonable sequence that will work on our
hardware, although it is technically outside the ISA as specified, so
maybe we should add a feature bit or at least a comment for it.

It would be kind of nice to not put this stuff directly in the
ftrace code, but rather in the function patching subsystem.

I think it would be too expensive to just make a runtime variant of
patch_instruction() that always does the SMP isync, but we could add
a patch_instruction_sync() or similar that ensures no processor is
still running code that has been patched away.
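
Something like the below, perhaps (just a sketch built from the isync
IPI plus synchronize_rcu_tasks() this series already uses; the name and
exact semantics are only a suggestion):

	/* hypothetical helper, not an existing kernel API */
	static void do_isync_ipi(void *info)
	{
		isync();
	}

	/*
	 * Patch an instruction and ensure no CPU is still executing the
	 * code that was patched away.
	 */
	int patch_instruction_sync(unsigned int *addr, unsigned int instr)
	{
		int err = patch_instruction(addr, instr);

		if (err)
			return err;

		/* make the new instruction visible on all CPUs */
		smp_call_function(do_isync_ipi, NULL, 1);

		/* wait for interrupted/preempted tasks to make progress */
		synchronize_rcu_tasks();

		return 0;
	}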

Thanks,
Nick

2019-07-01 08:51:57

by Naveen N. Rao

[permalink] [raw]
Subject: Re: [PATCH v2 4/7] powerpc/ftrace: Additionally nop out the preceding mflr with -mprofile-kernel

Steven Rostedt wrote:
> On Thu, 27 Jun 2019 20:58:20 +0530
> "Naveen N. Rao" <[email protected]> wrote:
>
>>
>> > But interesting, I don't see a synchronize_rcu_tasks() call
>> > there.
>>
>> We felt we don't need it in this case. We patch the branch to ftrace
>> with a nop first. Other cpus should see that first. But, now that I
>> think about it, should we add a memory barrier to ensure the writes get
>> ordered properly?
>
> Do you send an IPI to the other CPUs? I would, just to be safe.
>

<snip>

>>
>> We are handling this through ftrace_replace_code() and
>> __ftrace_make_call_prep() below. For FTRACE_UPDATE_MAKE_CALL, we patch
>> in the mflr, followed by smp_call_function(isync) and
>> synchronize_rcu_tasks() before we proceed to patch the branch to ftrace.
>>
>> I don't see any other scenario where we end up in
>> __ftrace_make_nop_kernel() without going through ftrace_replace_code().
>> For kernel modules, this can happen during module load/init and so, I
>> patch out both instructions in __ftrace_make_call() above without any
>> synchronization.
>>
>> Am I missing anything?
>>
>
> No, I think I got confused ;-), it's the patch out that I was worried
> about, but when I was going through the scenario, I somehow turned it
> into the patching in (which I already audited :-p). I was going to
> reply with just the top part of this email, but then the confusion
> started :-/
>
> OK, yes, patching out should be fine, and you already covered the
> patching in. Sorry for the noise.
>
> Just to confirm and totally remove the confusion, the patch does:
>
> <func>:
> mflr r0 <-- preempt here
> bl _mcount
>
> <func>:
> mflr r0
> nop
>
> And this is fine regardless.
>
> OK, Reviewed-by: Steven Rostedt (VMware) <[email protected]>

Thanks for confirming! We do need an IPI to be sure, as you pointed out
above. I will have the patching out take the same path to simplify
things.
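
For the kernel text case, one way to do that in __ftrace_make_nop_kernel()
might be (just a sketch reusing do_isync() from this patch, not
necessarily what v3 will look like):

	/* bl _mcount -> nop first; a lone 'mflr r0' left behind is harmless */
	if (patch_instruction((unsigned int *)ip, PPC_INST_NOP))
		return -EPERM;

	/* same IPI as on the enable side, so all CPUs see the nop */
	smp_call_function(do_isync, NULL, 1);

	/* only then nop out the preceding 'mflr r0' */
	if (op == PPC_INST_MFLR &&
	    patch_instruction((unsigned int *)(ip - 4), PPC_INST_NOP))
		return -EPERM;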


- Naveen


2022-02-24 01:40:59

by Christophe Leroy

[permalink] [raw]
Subject: Re: [PATCH v2 1/7] ftrace: Expose flags used for ftrace_replace_code()



On 27/06/2019 at 13:23, Naveen N. Rao wrote:
> Since ftrace_replace_code() is a __weak function and can be overridden,
> we need to expose the flags that can be set. So, move the flags enum to
> the header file.
>
> Reviewed-by: Steven Rostedt (VMware) <[email protected]>
> Signed-off-by: Naveen N. Rao <[email protected]>

This series does not apply anymore.

We have a link to it in https://github.com/linuxppc/issues/issues/386

I'll flag it "change requested"

Christophe

2022-03-02 23:43:57

by Naveen N. Rao

[permalink] [raw]
Subject: Re: [PATCH v2 1/7] ftrace: Expose flags used for ftrace_replace_code()

Christophe Leroy wrote:
>
>
> On 27/06/2019 at 13:23, Naveen N. Rao wrote:
>> Since ftrace_replace_code() is a __weak function and can be overridden,
>> we need to expose the flags that can be set. So, move the flags enum to
>> the header file.
>>
>> Reviewed-by: Steven Rostedt (VMware) <[email protected]>
>> Signed-off-by: Naveen N. Rao <[email protected]>
>
> This series does not apply anymore.
>
> We have a link to it in https://github.com/linuxppc/issues/issues/386
>
> I'll flag it "change requested"

There are a couple of changes being worked on as a prerequisite for this
series. See:
http://lkml.kernel.org/r/[email protected]


- Naveen