2009-09-22 19:38:28

by Frederic Weisbecker

Subject: [GIT PULL] tracing/kprobes: Kprobes core/tracing/profiling updates

Hi Ingo,

Kprobes has seen nice improvements lately.
The x86 instruction decoder has been fixed to support cross builds and
the MMX instruction set, on top of various kprobes core fixes.

The tracing part has evolved too: we can now define human-readable
names for arguments and custom subsystem names for dynamic tracepoints.

Kprobes profiling and raw dynamic tracepoint samples are also now
supported through perf. Most of the kernel parts now look to be in
place for perf support; the focus is going to shift to a perf kprobes
tool that exploits them.

Concerning this git tree, based on tip:/tracing/kprobes, I had to merge
tracing/core into it a few weeks ago because it needed build fixes that
were in tracing/core (the merge commit provides the details). The tree is
self-contained but already out of sync with recent upstream tracing updates,
so merging the upstream tree or tracing/core into it may result in
non-trivial conflicts. I can handle them, or rebase the whole tree, as you prefer.

The tree can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
tracing/kprobes

Thanks,
Frederic.

---
Ananth N Mavinakayanahalli (1):
kprobes: Prevent re-registration of the same kprobe

Frederic Weisbecker (1):
Merge commit 'tracing/core' into tracing/kprobes

Masami Hiramatsu (23):
kprobes/x86: Call BUG() when reentering probe into KPROBES_HIT_SS
kprobes/x86-64: Allow to reenter probe on post_handler
kprobes/x86: Fix to add __kprobes to in-kernel fault handling functions
kprobes: Fix to add __kprobes to notify_die
kprobes/x86-64: Fix to move common_interrupt to .kprobes.text
kprobes: Prohibit to probe native_get_debugreg
x86: Allow x86-32 instruction decoder selftest on x86-64
x86: Remove unused config macros from instruction decoder selftest
x86: Add MMX support for instruction decoder
kprobes/x86-32: Move irq-exit functions to kprobes section
x86/ptrace: Fix regs_get_argument_nth() to add correct offset
tracing/kprobes: Fix probe offset to be unsigned
tracing/kprobes: Cleanup kprobe tracer code.
tracing/kprobes: Add event profiling support
tracing/kprobes: Add argument name support
tracing/kprobes: Show event name in trace output
tracing/kprobes: Support custom subsystem for each kprobe event
tracing/kprobes: Fix trace_probe registration order
ftrace: Fix trace_add_event_call() to initialize list
ftrace: Fix trace_remove_event_call() to lock trace_event_mutex
tracing/kprobes: Add probe handler dispatcher to support perf and ftrace concurrent use
tracing/kprobes: Fix profiling alignment for perf_counter buffer
tracing/kprobes: Disable kprobe events by default after creation


The diffstat below looks erroneous. I generated it by diffing
tip:tracing/kprobes...HEAD while excluding the tracing/core changes
that came in after the merge commit: ^d28daf923ac5e4a0d7cecebae56f3e339189366b.
It seems to have picked up the diffstat from the beginning of kprobes
tracing development. Anyway, here it is:

Documentation/trace/kprobetrace.txt | 152 ++++
arch/x86/Kconfig.debug | 9 +
arch/x86/Makefile | 3 +
arch/x86/include/asm/inat.h | 188 +++++
arch/x86/include/asm/inat_types.h | 29 +
arch/x86/include/asm/insn.h | 143 ++++
arch/x86/include/asm/ptrace.h | 62 ++
arch/x86/kernel/entry_32.S | 24 +
arch/x86/kernel/entry_64.S | 8 +
arch/x86/kernel/kprobes.c | 234 +++----
arch/x86/kernel/ptrace.c | 112 +++
arch/x86/lib/Makefile | 13 +
arch/x86/lib/inat.c | 78 ++
arch/x86/lib/insn.c | 464 +++++++++++
arch/x86/lib/x86-opcode-map.txt | 812 ++++++++++++++++++++
arch/x86/mm/fault.c | 11 +-
arch/x86/tools/Makefile | 15 +
arch/x86/tools/distill.awk | 42 +
arch/x86/tools/gen-insn-attr-x86.awk | 334 ++++++++
arch/x86/tools/test_get_len.c | 108 +++
include/linux/ftrace_event.h | 19 +-
include/linux/kprobes.h | 2 +
include/linux/syscalls.h | 4 +-
include/trace/ftrace.h | 16 +-
include/trace/syscall.h | 11 +-
kernel/kprobes.c | 68 ++-
kernel/notifier.c | 2 +-
kernel/trace/Kconfig | 12 +
kernel/trace/Makefile | 1 +
kernel/trace/trace.h | 24 +
kernel/trace/trace_event_types.h | 4 +-
kernel/trace/trace_events.c | 125 +++-
kernel/trace/trace_export.c | 31 +-
kernel/trace/trace_kprobe.c | 1392 ++++++++++++++++++++++++++++++++++
kernel/trace/trace_syscalls.c | 16 +-
35 files changed, 4331 insertions(+), 237 deletions(-)


2009-09-22 19:38:33

by Frederic Weisbecker

Subject: [PATCH 01/24] kprobes/x86: Call BUG() when reentering probe into KPROBES_HIT_SS

From: Masami Hiramatsu <[email protected]>

Call BUG() when a probe is hit in the middle of the kprobe processing
path, because such probes are currently unrecoverable (recovering
would cause an infinite loop and a stack overflow).

The original code seems to assume that, if the hitting probe is the same
as the current probe, the hit was caused by an int3 that another subsystem
inserted in the out-of-line single-step buffer. However, in that case the
int3-hitting address would be inside the out-of-line buffer, and thus
different from the first (current) int3 address.
Thus, I decided to remove that code.

I also remove arch_disarm_kprobe() because it would drag in other
machinery through text_poke().

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Ananth N Mavinakayanahalli <[email protected]>
Cc: Ingo Molnar <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/kernel/kprobes.c | 26 ++++++++++----------------
1 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index ecee3d2..e0fb615 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -482,22 +482,16 @@ static int __kprobes reenter_kprobe(struct kprobe *p, struct pt_regs *regs,
kcb->kprobe_status = KPROBE_REENTER;
break;
case KPROBE_HIT_SS:
- if (p == kprobe_running()) {
- regs->flags &= ~X86_EFLAGS_TF;
- regs->flags |= kcb->kprobe_saved_flags;
- return 0;
- } else {
- /* A probe has been hit in the codepath leading up
- * to, or just after, single-stepping of a probed
- * instruction. This entire codepath should strictly
- * reside in .kprobes.text section.
- * Raise a BUG or we'll continue in an endless
- * reentering loop and eventually a stack overflow.
- */
- arch_disarm_kprobe(p);
- dump_kprobe(p);
- BUG();
- }
+ /* A probe has been hit in the codepath leading up to, or just
+ * after, single-stepping of a probed instruction. This entire
+ * codepath should strictly reside in .kprobes.text section.
+ * Raise a BUG or we'll continue in an endless reentering loop
+ * and eventually a stack overflow.
+ */
+ printk(KERN_WARNING "Unrecoverable kprobe detected at %p.\n",
+ p->addr);
+ dump_kprobe(p);
+ BUG();
default:
/* impossible cases */
WARN_ON(1);
--
1.6.2.3

2009-09-22 19:42:51

by Frederic Weisbecker

Subject: [PATCH 02/24] kprobes/x86-64: Allow to reenter probe on post_handler

From: Masami Hiramatsu <[email protected]>

Allow a probe to reenter on the post_handler of another probe on x86-64,
because x86-64 already allows reentering int3.
In that case, the reentered probe just increments kp.nmissed and returns.

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Ananth N Mavinakayanahalli <[email protected]>
Cc: Ingo Molnar <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/kernel/kprobes.c | 11 -----------
1 files changed, 0 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index e0fb615..c5f1f11 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -463,17 +463,6 @@ static int __kprobes reenter_kprobe(struct kprobe *p, struct pt_regs *regs,
{
switch (kcb->kprobe_status) {
case KPROBE_HIT_SSDONE:
-#ifdef CONFIG_X86_64
- /* TODO: Provide re-entrancy from post_kprobes_handler() and
- * avoid exception stack corruption while single-stepping on
- * the instruction of the new probe.
- */
- arch_disarm_kprobe(p);
- regs->ip = (unsigned long)p->addr;
- reset_current_kprobe();
- preempt_enable_no_resched();
- break;
-#endif
case KPROBE_HIT_ACTIVE:
save_previous_kprobe(kcb);
set_current_kprobe(p, regs, kcb);
--
1.6.2.3

2009-09-22 19:42:54

by Frederic Weisbecker

Subject: [PATCH 03/24] kprobes/x86: Fix to add __kprobes to in-kernel fault handling functions

From: Masami Hiramatsu <[email protected]>

Add __kprobes to the functions that handle in-kernel fixable page
faults. Since kprobes can cause such in-kernel page faults by accessing
kprobe data structures, probing those fault-handling functions would
cause a fault-int3 loop (do_page_fault() is already marked __kprobes).

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Ananth N Mavinakayanahalli <[email protected]>
Cc: Ingo Molnar <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/mm/fault.c | 11 ++++++-----
1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index bfae139..c322e59 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -38,7 +38,8 @@ enum x86_pf_error_code {
* Returns 0 if mmiotrace is disabled, or if the fault is not
* handled by mmiotrace:
*/
-static inline int kmmio_fault(struct pt_regs *regs, unsigned long addr)
+static inline int __kprobes
+kmmio_fault(struct pt_regs *regs, unsigned long addr)
{
if (unlikely(is_kmmio_active()))
if (kmmio_handler(regs, addr) == 1)
@@ -46,7 +47,7 @@ static inline int kmmio_fault(struct pt_regs *regs, unsigned long addr)
return 0;
}

-static inline int notify_page_fault(struct pt_regs *regs)
+static inline int __kprobes notify_page_fault(struct pt_regs *regs)
{
int ret = 0;

@@ -239,7 +240,7 @@ void vmalloc_sync_all(void)
*
* Handle a fault on the vmalloc or module mapping area
*/
-static noinline int vmalloc_fault(unsigned long address)
+static noinline __kprobes int vmalloc_fault(unsigned long address)
{
unsigned long pgd_paddr;
pmd_t *pmd_k;
@@ -361,7 +362,7 @@ void vmalloc_sync_all(void)
*
* This assumes no large pages in there.
*/
-static noinline int vmalloc_fault(unsigned long address)
+static noinline __kprobes int vmalloc_fault(unsigned long address)
{
pgd_t *pgd, *pgd_ref;
pud_t *pud, *pud_ref;
@@ -858,7 +859,7 @@ static int spurious_fault_check(unsigned long error_code, pte_t *pte)
* There are no security implications to leaving a stale TLB when
* increasing the permissions on a page.
*/
-static noinline int
+static noinline __kprobes int
spurious_fault(unsigned long error_code, unsigned long address)
{
pgd_t *pgd;
--
1.6.2.3

2009-09-22 19:38:44

by Frederic Weisbecker

Subject: [PATCH 04/24] kprobes: Fix to add __kprobes to notify_die

From: Masami Hiramatsu <[email protected]>

Add __kprobes to notify_die() because do_int3() calls notify_die()
rather than atomic_notifier_call_chain(), which is already marked as
__kprobes.

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Ananth N Mavinakayanahalli <[email protected]>
Cc: Ingo Molnar <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/notifier.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/notifier.c b/kernel/notifier.c
index 61d5aa5..acd24e7 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -558,7 +558,7 @@ EXPORT_SYMBOL(unregister_reboot_notifier);

static ATOMIC_NOTIFIER_HEAD(die_chain);

-int notrace notify_die(enum die_val val, const char *str,
+int notrace __kprobes notify_die(enum die_val val, const char *str,
struct pt_regs *regs, long err, int trap, int sig)
{
struct die_args args = {
--
1.6.2.3

2009-09-22 19:38:40

by Frederic Weisbecker

Subject: [PATCH 05/24] kprobes/x86-64: Fix to move common_interrupt to .kprobes.text

From: Masami Hiramatsu <[email protected]>

Since the nmi, debug and int3 handlers return to irq_return inside
common_interrupt, probing this function would cause an int3 loop, so it
should be moved into the .kprobes.text section.

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Ananth N Mavinakayanahalli <[email protected]>
Cc: Ingo Molnar <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/kernel/entry_64.S | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index c251be7..36e2ef5 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -809,6 +809,10 @@ END(interrupt)
call \func
.endm

+/*
+ * Interrupt entry/exit should be protected against kprobes
+ */
+ .pushsection .kprobes.text, "ax"
/*
* The interrupt stubs push (~vector+0x80) onto the stack and
* then jump to common_interrupt.
@@ -947,6 +951,10 @@ ENTRY(retint_kernel)

CFI_ENDPROC
END(common_interrupt)
+/*
+ * End of kprobes section
+ */
+ .popsection

/*
* APIC interrupts.
--
1.6.2.3

2009-09-22 19:42:41

by Frederic Weisbecker

Subject: [PATCH 06/24] kprobes: Prohibit to probe native_get_debugreg

From: Masami Hiramatsu <[email protected]>

Since do_debug() calls get_debugreg(), native_get_debugreg() will be
called while single-stepping. This can cause an int3 infinite loop.

We can't put it in the .kprobes.text section because it may be inlined,
so we blacklist it by name instead.

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Ananth N Mavinakayanahalli <[email protected]>
Cc: Ingo Molnar <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/kprobes.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index f72e96c..3267d90 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -90,6 +90,7 @@ static spinlock_t *kretprobe_table_lock_ptr(unsigned long hash)
*/
static struct kprobe_blackpoint kprobe_blacklist[] = {
{"preempt_schedule",},
+ {"native_get_debugreg",},
{NULL} /* Terminator */
};

--
1.6.2.3

2009-09-22 19:42:16

by Frederic Weisbecker

Subject: [PATCH 07/24] x86: Allow x86-32 instruction decoder selftest on x86-64

From: Masami Hiramatsu <[email protected]>

Pass $(CONFIG_64BIT) to the x86 insn decoder selftest so that it can
decode 32-bit code on x86-64, which happens when building the kernel
with ARCH=i386 on an x86-64 host.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ingo Molnar <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/tools/Makefile | 2 +-
arch/x86/tools/test_get_len.c | 14 +++++++-------
2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/tools/Makefile b/arch/x86/tools/Makefile
index 95e9cc4..1bd006c 100644
--- a/arch/x86/tools/Makefile
+++ b/arch/x86/tools/Makefile
@@ -1,6 +1,6 @@
PHONY += posttest
quiet_cmd_posttest = TEST $@
- cmd_posttest = $(OBJDUMP) -d -j .text $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/tools/distill.awk | $(obj)/test_get_len
+ cmd_posttest = $(OBJDUMP) -d -j .text $(objtree)/vmlinux | awk -f $(srctree)/arch/x86/tools/distill.awk | $(obj)/test_get_len $(CONFIG_64BIT)

posttest: $(obj)/test_get_len vmlinux
$(call cmd,posttest)
diff --git a/arch/x86/tools/test_get_len.c b/arch/x86/tools/test_get_len.c
index 1e81adb..a3273f4 100644
--- a/arch/x86/tools/test_get_len.c
+++ b/arch/x86/tools/test_get_len.c
@@ -45,7 +45,7 @@ const char *prog;
static void usage(void)
{
fprintf(stderr, "Usage: objdump -d a.out | awk -f distill.awk |"
- " ./test_get_len\n");
+ " %s [y|n](64bit flag)\n", prog);
exit(1);
}

@@ -63,11 +63,15 @@ int main(int argc, char **argv)
unsigned char insn_buf[16];
struct insn insn;
int insns = 0;
+ int x86_64 = 0;

prog = argv[0];
- if (argc > 1)
+ if (argc > 2)
usage();

+ if (argc == 2 && argv[1][0] == 'y')
+ x86_64 = 1;
+
while (fgets(line, BUFSIZE, stdin)) {
char copy[BUFSIZE], *s, *tab1, *tab2;
int nb = 0;
@@ -93,11 +97,7 @@ int main(int argc, char **argv)
break;
}
/* Decode an instruction */
-#ifdef __x86_64__
- insn_init(&insn, insn_buf, 1);
-#else
- insn_init(&insn, insn_buf, 0);
-#endif
+ insn_init(&insn, insn_buf, x86_64);
insn_get_length(&insn);
if (insn.length != nb) {
fprintf(stderr, "Error: %s", line);
--
1.6.2.3

2009-09-22 19:38:48

by Frederic Weisbecker

Subject: [PATCH 08/24] x86: Remove unused config macros from instruction decoder selftest

From: Masami Hiramatsu <[email protected]>

Remove dummy definitions of CONFIG_X86_64 and CONFIG_X86_32 because
those macros are not used in the instruction decoder anymore.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ingo Molnar <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/tools/test_get_len.c | 5 -----
1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/arch/x86/tools/test_get_len.c b/arch/x86/tools/test_get_len.c
index a3273f4..376d338 100644
--- a/arch/x86/tools/test_get_len.c
+++ b/arch/x86/tools/test_get_len.c
@@ -21,11 +21,6 @@
#include <string.h>
#include <assert.h>

-#ifdef __x86_64__
-#define CONFIG_X86_64
-#else
-#define CONFIG_X86_32
-#endif
#define unlikely(cond) (cond)

#include <asm/insn.h>
--
1.6.2.3

2009-09-22 19:38:52

by Frederic Weisbecker

Subject: [PATCH 09/24] x86: Add MMX support for instruction decoder

From: Masami Hiramatsu <[email protected]>

Add MMX/SSE instructions to x86 opcode maps, since some of those
instructions are used in the kernel.

This also fixes failures in the x86 instruction decoder selftest.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Sam Ravnborg <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/lib/x86-opcode-map.txt | 307 +++++++++++++++++++++++++--------------
1 files changed, 200 insertions(+), 107 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 083dd59..59e20d5 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -310,14 +310,14 @@ Referrer: 2-byte escape
0e:
0f:
# 0x0f 0x10-0x1f
-10:
-11:
-12:
-13:
-14:
-15:
-16:
-17:
+10: movups Vps,Wps | movss Vss,Wss (F3) | movupd Vpd,Wpd (66) | movsd Vsd,Wsd (F2)
+11: movups Wps,Vps | movss Wss,Vss (F3) | movupd Wpd,Vpd (66) | movsd Wsd,Vsd (F2)
+12: movlps Vq,Mq | movlpd Vq,Mq (66) | movhlps Vq,Uq | movddup Vq,Wq (F2) | movsldup Vq,Wq (F3)
+13: mpvlps Mq,Vq | movlpd Mq,Vq (66)
+14: unpcklps Vps,Wq | unpcklpd Vpd,Wq (66)
+15: unpckhps Vps,Wq | unpckhpd Vpd,Wq (66)
+16: movhps Vq,Mq | movhpd Vq,Mq (66) | movlsps Vq,Uq | movshdup Vq,Wq (F3)
+17: movhps Mq,Vq | movhpd Mq,Vq (66)
18: Grp16 (1A)
19:
1a:
@@ -337,12 +337,12 @@ Referrer: 2-byte escape
27:
28: movaps Vps,Wps | movapd Vpd,Wpd (66)
29: movaps Wps,Vps | movapd Wpd,Vpd (66)
-2a:
-2b:
-2c:
-2d:
-2e:
-2f:
+2a: cvtpi2ps Vps,Qpi | cvtsi2ss Vss,Ed/q (F3) | cvtpi2pd Vpd,Qpi (66) | cvtsi2sd Vsd,Ed/q (F2)
+2b: movntps Mps,Vps | movntpd Mpd,Vpd (66)
+2c: cvttps2pi Ppi,Wps | cvttss2si Gd/q,Wss (F3) | cvttpd2pi Ppi,Wpd (66) | cvttsd2si Gd/q,Wsd (F2)
+2d: cvtps2pi Ppi,Wps | cvtss2si Gd/q,Wss (F3) | cvtpd2pi Qpi,Wpd (66) | cvtsd2si Gd/q,Wsd (F2)
+2e: ucomiss Vss,Wss | ucomisd Vsd,Wsd (66)
+2f: comiss Vss,Wss | comisd Vsd,Wsd (66)
# 0x0f 0x30-0x3f
30: WRMSR
31: RDTSC
@@ -378,56 +378,56 @@ Referrer: 2-byte escape
4e: CMOVLE/NG Gv,Ev
4f: CMOVNLE/G Gv,Ev
# 0x0f 0x50-0x5f
-50:
-51:
-52:
-53:
-54:
-55:
-56:
-57:
-58:
-59:
-5a:
-5b:
-5c:
-5d:
-5e:
-5f:
+50: movmskps Gd/q,Ups | movmskpd Gd/q,Upd (66)
+51: sqrtps Vps,Wps | sqrtss Vss,Wss (F3) | sqrtpd Vpd,Wpd (66) | sqrtsd Vsd,Wsd (F2)
+52: rsqrtps Vps,Wps | rsqrtss Vss,Wss (F3)
+53: rcpps Vps,Wps | rcpss Vss,Wss (F3)
+54: andps Vps,Wps | andpd Vpd,Wpd (66)
+55: andnps Vps,Wps | andnpd Vpd,Wpd (66)
+56: orps Vps,Wps | orpd Vpd,Wpd (66)
+57: xorps Vps,Wps | xorpd Vpd,Wpd (66)
+58: addps Vps,Wps | addss Vss,Wss (F3) | addpd Vpd,Wpd (66) | addsd Vsd,Wsd (F2)
+59: mulps Vps,Wps | mulss Vss,Wss (F3) | mulpd Vpd,Wpd (66) | mulsd Vsd,Wsd (F2)
+5a: cvtps2pd Vpd,Wps | cvtss2sd Vsd,Wss (F3) | cvtpd2ps Vps,Wpd (66) | cvtsd2ss Vsd,Wsd (F2)
+5b: cvtdq2ps Vps,Wdq | cvtps2dq Vdq,Wps (66) | cvttps2dq Vdq,Wps (F3)
+5c: subps Vps,Wps | subss Vss,Wss (F3) | subpd Vpd,Wpd (66) | subsd Vsd,Wsd (F2)
+5d: minps Vps,Wps | minss Vss,Wss (F3) | minpd Vpd,Wpd (66) | minsd Vsd,Wsd (F2)
+5e: divps Vps,Wps | divss Vss,Wss (F3) | divpd Vpd,Wpd (66) | divsd Vsd,Wsd (F2)
+5f: maxps Vps,Wps | maxss Vss,Wss (F3) | maxpd Vpd,Wpd (66) | maxsd Vsd,Wsd (F2)
# 0x0f 0x60-0x6f
-60:
-61:
-62:
-63:
-64:
-65:
-66:
-67:
-68:
-69:
-6a:
-6b:
-6c:
-6d:
-6e:
-6f:
+60: punpcklbw Pq,Qd | punpcklbw Vdq,Wdq (66)
+61: punpcklwd Pq,Qd | punpcklwd Vdq,Wdq (66)
+62: punpckldq Pq,Qd | punpckldq Vdq,Wdq (66)
+63: packsswb Pq,Qq | packsswb Vdq,Wdq (66)
+64: pcmpgtb Pq,Qq | pcmpgtb Vdq,Wdq (66)
+65: pcmpgtw Pq,Qq | pcmpgtw(66) Vdq,Wdq
+66: pcmpgtd Pq,Qq | pcmpgtd Vdq,Wdq (66)
+67: packuswb Pq,Qq | packuswb(66) Vdq,Wdq
+68: punpckhbw Pq,Qd | punpckhbw Vdq,Wdq (66)
+69: punpckhwd Pq,Qd | punpckhwd Vdq,Wdq (66)
+6a: punpckhdq Pq,Qd | punpckhdq Vdq,Wdq (66)
+6b: packssdw Pq,Qd | packssdw Vdq,Wdq (66)
+6c: punpcklqdq Vdq,Wdq (66)
+6d: punpckhqdq Vdq,Wdq (66)
+6e: movd/q/ Pd,Ed/q | movd/q Vdq,Ed/q (66)
+6f: movq Pq,Qq | movdqa Vdq,Wdq (66) | movdqu Vdq,Wdq (F3)
# 0x0f 0x70-0x7f
-70:
+70: pshufw Pq,Qq,Ib | pshufd Vdq,Wdq,Ib (66) | pshufhw Vdq,Wdq,Ib (F3) | pshuflw VdqWdq,Ib (F2)
71: Grp12 (1A)
72: Grp13 (1A)
73: Grp14 (1A)
-74:
-75:
-76:
-77:
+74: pcmpeqb Pq,Qq | pcmpeqb Vdq,Wdq (66)
+75: pcmpeqw Pq,Qq | pcmpeqw Vdq,Wdq (66)
+76: pcmpeqd Pq,Qq | pcmpeqd Vdq,Wdq (66)
+77: emms
78: VMREAD Ed/q,Gd/q
79: VMWRITE Gd/q,Ed/q
7a:
7b:
-7c:
-7d:
-7e:
-7f:
+7c: haddps(F2) Vps,Wps | haddpd(66) Vpd,Wpd
+7d: hsubps(F2) Vps,Wps | hsubpd(66) Vpd,Wpd
+7e: movd/q Ed/q,Pd | movd/q Ed/q,Vdq (66) | movq Vq,Wq (F3)
+7f: movq Qq,Pq | movdqa Wdq,Vdq (66) | movdqu Wdq,Vdq (F3)
# 0x0f 0x80-0x8f
80: JO Jz (f64)
81: JNO Jz (f64)
@@ -499,11 +499,11 @@ bf: MOVSX Gv,Ew
# 0x0f 0xc0-0xcf
c0: XADD Eb,Gb
c1: XADD Ev,Gv
-c2:
+c2: cmpps Vps,Wps,Ib | cmpss Vss,Wss,Ib (F3) | cmppd Vpd,Wpd,Ib (66) | cmpsd Vsd,Wsd,Ib (F2)
c3: movnti Md/q,Gd/q
-c4:
-c5:
-c6:
+c4: pinsrw Pq,Rd/q/Mw,Ib | pinsrw Vdq,Rd/q/Mw,Ib (66)
+c5: pextrw Gd,Nq,Ib | pextrw Gd,Udq,Ib (66)
+c6: shufps Vps,Wps,Ib | shufpd Vpd,Wpd,Ib (66)
c7: Grp9 (1A)
c8: BSWAP RAX/EAX/R8/R8D
c9: BSWAP RCX/ECX/R9/R9D
@@ -514,60 +514,131 @@ cd: BSWAP RBP/EBP/R13/R13D
ce: BSWAP RSI/ESI/R14/R14D
cf: BSWAP RDI/EDI/R15/R15D
# 0x0f 0xd0-0xdf
-d0:
-d1:
-d2:
-d3:
-d4:
-d5:
-d6:
-d7:
-d8:
-d9:
-da:
-db:
-dc:
-dd:
-de:
-df:
+d0: addsubps Vps,Wps (F2) | addsubpd Vpd,Wpd (66)
+d1: psrlw Pq,Qq | psrlw Vdq,Wdq (66)
+d2: psrld Pq,Qq | psrld Vdq,Wdq (66)
+d3: psrlq Pq,Qq | psrlq Vdq,Wdq (66)
+d4: paddq Pq,Qq | paddq Vdq,Wdq (66)
+d5: pmullw Pq,Qq | pmullw Vdq,Wdq (66)
+d6: movq Wq,Vq (66) | movq2dq Vdq,Nq (F3) | movdq2q Pq,Uq (F2)
+d7: pmovmskb Gd,Nq | pmovmskb Gd,Udq (66)
+d8: psubusb Pq,Qq | psubusb Vdq,Wdq (66)
+d9: psubusw Pq,Qq | psubusw Vdq,Wdq (66)
+da: pminub Pq,Qq | pminub Vdq,Wdq (66)
+db: pand Pq,Qq | pand Vdq,Wdq (66)
+dc: paddusb Pq,Qq | paddusb Vdq,Wdq (66)
+dd: paddusw Pq,Qq | paddusw Vdq,Wdq (66)
+de: pmaxub Pq,Qq | pmaxub Vdq,Wdq (66)
+df: pandn Pq,Qq | pandn Vdq,Wdq (66)
# 0x0f 0xe0-0xef
-e0:
-e1:
-e2:
-e3:
-e4:
-e5:
-e6:
-e7:
-e8:
-e9:
-ea:
-eb:
-ec:
-ed:
-ee:
-ef:
+e0: pavgb Pq,Qq | pavgb Vdq,Wdq (66)
+e1: psraw Pq,Qq | psraw Vdq,Wdq (66)
+e2: psrad Pq,Qq | psrad Vdq,Wdq (66)
+e3: pavgw Pq,Qq | pavgw Vdq,Wdq (66)
+e4: pmulhuw Pq,Qq | pmulhuw Vdq,Wdq (66)
+e5: pmulhw Pq,Qq | pmulhw Vdq,Wdq (66)
+e6: cvtpd2dq Vdq,Wpd (F2) | cvttpd2dq Vdq,Wpd (66) | cvtdq2pd Vpd,Wdq (F3)
+e7: movntq Mq,Pq | movntdq Mdq,Vdq (66)
+e8: psubsb Pq,Qq | psubsb Vdq,Wdq (66)
+e9: psubsw Pq,Qq | psubsw Vdq,Wdq (66)
+ea: pminsw Pq,Qq | pminsw Vdq,Wdq (66)
+eb: por Pq,Qq | por Vdq,Wdq (66)
+ec: paddsb Pq,Qq | paddsb Vdq,Wdq (66)
+ed: paddsw Pq,Qq | paddsw Vdq,Wdq (66)
+ee: pmaxsw Pq,Qq | pmaxsw Vdq,Wdq (66)
+ef: pxor Pq,Qq | pxor Vdq,Wdq (66)
# 0x0f 0xf0-0xff
-f0:
-f1:
-f2:
-f3:
-f4:
-f5:
-f6:
-f7:
-f8:
-f9:
-fa:
-fb:
-fc:
-fd:
-fe:
+f0: lddqu Vdq,Mdq (F2)
+f1: psllw Pq,Qq | psllw Vdq,Wdq (66)
+f2: pslld Pq,Qq | pslld Vdq,Wdq (66)
+f3: psllq Pq,Qq | psllq Vdq,Wdq (66)
+f4: pmuludq Pq,Qq | pmuludq Vdq,Wdq (66)
+f5: pmaddwd Pq,Qq | pmaddwd Vdq,Wdq (66)
+f6: psadbw Pq,Qq | psadbw Vdq,Wdq (66)
+f7: maskmovq Pq,Nq | maskmovdqu Vdq,Udq (66)
+f8: psubb Pq,Qq | psubb Vdq,Wdq (66)
+f9: psubw Pq,Qq | psubw Vdq,Wdq (66)
+fa: psubd Pq,Qq | psubd Vdq,Wdq (66)
+fb: psubq Pq,Qq | psubq Vdq,Wdq (66)
+fc: paddb Pq,Qq | paddb Vdq,Wdq (66)
+fd: paddw Pq,Qq | paddw Vdq,Wdq (66)
+fe: paddd Pq,Qq | paddd Vdq,Wdq (66)
ff:
EndTable

Table: 3-byte opcode 1
Referrer: 3-byte escape 1
+# 0x0f 0x38 0x00-0x0f
+00: pshufb Pq,Qq | pshufb Vdq,Wdq (66)
+01: phaddw Pq,Qq | phaddw Vdq,Wdq (66)
+02: phaddd Pq,Qq | phaddd Vdq,Wdq (66)
+03: phaddsw Pq,Qq | phaddsw Vdq,Wdq (66)
+04: pmaddubsw Pq,Qq | pmaddubsw (66)Vdq,Wdq
+05: phsubw Pq,Qq | phsubw Vdq,Wdq (66)
+06: phsubd Pq,Qq | phsubd Vdq,Wdq (66)
+07: phsubsw Pq,Qq | phsubsw Vdq,Wdq (66)
+08: psignb Pq,Qq | psignb Vdq,Wdq (66)
+09: psignw Pq,Qq | psignw Vdq,Wdq (66)
+0a: psignd Pq,Qq | psignd Vdq,Wdq (66)
+0b: pmulhrsw Pq,Qq | pmulhrsw Vdq,Wdq (66)
+0c:
+0d:
+0e:
+0f:
+# 0x0f 0x38 0x10-0x1f
+10: pblendvb Vdq,Wdq (66)
+11:
+12:
+13:
+14: blendvps Vdq,Wdq (66)
+15: blendvpd Vdq,Wdq (66)
+16:
+17: ptest Vdq,Wdq (66)
+18:
+19:
+1a:
+1b:
+1c: pabsb Pq,Qq | pabsb Vdq,Wdq (66)
+1d: pabsw Pq,Qq | pabsw Vdq,Wdq (66)
+1e: pabsd Pq,Qq | pabsd Vdq,Wdq (66)
+1f:
+# 0x0f 0x38 0x20-0x2f
+20: pmovsxbw Vdq,Udq/Mq (66)
+21: pmovsxbd Vdq,Udq/Md (66)
+22: pmovsxbq Vdq,Udq/Mw (66)
+23: pmovsxwd Vdq,Udq/Mq (66)
+24: pmovsxwq Vdq,Udq/Md (66)
+25: pmovsxdq Vdq,Udq/Mq (66)
+26:
+27:
+28: pmuldq Vdq,Wdq (66)
+29: pcmpeqq Vdq,Wdq (66)
+2a: movntdqa Vdq,Mdq (66)
+2b: packusdw Vdq,Wdq (66)
+2c:
+2d:
+2e:
+2f:
+# 0x0f 0x38 0x30-0x3f
+30: pmovzxbw Vdq,Udq/Mq (66)
+31: pmovzxbd Vdq,Udq/Md (66)
+32: pmovzxbq Vdq,Udq/Mw (66)
+33: pmovzxwd Vdq,Udq/Mq (66)
+34: pmovzxwq Vdq,Udq/Md (66)
+35: pmovzxdq Vdq,Udq/Mq (66)
+36:
+37: pcmpgtq Vdq,Wdq (66)
+38: pminsb Vdq,Wdq (66)
+39: pminsd Vdq,Wdq (66)
+3a: pminuw Vdq,Wdq (66)
+3b: pminud Vdq,Wdq (66)
+3c: pmaxsb Vdq,Wdq (66)
+3d: pmaxsd Vdq,Wdq (66)
+3e: pmaxuw Vdq,Wdq (66)
+3f: pmaxud Vdq,Wdq (66)
+# 0x0f 0x38 0x4f-0xff
+40: pmulld Vdq,Wdq (66)
+41: phminposuw Vdq,Wdq (66)
80: INVEPT Gd/q,Mdq (66)
81: INVPID Gd/q,Mdq (66)
f0: MOVBE Gv,Mv | CRC32 Gd,Eb (F2)
@@ -576,7 +647,29 @@ EndTable

Table: 3-byte opcode 2
Referrer: 3-byte escape 2
-# all opcode is for SSE
+# 0x0f 0x3a 0x00-0xff
+08: roundps Vdq,Wdq,Ib (66)
+09: roundpd Vdq,Wdq,Ib (66)
+0a: roundss Vss,Wss,Ib (66)
+0b: roundsd Vsd,Wsd,Ib (66)
+0c: blendps Vdq,Wdq,Ib (66)
+0d: blendpd Vdq,Wdq,Ib (66)
+0e: pblendw Vdq,Wdq,Ib (66)
+0f: palignr Pq,Qq,Ib | palignr Vdq,Wdq,Ib (66)
+14: pextrb Rd/Mb,Vdq,Ib (66)
+15: pextrw Rd/Mw,Vdq,Ib (66)
+16: pextrd/pextrq Ed/q,Vdq,Ib (66)
+17: extractps Ed,Vdq,Ib (66)
+20: pinsrb Vdq,Rd/q/Mb,Ib (66)
+21: insertps Vdq,Udq/Md,Ib (66)
+22: pinsrd/pinsrq Vdq,Ed/q,Ib (66)
+40: dpps Vdq,Wdq,Ib (66)
+41: dppd Vdq,Wdq,Ib (66)
+42: mpsadbw Vdq,Wdq,Ib (66)
+60: pcmpestrm Vdq,Wdq,Ib (66)
+61: pcmpestri Vdq,Wdq,Ib (66)
+62: pcmpistrm Vdq,Wdq,Ib (66)
+63: pcmpistri Vdq,Wdq,Ib (66)
EndTable

GrpTable: Grp1
--
1.6.2.3

2009-09-22 19:41:51

by Frederic Weisbecker

Subject: [PATCH 10/24] kprobes/x86-32: Move irq-exit functions to kprobes section

From: Masami Hiramatsu <[email protected]>

Move irq-exit functions to the .kprobes.text section to protect them
against kprobes recursion.

When I ran a kprobe stress test on x86-32, I found that the symbols
below cause unrecoverable recursive probing:

ret_from_exception
ret_from_intr
check_userspace
restore_all
restore_all_notrace
restore_nocheck
irq_return

I also found some interrupt/exception entry points that cause similar
problems.

This patch moves those symbols (including their containing functions)
into the .kprobes.text section to prevent kprobes from probing them.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ingo Molnar <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/kernel/entry_32.S | 24 ++++++++++++++++++++++++
kernel/kprobes.c | 2 ++
2 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index c097e7d..beb30da 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -334,6 +334,10 @@ ENTRY(ret_from_fork)
END(ret_from_fork)

/*
+ * Interrupt exit functions should be protected against kprobes
+ */
+ .pushsection .kprobes.text, "ax"
+/*
* Return to user mode is not as complex as all this looks,
* but we want the default path for a system call return to
* go as quickly as possible which is why some of this is
@@ -383,6 +387,10 @@ need_resched:
END(resume_kernel)
#endif
CFI_ENDPROC
+/*
+ * End of kprobes section
+ */
+ .popsection

/* SYSENTER_RETURN points to after the "sysenter" instruction in
the vsyscall page. See vsyscall-sysentry.S, which defines the symbol. */
@@ -513,6 +521,10 @@ sysexit_audit:
PTGS_TO_GS_EX
ENDPROC(ia32_sysenter_target)

+/*
+ * syscall stub including irq exit should be protected against kprobes
+ */
+ .pushsection .kprobes.text, "ax"
# system call handler stub
ENTRY(system_call)
RING0_INT_FRAME # can't unwind into user space anyway
@@ -705,6 +717,10 @@ syscall_badsys:
jmp resume_userspace
END(syscall_badsys)
CFI_ENDPROC
+/*
+ * End of kprobes section
+ */
+ .popsection

/*
* System calls that need a pt_regs pointer.
@@ -814,6 +830,10 @@ common_interrupt:
ENDPROC(common_interrupt)
CFI_ENDPROC

+/*
+ * Irq entries should be protected against kprobes
+ */
+ .pushsection .kprobes.text, "ax"
#define BUILD_INTERRUPT3(name, nr, fn) \
ENTRY(name) \
RING0_INT_FRAME; \
@@ -980,6 +1000,10 @@ ENTRY(spurious_interrupt_bug)
jmp error_code
CFI_ENDPROC
END(spurious_interrupt_bug)
+/*
+ * End of kprobes section
+ */
+ .popsection

ENTRY(kernel_thread_helper)
pushl $0 # fake return address for unwinder
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 3267d90..00d01b0 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -91,6 +91,8 @@ static spinlock_t *kretprobe_table_lock_ptr(unsigned long hash)
static struct kprobe_blackpoint kprobe_blacklist[] = {
{"preempt_schedule",},
{"native_get_debugreg",},
+ {"irq_entries_start",},
+ {"common_interrupt",},
{NULL} /* Terminator */
};

--
1.6.2.3

2009-09-22 19:38:56

by Frederic Weisbecker

Subject: [PATCH 11/24] x86/ptrace: Fix regs_get_argument_nth() to add correct offset

From: Masami Hiramatsu <[email protected]>

Fix regs_get_argument_nth() to add the correct number of offset bytes.
Because offsetof() returns an offset in bytes, the offset should be
added to a char * instead of an unsigned long *.

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
arch/x86/kernel/ptrace.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index a33a17d..caffb68 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -150,7 +150,7 @@ static const int arg_offs_table[] = {
unsigned long regs_get_argument_nth(struct pt_regs *regs, unsigned int n)
{
if (n < ARRAY_SIZE(arg_offs_table))
- return *((unsigned long *)regs + arg_offs_table[n]);
+ return *(unsigned long *)((char *)regs + arg_offs_table[n]);
else {
/*
* The typical case: arg n is on the stack.
--
1.6.2.3

2009-09-22 19:39:01

by Frederic Weisbecker

Subject: [PATCH 12/24] tracing/kprobes: Fix probe offset to be unsigned

From: Masami Hiramatsu <[email protected]>

Prohibit users from specifying a negative offset from a symbol.
Since kprobe.offset is an unsigned int, the offset must always be a
positive value.
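The simplified split logic can be sketched in userspace; strtoul stands in for the kernel's strict_strtoul, and the function name mirrors the one in the patch:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Sketch of split_symbol_offset() after the fix: only "+offs" is
 * accepted, parsed into an unsigned long, so negative offsets are
 * rejected by construction. Returns 0 on success, -1 on bad input.
 */
int split_symbol_offset(char *symbol, unsigned long *offset)
{
	char *tmp = strchr(symbol, '+');

	if (tmp) {
		char *end;

		/* skip the '+' itself before parsing the number */
		*offset = strtoul(tmp + 1, &end, 0);
		if (*end != '\0')
			return -1;
		*tmp = '\0';	/* terminate the symbol name in place */
	} else {
		*offset = 0;
	}
	return 0;
}
```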

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
Documentation/trace/kprobetrace.txt | 14 +++++++-------
kernel/trace/trace_kprobe.c | 19 +++++++------------
2 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index 3de7517..db55318 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -25,15 +25,15 @@ probe events via /sys/kernel/debug/tracing/events/kprobes/<EVENT>/filter.

Synopsis of kprobe_events
-------------------------
- p[:EVENT] SYMBOL[+offs|-offs]|MEMADDR [FETCHARGS] : Set a probe
- r[:EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe
+ p[:EVENT] SYMBOL[+offs]|MEMADDR [FETCHARGS] : Set a probe
+ r[:EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe

- EVENT : Event name. If omitted, the event name is generated
- based on SYMBOL+offs or MEMADDR.
- SYMBOL[+offs|-offs] : Symbol+offset where the probe is inserted.
- MEMADDR : Address where the probe is inserted.
+ EVENT : Event name. If omitted, the event name is generated
+ based on SYMBOL+offs or MEMADDR.
+ SYMBOL[+offs] : Symbol+offset where the probe is inserted.
+ MEMADDR : Address where the probe is inserted.

- FETCHARGS : Arguments. Each probe can have up to 128 args.
+ FETCHARGS : Arguments. Each probe can have up to 128 args.
%REG : Fetch register REG
sN : Fetch Nth entry of stack (N >= 0)
sa : Fetch stack address.
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 19a6de6..c24b7e9 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -210,7 +210,7 @@ static __kprobes const char *probe_symbol(struct trace_probe *tp)
return tp->symbol ? tp->symbol : "unknown";
}

-static __kprobes long probe_offset(struct trace_probe *tp)
+static __kprobes unsigned int probe_offset(struct trace_probe *tp)
{
return (probe_is_return(tp)) ? tp->rp.kp.offset : tp->kp.offset;
}
@@ -380,7 +380,7 @@ end:
}

/* Split symbol and offset. */
-static int split_symbol_offset(char *symbol, long *offset)
+static int split_symbol_offset(char *symbol, unsigned long *offset)
{
char *tmp;
int ret;
@@ -389,16 +389,11 @@ static int split_symbol_offset(char *symbol, long *offset)
return -EINVAL;

tmp = strchr(symbol, '+');
- if (!tmp)
- tmp = strchr(symbol, '-');
-
if (tmp) {
/* skip sign because strict_strtol doesn't accept '+' */
- ret = strict_strtol(tmp + 1, 0, offset);
+ ret = strict_strtoul(tmp + 1, 0, offset);
if (ret)
return ret;
- if (*tmp == '-')
- *offset = -(*offset);
*tmp = '\0';
} else
*offset = 0;
@@ -520,7 +515,7 @@ static int create_trace_probe(int argc, char **argv)
{
/*
* Argument syntax:
- * - Add kprobe: p[:EVENT] SYMBOL[+OFFS|-OFFS]|ADDRESS [FETCHARGS]
+ * - Add kprobe: p[:EVENT] SYMBOL[+OFFS]|ADDRESS [FETCHARGS]
* - Add kretprobe: r[:EVENT] SYMBOL[+0] [FETCHARGS]
* Fetch args:
* aN : fetch Nth of function argument. (N:0-)
@@ -539,7 +534,7 @@ static int create_trace_probe(int argc, char **argv)
int i, ret = 0;
int is_return = 0;
char *symbol = NULL, *event = NULL;
- long offset = 0;
+ unsigned long offset = 0;
void *addr = NULL;

if (argc < 2)
@@ -605,7 +600,7 @@ static int create_trace_probe(int argc, char **argv)

if (tp->symbol) {
kp->symbol_name = tp->symbol;
- kp->offset = offset;
+ kp->offset = (unsigned int)offset;
} else
kp->addr = addr;

@@ -675,7 +670,7 @@ static int probes_seq_show(struct seq_file *m, void *v)
seq_printf(m, ":%s", tp->call.name);

if (tp->symbol)
- seq_printf(m, " %s%+ld", probe_symbol(tp), probe_offset(tp));
+ seq_printf(m, " %s+%u", probe_symbol(tp), probe_offset(tp));
else
seq_printf(m, " 0x%p", probe_address(tp));

--
1.6.2.3

2009-09-22 19:39:04

by Frederic Weisbecker

Subject: [PATCH 13/24] tracing/kprobes: Cleanup kprobe tracer code.

From: Masami Hiramatsu <[email protected]>

Simplify trace_probe by removing a union and some redundant wrappers,
and clean up the create_trace_probe() function.
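The key trick that makes the union removable is that container_of() works through a nested member path, so a trace_probe can be recovered from the kprobe embedded inside its kretprobe. A minimal userspace sketch, with miniature stand-ins for the kernel structs:

```c
#include <stddef.h>

/* container_of, as defined (modulo type checking) in the kernel. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Miniature stand-ins: after this cleanup, trace_probe embeds only a
 * kretprobe; plain kprobes use the kprobe nested inside it (rp.kp). */
struct kprobe { void *addr; };
struct kretprobe { struct kprobe kp; void *handler; };
struct trace_probe { struct kretprobe rp; unsigned long nhit; };

/* What kprobe_trace_func() does: recover the trace_probe from the
 * kprobe pointer via the nested member designator rp.kp. */
struct trace_probe *probe_of(struct kprobe *kp)
{
	return container_of(kp, struct trace_probe, rp.kp);
}
```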

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/trace/trace_kprobe.c | 81 ++++++++++++++++++-------------------------
1 files changed, 34 insertions(+), 47 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index c24b7e9..4ce728c 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -180,10 +180,7 @@ static __kprobes void free_indirect_fetch_data(struct indirect_fetch_data *data)

struct trace_probe {
struct list_head list;
- union {
- struct kprobe kp;
- struct kretprobe rp;
- };
+ struct kretprobe rp; /* Use rp.kp for kprobe use */
unsigned long nhit;
const char *symbol; /* symbol name */
struct ftrace_event_call call;
@@ -202,7 +199,7 @@ static int kretprobe_trace_func(struct kretprobe_instance *ri,

static __kprobes int probe_is_return(struct trace_probe *tp)
{
- return (tp->rp.handler == kretprobe_trace_func);
+ return tp->rp.handler != NULL;
}

static __kprobes const char *probe_symbol(struct trace_probe *tp)
@@ -210,16 +207,6 @@ static __kprobes const char *probe_symbol(struct trace_probe *tp)
return tp->symbol ? tp->symbol : "unknown";
}

-static __kprobes unsigned int probe_offset(struct trace_probe *tp)
-{
- return (probe_is_return(tp)) ? tp->rp.kp.offset : tp->kp.offset;
-}
-
-static __kprobes void *probe_address(struct trace_probe *tp)
-{
- return (probe_is_return(tp)) ? tp->rp.kp.addr : tp->kp.addr;
-}
-
static int probe_arg_string(char *buf, size_t n, struct fetch_func *ff)
{
int ret = -EINVAL;
@@ -269,8 +256,14 @@ static void unregister_probe_event(struct trace_probe *tp);
static DEFINE_MUTEX(probe_lock);
static LIST_HEAD(probe_list);

-static struct trace_probe *alloc_trace_probe(const char *symbol,
- const char *event, int nargs)
+/*
+ * Allocate new trace_probe and initialize it (including kprobes).
+ */
+static struct trace_probe *alloc_trace_probe(const char *event,
+ void *addr,
+ const char *symbol,
+ unsigned long offs,
+ int nargs, int is_return)
{
struct trace_probe *tp;

@@ -282,7 +275,16 @@ static struct trace_probe *alloc_trace_probe(const char *symbol,
tp->symbol = kstrdup(symbol, GFP_KERNEL);
if (!tp->symbol)
goto error;
- }
+ tp->rp.kp.symbol_name = tp->symbol;
+ tp->rp.kp.offset = offs;
+ } else
+ tp->rp.kp.addr = addr;
+
+ if (is_return)
+ tp->rp.handler = kretprobe_trace_func;
+ else
+ tp->rp.kp.pre_handler = kprobe_trace_func;
+
if (!event)
goto error;
tp->call.name = kstrdup(event, GFP_KERNEL);
@@ -327,7 +329,7 @@ static void __unregister_trace_probe(struct trace_probe *tp)
if (probe_is_return(tp))
unregister_kretprobe(&tp->rp);
else
- unregister_kprobe(&tp->kp);
+ unregister_kprobe(&tp->rp.kp);
}

/* Unregister a trace_probe and probe_event: call with locking probe_lock */
@@ -349,14 +351,14 @@ static int register_trace_probe(struct trace_probe *tp)
if (probe_is_return(tp))
ret = register_kretprobe(&tp->rp);
else
- ret = register_kprobe(&tp->kp);
+ ret = register_kprobe(&tp->rp.kp);

if (ret) {
pr_warning("Could not insert probe(%d)\n", ret);
if (ret == -EILSEQ) {
pr_warning("Probing address(0x%p) is not an "
"instruction boundary.\n",
- probe_address(tp));
+ tp->rp.kp.addr);
ret = -EINVAL;
}
goto end;
@@ -530,12 +532,12 @@ static int create_trace_probe(int argc, char **argv)
* +|-offs(ARG) : fetch memory at ARG +|- offs address.
*/
struct trace_probe *tp;
- struct kprobe *kp;
int i, ret = 0;
int is_return = 0;
char *symbol = NULL, *event = NULL;
unsigned long offset = 0;
void *addr = NULL;
+ char buf[MAX_EVENT_NAME_LEN];

if (argc < 2)
return -EINVAL;
@@ -577,33 +579,18 @@ static int create_trace_probe(int argc, char **argv)
/* setup a probe */
if (!event) {
/* Make a new event name */
- char buf[MAX_EVENT_NAME_LEN];
if (symbol)
snprintf(buf, MAX_EVENT_NAME_LEN, "%c@%s%+ld",
is_return ? 'r' : 'p', symbol, offset);
else
snprintf(buf, MAX_EVENT_NAME_LEN, "%c@0x%p",
is_return ? 'r' : 'p', addr);
- tp = alloc_trace_probe(symbol, buf, argc);
- } else
- tp = alloc_trace_probe(symbol, event, argc);
+ event = buf;
+ }
+ tp = alloc_trace_probe(event, addr, symbol, offset, argc, is_return);
if (IS_ERR(tp))
return PTR_ERR(tp);

- if (is_return) {
- kp = &tp->rp.kp;
- tp->rp.handler = kretprobe_trace_func;
- } else {
- kp = &tp->kp;
- tp->kp.pre_handler = kprobe_trace_func;
- }
-
- if (tp->symbol) {
- kp->symbol_name = tp->symbol;
- kp->offset = (unsigned int)offset;
- } else
- kp->addr = addr;
-
/* parse arguments */
ret = 0;
for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
@@ -670,9 +657,9 @@ static int probes_seq_show(struct seq_file *m, void *v)
seq_printf(m, ":%s", tp->call.name);

if (tp->symbol)
- seq_printf(m, " %s+%u", probe_symbol(tp), probe_offset(tp));
+ seq_printf(m, " %s+%u", probe_symbol(tp), tp->rp.kp.offset);
else
- seq_printf(m, " 0x%p", probe_address(tp));
+ seq_printf(m, " 0x%p", tp->rp.kp.addr);

for (i = 0; i < tp->nr_args; i++) {
ret = probe_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
@@ -783,7 +770,7 @@ static int probes_profile_seq_show(struct seq_file *m, void *v)
struct trace_probe *tp = v;

seq_printf(m, " %-44s %15lu %15lu\n", tp->call.name, tp->nhit,
- probe_is_return(tp) ? tp->rp.kp.nmissed : tp->kp.nmissed);
+ tp->rp.kp.nmissed);

return 0;
}
@@ -811,7 +798,7 @@ static const struct file_operations kprobe_profile_ops = {
/* Kprobe handler */
static __kprobes int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs)
{
- struct trace_probe *tp = container_of(kp, struct trace_probe, kp);
+ struct trace_probe *tp = container_of(kp, struct trace_probe, rp.kp);
struct kprobe_trace_entry *entry;
struct ring_buffer_event *event;
struct ring_buffer *buffer;
@@ -866,7 +853,7 @@ static __kprobes int kretprobe_trace_func(struct kretprobe_instance *ri,

entry = ring_buffer_event_data(event);
entry->nargs = tp->nr_args;
- entry->func = (unsigned long)probe_address(tp);
+ entry->func = (unsigned long)tp->rp.kp.addr;
entry->ret_ip = (unsigned long)ri->ret_addr;
for (i = 0; i < tp->nr_args; i++)
entry->args[i] = call_fetch(&tp->args[i], regs);
@@ -945,7 +932,7 @@ static int probe_event_enable(struct ftrace_event_call *call)
if (probe_is_return(tp))
return enable_kretprobe(&tp->rp);
else
- return enable_kprobe(&tp->kp);
+ return enable_kprobe(&tp->rp.kp);
}

static void probe_event_disable(struct ftrace_event_call *call)
@@ -955,7 +942,7 @@ static void probe_event_disable(struct ftrace_event_call *call)
if (probe_is_return(tp))
disable_kretprobe(&tp->rp);
else
- disable_kprobe(&tp->kp);
+ disable_kprobe(&tp->rp.kp);
}

static int probe_event_raw_init(struct ftrace_event_call *event_call)
--
1.6.2.3

2009-09-22 19:41:31

by Frederic Weisbecker

Subject: [PATCH 14/24] tracing/kprobes: Add event profiling support

From: Masami Hiramatsu <[email protected]>

Add probe_profile_enable()/probe_profile_disable() to support sampling
raw kprobe events from perf counters, like other ftrace events, when
CONFIG_EVENT_PROFILE=y.
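The enable/disable refcounting in the patch follows a common kernel pattern: the counter starts at -1, so only the first enable (the -1 to 0 transition) arms the probe, and only the last disable disarms it. A userspace sketch, with C11 stdatomic standing in for the kernel's atomic_t helpers and a flag standing in for enable_kprobe()/probe_event_disable():

```c
#include <stdatomic.h>

static atomic_int profile_count = -1;	/* like atomic_set(..., -1) */
static int armed;			/* stands in for the probe being live */

int probe_profile_enable(void)
{
	/* atomic_inc_return() != 0 means someone already enabled it */
	if (atomic_fetch_add(&profile_count, 1) + 1 != 0)
		return 0;
	armed = 1;	/* would call enable_kprobe()/enable_kretprobe() */
	return 0;
}

void probe_profile_disable(void)
{
	/* atomic_add_negative(-1, ...): result went negative, last user gone */
	if (atomic_fetch_sub(&profile_count, 1) - 1 < 0)
		armed = 0;	/* would call probe_event_disable() */
}
```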

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
Documentation/trace/kprobetrace.txt | 4 +-
kernel/trace/trace_kprobe.c | 110 ++++++++++++++++++++++++++++++++++-
2 files changed, 111 insertions(+), 3 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index db55318..8f882eb 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -62,13 +62,15 @@ enabled:
You can enable/disable the probe by writing 1 or 0 on it.

format:
- It shows the format of this probe event. It also shows aliases of arguments
+ This shows the format of this probe event. It also shows aliases of arguments
which you specified to kprobe_events.

filter:
You can write filtering rules of this event. And you can use both of aliase
names and field names for describing filters.

+id:
+ This shows the id of this probe event.

Event Profiling
---------------
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 4ce728c..730e992 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -28,6 +28,7 @@
#include <linux/string.h>
#include <linux/ctype.h>
#include <linux/ptrace.h>
+#include <linux/perf_counter.h>

#include "trace.h"
#include "trace_output.h"
@@ -280,6 +281,7 @@ static struct trace_probe *alloc_trace_probe(const char *event,
} else
tp->rp.kp.addr = addr;

+ /* Set handler here for checking whether this probe is return or not. */
if (is_return)
tp->rp.handler = kretprobe_trace_func;
else
@@ -929,10 +931,13 @@ static int probe_event_enable(struct ftrace_event_call *call)
{
struct trace_probe *tp = (struct trace_probe *)call->data;

- if (probe_is_return(tp))
+ if (probe_is_return(tp)) {
+ tp->rp.handler = kretprobe_trace_func;
return enable_kretprobe(&tp->rp);
- else
+ } else {
+ tp->rp.kp.pre_handler = kprobe_trace_func;
return enable_kprobe(&tp->rp.kp);
+ }
}

static void probe_event_disable(struct ftrace_event_call *call)
@@ -1105,6 +1110,101 @@ static int kretprobe_event_show_format(struct ftrace_event_call *call,
"func, ret_ip");
}

+#ifdef CONFIG_EVENT_PROFILE
+
+/* Kprobe profile handler */
+static __kprobes int kprobe_profile_func(struct kprobe *kp,
+ struct pt_regs *regs)
+{
+ struct trace_probe *tp = container_of(kp, struct trace_probe, rp.kp);
+ struct ftrace_event_call *call = &tp->call;
+ struct kprobe_trace_entry *entry;
+ int size, i, pc;
+ unsigned long irq_flags;
+
+ local_save_flags(irq_flags);
+ pc = preempt_count();
+
+ size = SIZEOF_KPROBE_TRACE_ENTRY(tp->nr_args);
+
+ do {
+ char raw_data[size];
+ struct trace_entry *ent;
+
+ *(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
+ entry = (struct kprobe_trace_entry *)raw_data;
+ ent = &entry->ent;
+
+ tracing_generic_entry_update(ent, irq_flags, pc);
+ ent->type = call->id;
+ entry->nargs = tp->nr_args;
+ entry->ip = (unsigned long)kp->addr;
+ for (i = 0; i < tp->nr_args; i++)
+ entry->args[i] = call_fetch(&tp->args[i], regs);
+ perf_tpcounter_event(call->id, entry->ip, 1, entry, size);
+ } while (0);
+ return 0;
+}
+
+/* Kretprobe profile handler */
+static __kprobes int kretprobe_profile_func(struct kretprobe_instance *ri,
+ struct pt_regs *regs)
+{
+ struct trace_probe *tp = container_of(ri->rp, struct trace_probe, rp);
+ struct ftrace_event_call *call = &tp->call;
+ struct kretprobe_trace_entry *entry;
+ int size, i, pc;
+ unsigned long irq_flags;
+
+ local_save_flags(irq_flags);
+ pc = preempt_count();
+
+ size = SIZEOF_KRETPROBE_TRACE_ENTRY(tp->nr_args);
+
+ do {
+ char raw_data[size];
+ struct trace_entry *ent;
+
+ *(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
+ entry = (struct kretprobe_trace_entry *)raw_data;
+ ent = &entry->ent;
+
+ tracing_generic_entry_update(ent, irq_flags, pc);
+ ent->type = call->id;
+ entry->nargs = tp->nr_args;
+ entry->func = (unsigned long)tp->rp.kp.addr;
+ entry->ret_ip = (unsigned long)ri->ret_addr;
+ for (i = 0; i < tp->nr_args; i++)
+ entry->args[i] = call_fetch(&tp->args[i], regs);
+ perf_tpcounter_event(call->id, entry->ret_ip, 1, entry, size);
+ } while (0);
+ return 0;
+}
+
+static int probe_profile_enable(struct ftrace_event_call *call)
+{
+ struct trace_probe *tp = (struct trace_probe *)call->data;
+
+ if (atomic_inc_return(&call->profile_count))
+ return 0;
+
+ if (probe_is_return(tp)) {
+ tp->rp.handler = kretprobe_profile_func;
+ return enable_kretprobe(&tp->rp);
+ } else {
+ tp->rp.kp.pre_handler = kprobe_profile_func;
+ return enable_kprobe(&tp->rp.kp);
+ }
+}
+
+static void probe_profile_disable(struct ftrace_event_call *call)
+{
+ if (atomic_add_negative(-1, &call->profile_count))
+ probe_event_disable(call);
+}
+
+#endif /* CONFIG_EVENT_PROFILE */
+
static int register_probe_event(struct trace_probe *tp)
{
struct ftrace_event_call *call = &tp->call;
@@ -1130,6 +1230,12 @@ static int register_probe_event(struct trace_probe *tp)
call->enabled = 1;
call->regfunc = probe_event_enable;
call->unregfunc = probe_event_disable;
+
+#ifdef CONFIG_EVENT_PROFILE
+ atomic_set(&call->profile_count, -1);
+ call->profile_enable = probe_profile_enable;
+ call->profile_disable = probe_profile_disable;
+#endif
call->data = tp;
ret = trace_add_event_call(call);
if (ret) {
--
1.6.2.3

2009-09-22 19:39:11

by Frederic Weisbecker

Subject: [PATCH 15/24] tracing/kprobes: Add argument name support

From: Masami Hiramatsu <[email protected]>

Add argument name assignment support and remove the "alias" lines from
the format file. This allows users to assign a unique name to each
argument. For example,

$ echo p do_sys_open dfd=a0 filename=a1 flags=a2 mode=a3 > kprobe_events

This assigns dfd, filename, flags, and mode to the 1st through 4th
arguments, respectively. The trace buffer shows those names too.

<...>-1439 [000] 1200885.933147: do_sys_open+0x0/0xdf: dfd=ffffff9c filename=bfa898ac flags=8000 mode=0

This helps users know what each value means.

Users can also filter events by these names. Note that you can no
longer filter by argN.
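The per-argument parsing this patch adds to create_trace_probe() can be sketched in userspace; parse_arg_name is a hypothetical helper (the kernel does this inline in the argument loop):

```c
#include <string.h>

/*
 * Hypothetical helper mirroring the inline parsing in
 * create_trace_probe(): an '=' splits an optional NAME from the fetch
 * spec; without one, the fetch spec doubles as the argument name.
 */
void parse_arg_name(char *tok, const char **name, const char **fetch)
{
	char *eq = strchr(tok, '=');

	if (eq) {
		*eq = '\0';	/* terminate the name in place */
		*name = tok;
		*fetch = eq + 1;
	} else {
		*name = *fetch = tok;
	}
}
```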

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
Documentation/trace/kprobetrace.txt | 46 ++++++-------
kernel/trace/trace_kprobe.c | 128 +++++++++++++++++-----------------
2 files changed, 84 insertions(+), 90 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index 8f882eb..aaa6c10 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -42,7 +42,8 @@ Synopsis of kprobe_events
aN : Fetch function argument. (N >= 0)(*)
rv : Fetch return value.(**)
ra : Fetch return address.(**)
- +|-offs(FETCHARG) : fetch memory at FETCHARG +|- offs address.(***)
+ +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(***)
+ NAME=FETCHARG: Set NAME as the argument name of FETCHARG.

(*) aN may not correct on asmlinkaged functions and at the middle of
function body.
@@ -62,12 +63,10 @@ enabled:
You can enable/disable the probe by writing 1 or 0 on it.

format:
- This shows the format of this probe event. It also shows aliases of arguments
- which you specified to kprobe_events.
+ This shows the format of this probe event.

filter:
- You can write filtering rules of this event. And you can use both of aliase
- names and field names for describing filters.
+ You can write filtering rules of this event.

id:
This shows the id of this probe event.
@@ -85,10 +84,11 @@ Usage examples
To add a probe as a new event, write a new definition to kprobe_events
as below.

- echo p:myprobe do_sys_open a0 a1 a2 a3 > /sys/kernel/debug/tracing/kprobe_events
+ echo p:myprobe do_sys_open dfd=a0 filename=a1 flags=a2 mode=a3 > /sys/kernel/debug/tracing/kprobe_events

This sets a kprobe on the top of do_sys_open() function with recording
-1st to 4th arguments as "myprobe" event.
+1st to 4th arguments as "myprobe" event. As this example shows, users can
+choose more familiar names for each arguments.

echo r:myretprobe do_sys_open rv ra >> /sys/kernel/debug/tracing/kprobe_events

@@ -99,7 +99,7 @@ recording return value and return address as "myretprobe" event.

cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format
name: myprobe
-ID: 23
+ID: 75
format:
field:unsigned short common_type; offset:0; size:2;
field:unsigned char common_flags; offset:2; size:1;
@@ -109,21 +109,15 @@ format:

field: unsigned long ip; offset:16;tsize:8;
field: int nargs; offset:24;tsize:4;
- field: unsigned long arg0; offset:32;tsize:8;
- field: unsigned long arg1; offset:40;tsize:8;
- field: unsigned long arg2; offset:48;tsize:8;
- field: unsigned long arg3; offset:56;tsize:8;
+ field: unsigned long dfd; offset:32;tsize:8;
+ field: unsigned long filename; offset:40;tsize:8;
+ field: unsigned long flags; offset:48;tsize:8;
+ field: unsigned long mode; offset:56;tsize:8;

- alias: a0; original: arg0;
- alias: a1; original: arg1;
- alias: a2; original: arg2;
- alias: a3; original: arg3;
+print fmt: "%lx: dfd=%lx filename=%lx flags=%lx mode=%lx", ip, REC->dfd, REC->filename, REC->flags, REC->mode

-print fmt: "%lx: 0x%lx 0x%lx 0x%lx 0x%lx", ip, arg0, arg1, arg2, arg3

-
- You can see that the event has 4 arguments and alias expressions
-corresponding to it.
+ You can see that the event has 4 arguments as in the expressions you specified.

echo > /sys/kernel/debug/tracing/kprobe_events

@@ -135,12 +129,12 @@ corresponding to it.
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
- <...>-1447 [001] 1038282.286875: do_sys_open+0x0/0xd6: 0x3 0x7fffd1ec4440 0x8000 0x0
- <...>-1447 [001] 1038282.286878: sys_openat+0xc/0xe <- do_sys_open: 0xfffffffffffffffe 0xffffffff81367a3a
- <...>-1447 [001] 1038282.286885: do_sys_open+0x0/0xd6: 0xffffff9c 0x40413c 0x8000 0x1b6
- <...>-1447 [001] 1038282.286915: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0xffffffff81367a3a
- <...>-1447 [001] 1038282.286969: do_sys_open+0x0/0xd6: 0xffffff9c 0x4041c6 0x98800 0x10
- <...>-1447 [001] 1038282.286976: sys_open+0x1b/0x1d <- do_sys_open: 0x3 0xffffffff81367a3a
+ <...>-1447 [001] 1038282.286875: do_sys_open+0x0/0xd6: dfd=3 filename=7fffd1ec4440 flags=8000 mode=0
+ <...>-1447 [001] 1038282.286878: sys_openat+0xc/0xe <- do_sys_open: rv=fffffffffffffffe ra=ffffffff81367a3a
+ <...>-1447 [001] 1038282.286885: do_sys_open+0x0/0xd6: dfd=ffffff9c filename=40413c flags=8000 mode=1b6
+ <...>-1447 [001] 1038282.286915: sys_open+0x1b/0x1d <- do_sys_open: rv=3 ra=ffffffff81367a3a
+ <...>-1447 [001] 1038282.286969: do_sys_open+0x0/0xd6: dfd=ffffff9c filename=4041c6 flags=98800 mode=10
+ <...>-1447 [001] 1038282.286976: sys_open+0x1b/0x1d <- do_sys_open: rv=3 ra=ffffffff81367a3a


Each line shows when the kernel hits a probe, and <- SYMBOL means kernel
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 730e992..44dad1a 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -176,9 +176,14 @@ static __kprobes void free_indirect_fetch_data(struct indirect_fetch_data *data)
}

/**
- * kprobe_trace_core
+ * Kprobe tracer core functions
*/

+struct probe_arg {
+ struct fetch_func fetch;
+ const char *name;
+};
+
struct trace_probe {
struct list_head list;
struct kretprobe rp; /* Use rp.kp for kprobe use */
@@ -187,12 +192,12 @@ struct trace_probe {
struct ftrace_event_call call;
struct trace_event event;
unsigned int nr_args;
- struct fetch_func args[];
+ struct probe_arg args[];
};

#define SIZEOF_TRACE_PROBE(n) \
(offsetof(struct trace_probe, args) + \
- (sizeof(struct fetch_func) * (n)))
+ (sizeof(struct probe_arg) * (n)))

static int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs);
static int kretprobe_trace_func(struct kretprobe_instance *ri,
@@ -301,15 +306,21 @@ error:
return ERR_PTR(-ENOMEM);
}

+static void free_probe_arg(struct probe_arg *arg)
+{
+ if (arg->fetch.func == fetch_symbol)
+ free_symbol_cache(arg->fetch.data);
+ else if (arg->fetch.func == fetch_indirect)
+ free_indirect_fetch_data(arg->fetch.data);
+ kfree(arg->name);
+}
+
static void free_trace_probe(struct trace_probe *tp)
{
int i;

for (i = 0; i < tp->nr_args; i++)
- if (tp->args[i].func == fetch_symbol)
- free_symbol_cache(tp->args[i].data);
- else if (tp->args[i].func == fetch_indirect)
- free_indirect_fetch_data(tp->args[i].data);
+ free_probe_arg(&tp->args[i]);

kfree(tp->call.name);
kfree(tp->symbol);
@@ -532,11 +543,13 @@ static int create_trace_probe(int argc, char **argv)
* %REG : fetch register REG
* Indirect memory fetch:
* +|-offs(ARG) : fetch memory at ARG +|- offs address.
+ * Alias name of args:
+ * NAME=FETCHARG : set NAME as alias of FETCHARG.
*/
struct trace_probe *tp;
int i, ret = 0;
int is_return = 0;
- char *symbol = NULL, *event = NULL;
+ char *symbol = NULL, *event = NULL, *arg = NULL;
unsigned long offset = 0;
void *addr = NULL;
char buf[MAX_EVENT_NAME_LEN];
@@ -596,12 +609,21 @@ static int create_trace_probe(int argc, char **argv)
/* parse arguments */
ret = 0;
for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
- if (strlen(argv[i]) > MAX_ARGSTR_LEN) {
- pr_info("Argument%d(%s) is too long.\n", i, argv[i]);
+ /* Parse argument name */
+ arg = strchr(argv[i], '=');
+ if (arg)
+ *arg++ = '\0';
+ else
+ arg = argv[i];
+ tp->args[i].name = kstrdup(argv[i], GFP_KERNEL);
+
+ /* Parse fetch argument */
+ if (strlen(arg) > MAX_ARGSTR_LEN) {
+ pr_info("Argument%d(%s) is too long.\n", i, arg);
ret = -ENOSPC;
goto error;
}
- ret = parse_probe_arg(argv[i], &tp->args[i], is_return);
+ ret = parse_probe_arg(arg, &tp->args[i].fetch, is_return);
if (ret)
goto error;
}
@@ -664,12 +686,12 @@ static int probes_seq_show(struct seq_file *m, void *v)
seq_printf(m, " 0x%p", tp->rp.kp.addr);

for (i = 0; i < tp->nr_args; i++) {
- ret = probe_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
+ ret = probe_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i].fetch);
if (ret < 0) {
pr_warning("Argument%d decoding error(%d).\n", i, ret);
return ret;
}
- seq_printf(m, " %s", buf);
+ seq_printf(m, " %s=%s", tp->args[i].name, buf);
}
seq_printf(m, "\n");
return 0;
@@ -824,7 +846,7 @@ static __kprobes int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs)
entry->nargs = tp->nr_args;
entry->ip = (unsigned long)kp->addr;
for (i = 0; i < tp->nr_args; i++)
- entry->args[i] = call_fetch(&tp->args[i], regs);
+ entry->args[i] = call_fetch(&tp->args[i].fetch, regs);

if (!filter_current_check_discard(buffer, call, entry, event))
trace_nowake_buffer_unlock_commit(buffer, event, irq_flags, pc);
@@ -858,7 +880,7 @@ static __kprobes int kretprobe_trace_func(struct kretprobe_instance *ri,
entry->func = (unsigned long)tp->rp.kp.addr;
entry->ret_ip = (unsigned long)ri->ret_addr;
for (i = 0; i < tp->nr_args; i++)
- entry->args[i] = call_fetch(&tp->args[i], regs);
+ entry->args[i] = call_fetch(&tp->args[i].fetch, regs);

if (!filter_current_check_discard(buffer, call, entry, event))
trace_nowake_buffer_unlock_commit(buffer, event, irq_flags, pc);
@@ -872,9 +894,13 @@ print_kprobe_event(struct trace_iterator *iter, int flags)
{
struct kprobe_trace_entry *field;
struct trace_seq *s = &iter->seq;
+ struct trace_event *event;
+ struct trace_probe *tp;
int i;

field = (struct kprobe_trace_entry *)iter->ent;
+ event = ftrace_find_event(field->ent.type);
+ tp = container_of(event, struct trace_probe, event);

if (!seq_print_ip_sym(s, field->ip, flags | TRACE_ITER_SYM_OFFSET))
goto partial;
@@ -883,7 +909,8 @@ print_kprobe_event(struct trace_iterator *iter, int flags)
goto partial;

for (i = 0; i < field->nargs; i++)
- if (!trace_seq_printf(s, " 0x%lx", field->args[i]))
+ if (!trace_seq_printf(s, " %s=%lx",
+ tp->args[i].name, field->args[i]))
goto partial;

if (!trace_seq_puts(s, "\n"))
@@ -899,9 +926,13 @@ print_kretprobe_event(struct trace_iterator *iter, int flags)
{
struct kretprobe_trace_entry *field;
struct trace_seq *s = &iter->seq;
+ struct trace_event *event;
+ struct trace_probe *tp;
int i;

field = (struct kretprobe_trace_entry *)iter->ent;
+ event = ftrace_find_event(field->ent.type);
+ tp = container_of(event, struct trace_probe, event);

if (!seq_print_ip_sym(s, field->ret_ip, flags | TRACE_ITER_SYM_OFFSET))
goto partial;
@@ -916,7 +947,8 @@ print_kretprobe_event(struct trace_iterator *iter, int flags)
goto partial;

for (i = 0; i < field->nargs; i++)
- if (!trace_seq_printf(s, " 0x%lx", field->args[i]))
+ if (!trace_seq_printf(s, " %s=%lx",
+ tp->args[i].name, field->args[i]))
goto partial;

if (!trace_seq_puts(s, "\n"))
@@ -972,7 +1004,6 @@ static int kprobe_event_define_fields(struct ftrace_event_call *event_call)
{
int ret, i;
struct kprobe_trace_entry field;
- char buf[MAX_ARGSTR_LEN + 1];
struct trace_probe *tp = (struct trace_probe *)event_call->data;

ret = trace_define_common_fields(event_call);
@@ -981,16 +1012,9 @@ static int kprobe_event_define_fields(struct ftrace_event_call *event_call)

DEFINE_FIELD(unsigned long, ip, "ip", 0);
DEFINE_FIELD(int, nargs, "nargs", 1);
- for (i = 0; i < tp->nr_args; i++) {
- /* Set argN as a field */
- sprintf(buf, "arg%d", i);
- DEFINE_FIELD(unsigned long, args[i], buf, 0);
- /* Set argument string as an alias field */
- ret = probe_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
- if (ret < 0)
- return ret;
- DEFINE_FIELD(unsigned long, args[i], buf, 0);
- }
+ /* Set argument names as fields */
+ for (i = 0; i < tp->nr_args; i++)
+ DEFINE_FIELD(unsigned long, args[i], tp->args[i].name, 0);
return 0;
}

@@ -998,7 +1022,6 @@ static int kretprobe_event_define_fields(struct ftrace_event_call *event_call)
{
int ret, i;
struct kretprobe_trace_entry field;
- char buf[MAX_ARGSTR_LEN + 1];
struct trace_probe *tp = (struct trace_probe *)event_call->data;

ret = trace_define_common_fields(event_call);
@@ -1008,16 +1031,9 @@ static int kretprobe_event_define_fields(struct ftrace_event_call *event_call)
DEFINE_FIELD(unsigned long, func, "func", 0);
DEFINE_FIELD(unsigned long, ret_ip, "ret_ip", 0);
DEFINE_FIELD(int, nargs, "nargs", 1);
- for (i = 0; i < tp->nr_args; i++) {
- /* Set argN as a field */
- sprintf(buf, "arg%d", i);
- DEFINE_FIELD(unsigned long, args[i], buf, 0);
- /* Set argument string as an alias field */
- ret = probe_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
- if (ret < 0)
- return ret;
- DEFINE_FIELD(unsigned long, args[i], buf, 0);
- }
+ /* Set argument names as fields */
+ for (i = 0; i < tp->nr_args; i++)
+ DEFINE_FIELD(unsigned long, args[i], tp->args[i].name, 0);
return 0;
}

@@ -1025,31 +1041,21 @@ static int __probe_event_show_format(struct trace_seq *s,
struct trace_probe *tp, const char *fmt,
const char *arg)
{
- int i, ret;
- char buf[MAX_ARGSTR_LEN + 1];
+ int i;

- /* Show aliases */
- for (i = 0; i < tp->nr_args; i++) {
- ret = probe_arg_string(buf, MAX_ARGSTR_LEN, &tp->args[i]);
- if (ret < 0)
- return ret;
- if (!trace_seq_printf(s, "\talias: %s;\toriginal: arg%d;\n",
- buf, i))
- return 0;
- }
/* Show format */
if (!trace_seq_printf(s, "\nprint fmt: \"%s", fmt))
return 0;

for (i = 0; i < tp->nr_args; i++)
- if (!trace_seq_puts(s, " 0x%lx"))
+ if (!trace_seq_printf(s, " %s=%%lx", tp->args[i].name))
return 0;

if (!trace_seq_printf(s, "\", %s", arg))
return 0;

for (i = 0; i < tp->nr_args; i++)
- if (!trace_seq_printf(s, ", arg%d", i))
+ if (!trace_seq_printf(s, ", REC->%s", tp->args[i].name))
return 0;

return trace_seq_puts(s, "\n");
@@ -1071,17 +1077,14 @@ static int kprobe_event_show_format(struct ftrace_event_call *call,
{
struct kprobe_trace_entry field __attribute__((unused));
int ret, i;
- char buf[8];
struct trace_probe *tp = (struct trace_probe *)call->data;

SHOW_FIELD(unsigned long, ip, "ip");
SHOW_FIELD(int, nargs, "nargs");

/* Show fields */
- for (i = 0; i < tp->nr_args; i++) {
- sprintf(buf, "arg%d", i);
- SHOW_FIELD(unsigned long, args[i], buf);
- }
+ for (i = 0; i < tp->nr_args; i++)
+ SHOW_FIELD(unsigned long, args[i], tp->args[i].name);
trace_seq_puts(s, "\n");

return __probe_event_show_format(s, tp, "%lx:", "ip");
@@ -1092,7 +1095,6 @@ static int kretprobe_event_show_format(struct ftrace_event_call *call,
{
struct kretprobe_trace_entry field __attribute__((unused));
int ret, i;
- char buf[8];
struct trace_probe *tp = (struct trace_probe *)call->data;

SHOW_FIELD(unsigned long, func, "func");
@@ -1100,10 +1102,8 @@ static int kretprobe_event_show_format(struct ftrace_event_call *call,
SHOW_FIELD(int, nargs, "nargs");

/* Show fields */
- for (i = 0; i < tp->nr_args; i++) {
- sprintf(buf, "arg%d", i);
- SHOW_FIELD(unsigned long, args[i], buf);
- }
+ for (i = 0; i < tp->nr_args; i++)
+ SHOW_FIELD(unsigned long, args[i], tp->args[i].name);
trace_seq_puts(s, "\n");

return __probe_event_show_format(s, tp, "%lx <- %lx:",
@@ -1140,7 +1140,7 @@ static __kprobes int kprobe_profile_func(struct kprobe *kp,
entry->nargs = tp->nr_args;
entry->ip = (unsigned long)kp->addr;
for (i = 0; i < tp->nr_args; i++)
- entry->args[i] = call_fetch(&tp->args[i], regs);
+ entry->args[i] = call_fetch(&tp->args[i].fetch, regs);
perf_tpcounter_event(call->id, entry->ip, 1, entry, size);
} while (0);
return 0;
@@ -1175,7 +1175,7 @@ static __kprobes int kretprobe_profile_func(struct kretprobe_instance *ri,
entry->func = (unsigned long)tp->rp.kp.addr;
entry->ret_ip = (unsigned long)ri->ret_addr;
for (i = 0; i < tp->nr_args; i++)
- entry->args[i] = call_fetch(&tp->args[i], regs);
+ entry->args[i] = call_fetch(&tp->args[i].fetch, regs);
perf_tpcounter_event(call->id, entry->ret_ip, 1, entry, size);
} while (0);
return 0;
--
1.6.2.3

2009-09-22 19:41:06

by Frederic Weisbecker

Subject: [PATCH 16/24] tracing/kprobes: Show event name in trace output

From: Masami Hiramatsu <[email protected]>

Show the event name in the tracing/trace output. This also fixes the kprobe
event format to comply with the other tracepoint event formats.

Before patching:
<...>-1447 [001] 1038282.286875: do_sys_open+0x0/0xd6: ...
<...>-1447 [001] 1038282.286878: sys_openat+0xc/0xe <- do_sys_open: ...

After patching:
<...>-1447 [001] 1038282.286875: myprobe: (do_sys_open+0x0/0xd6) ...
<...>-1447 [001] 1038282.286878: myretprobe: (sys_openat+0xc/0xe <- do_sys_open) ...

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
Documentation/trace/kprobetrace.txt | 16 ++++++++--------
kernel/trace/trace_kprobe.c | 16 +++++++++++-----
2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index aaa6c10..a849889 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -114,7 +114,7 @@ format:
field: unsigned long flags; offset:48;tsize:8;
field: unsigned long mode; offset:56;tsize:8;

-print fmt: "%lx: dfd=%lx filename=%lx flags=%lx mode=%lx", ip, REC->dfd, REC->filename, REC->flags, REC->mode
+print fmt: "(%lx) dfd=%lx filename=%lx flags=%lx mode=%lx", REC->ip, REC->dfd, REC->filename, REC->flags, REC->mode


You can see that the event has 4 arguments as in the expressions you specified.
@@ -129,15 +129,15 @@ print fmt: "%lx: dfd=%lx filename=%lx flags=%lx mode=%lx", ip, REC->dfd, REC->fi
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
- <...>-1447 [001] 1038282.286875: do_sys_open+0x0/0xd6: dfd=3 filename=7fffd1ec4440 flags=8000 mode=0
- <...>-1447 [001] 1038282.286878: sys_openat+0xc/0xe <- do_sys_open: rv=fffffffffffffffe ra=ffffffff81367a3a
- <...>-1447 [001] 1038282.286885: do_sys_open+0x0/0xd6: dfd=ffffff9c filename=40413c flags=8000 mode=1b6
- <...>-1447 [001] 1038282.286915: sys_open+0x1b/0x1d <- do_sys_open: rv=3 ra=ffffffff81367a3a
- <...>-1447 [001] 1038282.286969: do_sys_open+0x0/0xd6: dfd=ffffff9c filename=4041c6 flags=98800 mode=10
- <...>-1447 [001] 1038282.286976: sys_open+0x1b/0x1d <- do_sys_open: rv=3 ra=ffffffff81367a3a
+ <...>-1447 [001] 1038282.286875: myprobe: (do_sys_open+0x0/0xd6) dfd=3 filename=7fffd1ec4440 flags=8000 mode=0
+ <...>-1447 [001] 1038282.286878: myretprobe: (sys_openat+0xc/0xe <- do_sys_open) rv=fffffffffffffffe ra=ffffffff81367a3a
+ <...>-1447 [001] 1038282.286885: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=40413c flags=8000 mode=1b6
+ <...>-1447 [001] 1038282.286915: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) rv=3 ra=ffffffff81367a3a
+ <...>-1447 [001] 1038282.286969: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=4041c6 flags=98800 mode=10
+ <...>-1447 [001] 1038282.286976: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) rv=3 ra=ffffffff81367a3a


- Each line shows when the kernel hits a probe, and <- SYMBOL means kernel
+ Each line shows when the kernel hits an event, and <- SYMBOL means kernel
returns from SYMBOL(e.g. "sys_open+0x1b/0x1d <- do_sys_open" means kernel
returns from do_sys_open to sys_open+0x1b).

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 44dad1a..1746afe 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -902,10 +902,13 @@ print_kprobe_event(struct trace_iterator *iter, int flags)
event = ftrace_find_event(field->ent.type);
tp = container_of(event, struct trace_probe, event);

+ if (!trace_seq_printf(s, "%s: (", tp->call.name))
+ goto partial;
+
if (!seq_print_ip_sym(s, field->ip, flags | TRACE_ITER_SYM_OFFSET))
goto partial;

- if (!trace_seq_puts(s, ":"))
+ if (!trace_seq_puts(s, ")"))
goto partial;

for (i = 0; i < field->nargs; i++)
@@ -934,6 +937,9 @@ print_kretprobe_event(struct trace_iterator *iter, int flags)
event = ftrace_find_event(field->ent.type);
tp = container_of(event, struct trace_probe, event);

+ if (!trace_seq_printf(s, "%s: (", tp->call.name))
+ goto partial;
+
if (!seq_print_ip_sym(s, field->ret_ip, flags | TRACE_ITER_SYM_OFFSET))
goto partial;

@@ -943,7 +949,7 @@ print_kretprobe_event(struct trace_iterator *iter, int flags)
if (!seq_print_ip_sym(s, field->func, flags & ~TRACE_ITER_SYM_OFFSET))
goto partial;

- if (!trace_seq_puts(s, ":"))
+ if (!trace_seq_puts(s, ")"))
goto partial;

for (i = 0; i < field->nargs; i++)
@@ -1087,7 +1093,7 @@ static int kprobe_event_show_format(struct ftrace_event_call *call,
SHOW_FIELD(unsigned long, args[i], tp->args[i].name);
trace_seq_puts(s, "\n");

- return __probe_event_show_format(s, tp, "%lx:", "ip");
+ return __probe_event_show_format(s, tp, "(%lx)", "REC->ip");
}

static int kretprobe_event_show_format(struct ftrace_event_call *call,
@@ -1106,8 +1112,8 @@ static int kretprobe_event_show_format(struct ftrace_event_call *call,
SHOW_FIELD(unsigned long, args[i], tp->args[i].name);
trace_seq_puts(s, "\n");

- return __probe_event_show_format(s, tp, "%lx <- %lx:",
- "func, ret_ip");
+ return __probe_event_show_format(s, tp, "(%lx <- %lx)",
+ "REC->func, REC->ret_ip");
}

#ifdef CONFIG_EVENT_PROFILE
--
1.6.2.3

2009-09-22 19:39:15

by Frederic Weisbecker

Subject: [PATCH 17/24] tracing/kprobes: Support custom subsystem for each kprobe event

From: Masami Hiramatsu <[email protected]>

Support specifying a custom subsystem (group) for each kprobe event.
This allows users to create a new group to control several probes
at once, or to add events to existing groups as additional tracepoints.

New synopsis:
p[:[subsys/]event-name] KADDR|KSYM[+offs] [ARGS]
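The `[subsys/]` split can be sketched standalone. This is a hypothetical user-space mirror of the `strchr()` logic the patch adds to `create_trace_probe()`; the function name and interface are illustrative, not kernel API:

```c
#include <string.h>

/* Default group used when none is specified, matching the patch. */
#define KPROBE_EVENT_SYSTEM "kprobes"

/* Splits "token" in place ("grp/event" or just "event").
 * Returns 0 on success, -1 when the group part is empty ("/event"). */
static int split_group_event(char *token, const char **group,
			     const char **event)
{
	char *slash = strchr(token, '/');

	if (slash) {
		*slash = '\0';			/* terminate the group part */
		if (token[0] == '\0')
			return -1;		/* empty group name */
		*group = token;
		*event = slash + 1;
	} else {
		*group = KPROBE_EVENT_SYSTEM;	/* fall back to "kprobes" */
		*event = token;
	}
	return 0;
}
```

A token without a slash lands in the default "kprobes" group, which is why the patch can drop the hardcoded `call->system = "kprobes"` assignment later in the series.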

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
Documentation/trace/kprobetrace.txt | 5 +++--
kernel/trace/trace_kprobe.c | 33 +++++++++++++++++++++++++++------
2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index a849889..6521681 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -25,9 +25,10 @@ probe events via /sys/kernel/debug/tracing/events/kprobes/<EVENT>/filter.

Synopsis of kprobe_events
-------------------------
- p[:EVENT] SYMBOL[+offs]|MEMADDR [FETCHARGS] : Set a probe
- r[:EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe
+ p[:[GRP/]EVENT] SYMBOL[+offs]|MEMADDR [FETCHARGS] : Set a probe
+ r[:[GRP/]EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe

+ GRP : Group name. If omitted, use "kprobes" for it.
EVENT : Event name. If omitted, the event name is generated
based on SYMBOL+offs or MEMADDR.
SYMBOL[+offs] : Symbol+offset where the probe is inserted.
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 1746afe..cbc0870 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -36,6 +36,7 @@
#define MAX_TRACE_ARGS 128
#define MAX_ARGSTR_LEN 63
#define MAX_EVENT_NAME_LEN 64
+#define KPROBE_EVENT_SYSTEM "kprobes"

/* currently, trace_kprobe only supports X86. */

@@ -265,7 +266,8 @@ static LIST_HEAD(probe_list);
/*
* Allocate new trace_probe and initialize it (including kprobes).
*/
-static struct trace_probe *alloc_trace_probe(const char *event,
+static struct trace_probe *alloc_trace_probe(const char *group,
+ const char *event,
void *addr,
const char *symbol,
unsigned long offs,
@@ -298,9 +300,16 @@ static struct trace_probe *alloc_trace_probe(const char *event,
if (!tp->call.name)
goto error;

+ if (!group)
+ goto error;
+ tp->call.system = kstrdup(group, GFP_KERNEL);
+ if (!tp->call.system)
+ goto error;
+
INIT_LIST_HEAD(&tp->list);
return tp;
error:
+ kfree(tp->call.name);
kfree(tp->symbol);
kfree(tp);
return ERR_PTR(-ENOMEM);
@@ -322,6 +331,7 @@ static void free_trace_probe(struct trace_probe *tp)
for (i = 0; i < tp->nr_args; i++)
free_probe_arg(&tp->args[i]);

+ kfree(tp->call.system);
kfree(tp->call.name);
kfree(tp->symbol);
kfree(tp);
@@ -530,8 +540,8 @@ static int create_trace_probe(int argc, char **argv)
{
/*
* Argument syntax:
- * - Add kprobe: p[:EVENT] SYMBOL[+OFFS]|ADDRESS [FETCHARGS]
- * - Add kretprobe: r[:EVENT] SYMBOL[+0] [FETCHARGS]
+ * - Add kprobe: p[:[GRP/]EVENT] KSYM[+OFFS]|KADDR [FETCHARGS]
+ * - Add kretprobe: r[:[GRP/]EVENT] KSYM[+0] [FETCHARGS]
* Fetch args:
* aN : fetch Nth of function argument. (N:0-)
* rv : fetch return value
@@ -549,7 +559,7 @@ static int create_trace_probe(int argc, char **argv)
struct trace_probe *tp;
int i, ret = 0;
int is_return = 0;
- char *symbol = NULL, *event = NULL, *arg = NULL;
+ char *symbol = NULL, *event = NULL, *arg = NULL, *group = NULL;
unsigned long offset = 0;
void *addr = NULL;
char buf[MAX_EVENT_NAME_LEN];
@@ -566,6 +576,15 @@ static int create_trace_probe(int argc, char **argv)

if (argv[0][1] == ':') {
event = &argv[0][2];
+ if (strchr(event, '/')) {
+ group = event;
+ event = strchr(group, '/') + 1;
+ event[-1] = '\0';
+ if (strlen(group) == 0) {
+ pr_info("Group name is not specifiled\n");
+ return -EINVAL;
+ }
+ }
if (strlen(event) == 0) {
pr_info("Event name is not specifiled\n");
return -EINVAL;
@@ -592,6 +611,8 @@ static int create_trace_probe(int argc, char **argv)
argc -= 2; argv += 2;

/* setup a probe */
+ if (!group)
+ group = KPROBE_EVENT_SYSTEM;
if (!event) {
/* Make a new event name */
if (symbol)
@@ -602,7 +623,8 @@ static int create_trace_probe(int argc, char **argv)
is_return ? 'r' : 'p', addr);
event = buf;
}
- tp = alloc_trace_probe(event, addr, symbol, offset, argc, is_return);
+ tp = alloc_trace_probe(group, event, addr, symbol, offset, argc,
+ is_return);
if (IS_ERR(tp))
return PTR_ERR(tp);

@@ -1217,7 +1239,6 @@ static int register_probe_event(struct trace_probe *tp)
int ret;

/* Initialize ftrace_event_call */
- call->system = "kprobes";
if (probe_is_return(tp)) {
tp->event.trace = print_kretprobe_event;
call->raw_init = probe_event_raw_init;
--
1.6.2.3

2009-09-22 19:39:18

by Frederic Weisbecker

Subject: [PATCH 18/24] tracing/kprobes: Fix trace_probe registration order

From: Masami Hiramatsu <[email protected]>

Fix the trace_probe registration order: the ftrace_event_call and
ftrace_event must be registered before the kprobe/kretprobe, because the
tracing/profiling handlers dereference the event id.

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/trace/trace_kprobe.c | 42 +++++++++++++++++++-----------------------
1 files changed, 19 insertions(+), 23 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index cbc0870..ea0db8e 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -347,20 +347,15 @@ static struct trace_probe *find_probe_event(const char *event)
return NULL;
}

-static void __unregister_trace_probe(struct trace_probe *tp)
+/* Unregister a trace_probe and probe_event: call with locking probe_lock */
+static void unregister_trace_probe(struct trace_probe *tp)
{
if (probe_is_return(tp))
unregister_kretprobe(&tp->rp);
else
unregister_kprobe(&tp->rp.kp);
-}
-
-/* Unregister a trace_probe and probe_event: call with locking probe_lock */
-static void unregister_trace_probe(struct trace_probe *tp)
-{
- unregister_probe_event(tp);
- __unregister_trace_probe(tp);
list_del(&tp->list);
+ unregister_probe_event(tp);
}

/* Register a trace_probe and probe_event */
@@ -371,6 +366,19 @@ static int register_trace_probe(struct trace_probe *tp)

mutex_lock(&probe_lock);

+ /* register as an event */
+ old_tp = find_probe_event(tp->call.name);
+ if (old_tp) {
+ /* delete old event */
+ unregister_trace_probe(old_tp);
+ free_trace_probe(old_tp);
+ }
+ ret = register_probe_event(tp);
+ if (ret) {
+ pr_warning("Faild to register probe event(%d)\n", ret);
+ goto end;
+ }
+
if (probe_is_return(tp))
ret = register_kretprobe(&tp->rp);
else
@@ -384,21 +392,9 @@ static int register_trace_probe(struct trace_probe *tp)
tp->rp.kp.addr);
ret = -EINVAL;
}
- goto end;
- }
- /* register as an event */
- old_tp = find_probe_event(tp->call.name);
- if (old_tp) {
- /* delete old event */
- unregister_trace_probe(old_tp);
- free_trace_probe(old_tp);
- }
- ret = register_probe_event(tp);
- if (ret) {
- pr_warning("Faild to register probe event(%d)\n", ret);
- __unregister_trace_probe(tp);
- }
- list_add_tail(&tp->list, &probe_list);
+ unregister_probe_event(tp);
+ } else
+ list_add_tail(&tp->list, &probe_list);
end:
mutex_unlock(&probe_lock);
return ret;
--
1.6.2.3

2009-09-22 19:39:26

by Frederic Weisbecker

Subject: [PATCH 19/24] ftrace: Fix trace_add_event_call() to initialize list

From: Masami Hiramatsu <[email protected]>

Handle the failure path in trace_add_event_call() to fix the bug below,
which occurred when I tried to add an invalid event twice.

Could not create debugfs 'kmalloc' directory
Failed to register kprobe event: kmalloc
Faild to register probe event(-1)
------------[ cut here ]------------
WARNING: at /home/mhiramat/ksrc/random-tracing/lib/list_debug.c:26
__list_add+0x27/0x5c()
Hardware name:
list_add corruption. next->prev should be prev (c07d78cc), but was
00001000. (next=d854236c).
Modules linked in: sunrpc uinput virtio_net virtio_balloon i2c_piix4 pcspkr
i2c_core virtio_blk virtio_pci virtio_ring virtio [last unloaded:
scsi_wait_scan]
Pid: 1394, comm: tee Not tainted 2.6.31-rc9 #51
Call Trace:
[<c0438424>] warn_slowpath_common+0x65/0x7c
[<c05371b3>] ? __list_add+0x27/0x5c
[<c043846f>] warn_slowpath_fmt+0x24/0x27
[<c05371b3>] __list_add+0x27/0x5c
[<c047f050>] list_add+0xa/0xc
[<c047f8f5>] trace_add_event_call+0x60/0x97
[<c0483133>] command_trace_probe+0x42c/0x51b
[<c044a1b3>] ? remove_wait_queue+0x22/0x27
[<c042a9c0>] ? __wake_up+0x32/0x3b
[<c04832f6>] probes_write+0xd4/0x10a
[<c0483222>] ? probes_write+0x0/0x10a
[<c04b27a9>] vfs_write+0x80/0xdf
[<c04b289c>] sys_write+0x3b/0x5d
[<c0670d41>] syscall_call+0x7/0xb
---[ end trace 2b962b5dc1fdc07d ]---
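The fix is the standard unwind pattern: undo the list insertion when the step after it fails, so the list never retains a half-initialized call. A hypothetical standalone sketch (the array registry and `create_dir()` stand in for the ftrace_events list and `event_create_dir()`):

```c
#include <string.h>

#define MAX_CALLS 8

static const char *registry[MAX_CALLS];
static int nr_calls;

/* Stand-in for event_create_dir(): fail on a duplicate name, as the
 * reported bug did when the same invalid event was added twice. */
static int create_dir(const char *name)
{
	for (int i = 0; i < nr_calls - 1; i++)
		if (strcmp(registry[i], name) == 0)
			return -1;
	return 0;
}

static int add_event_call(const char *name)
{
	int ret;

	if (nr_calls >= MAX_CALLS)
		return -1;
	registry[nr_calls++] = name;	/* list_add() */
	ret = create_dir(name);
	if (ret < 0)
		nr_calls--;		/* list_del() on failure */
	return ret;
}
```

Without the rollback, a second registration attempt walks a list still containing the stale entry, which is what triggered the `__list_add` corruption warning above.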

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/trace/trace_events.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index ba34920..83cc2c0 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1010,9 +1010,12 @@ static int __trace_add_event_call(struct ftrace_event_call *call)
return -ENOENT;

list_add(&call->list, &ftrace_events);
- return event_create_dir(call, d_events, &ftrace_event_id_fops,
+ ret = event_create_dir(call, d_events, &ftrace_event_id_fops,
&ftrace_enable_fops, &ftrace_event_filter_fops,
&ftrace_event_format_fops);
+ if (ret < 0)
+ list_del(&call->list);
+ return ret;
}

/* Add an additional event_call dynamically */
--
1.6.2.3

2009-09-22 19:39:31

by Frederic Weisbecker

Subject: [PATCH 20/24] ftrace: Fix trace_remove_event_call() to lock trace_event_mutex

From: Masami Hiramatsu <[email protected]>

Lock not only event_mutex but also trace_event_mutex in
trace_remove_event_call() to protect __unregister_ftrace_event().

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/trace/trace_events.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 83cc2c0..f85b0f1 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1054,6 +1054,9 @@ static void remove_subsystem_dir(const char *name)
}
}

+/*
+ * Must be called under locking both of event_mutex and trace_event_mutex.
+ */
static void __trace_remove_event_call(struct ftrace_event_call *call)
{
ftrace_event_enable_disable(call, 0);
@@ -1070,7 +1073,9 @@ static void __trace_remove_event_call(struct ftrace_event_call *call)
void trace_remove_event_call(struct ftrace_event_call *call)
{
mutex_lock(&event_mutex);
+ down_write(&trace_event_mutex);
__trace_remove_event_call(call);
+ up_write(&trace_event_mutex);
mutex_unlock(&event_mutex);
}

--
1.6.2.3

2009-09-22 19:39:32

by Frederic Weisbecker

Subject: [PATCH 21/24] tracing/kprobes: Add probe handler dispatcher to support perf and ftrace concurrent use

From: Masami Hiramatsu <[email protected]>

Add kprobe_dispatcher and kretprobe_dispatcher to dispatch events
to both the profiling and tracing handlers.

This allows ftrace and perf to use the same kprobe simultaneously.
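The dispatcher pattern can be sketched in isolation: one probe handler fans out to the trace and/or profile paths based on a flag mask. The `TP_FLAG_*` values match the patch; the handlers and hit counters below are illustrative stand-ins:

```c
/* Flag values as defined by the patch. */
#define TP_FLAG_TRACE   1
#define TP_FLAG_PROFILE 2

static int trace_hits, profile_hits;

static void trace_func(void)   { trace_hits++; }	/* ftrace path */
static void profile_func(void) { profile_hits++; }	/* perf path */

/* Returns 0 unconditionally, like kprobe_dispatcher(): the probe only
 * observes, it never alters the probed code's control flow. */
static int dispatch(unsigned int flags)
{
	if (flags & TP_FLAG_TRACE)
		trace_func();
	if (flags & TP_FLAG_PROFILE)
		profile_func();
	return 0;
}
```

Because both consumers go through one dispatcher, enable/disable becomes a matter of setting or clearing a flag bit, and the underlying kprobe is only disabled once both bits are clear.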

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/trace/trace_kprobe.c | 85 ++++++++++++++++++++++++++++++++-----------
1 files changed, 63 insertions(+), 22 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index ea0db8e..70b632c 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -185,10 +185,15 @@ struct probe_arg {
const char *name;
};

+/* Flags for trace_probe */
+#define TP_FLAG_TRACE 1
+#define TP_FLAG_PROFILE 2
+
struct trace_probe {
struct list_head list;
struct kretprobe rp; /* Use rp.kp for kprobe use */
unsigned long nhit;
+ unsigned int flags; /* For TP_FLAG_* */
const char *symbol; /* symbol name */
struct ftrace_event_call call;
struct trace_event event;
@@ -200,10 +205,6 @@ struct trace_probe {
(offsetof(struct trace_probe, args) + \
(sizeof(struct probe_arg) * (n)))

-static int kprobe_trace_func(struct kprobe *kp, struct pt_regs *regs);
-static int kretprobe_trace_func(struct kretprobe_instance *ri,
- struct pt_regs *regs);
-
static __kprobes int probe_is_return(struct trace_probe *tp)
{
return tp->rp.handler != NULL;
@@ -263,6 +264,10 @@ static void unregister_probe_event(struct trace_probe *tp);
static DEFINE_MUTEX(probe_lock);
static LIST_HEAD(probe_list);

+static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs);
+static int kretprobe_dispatcher(struct kretprobe_instance *ri,
+ struct pt_regs *regs);
+
/*
* Allocate new trace_probe and initialize it (including kprobes).
*/
@@ -288,11 +293,10 @@ static struct trace_probe *alloc_trace_probe(const char *group,
} else
tp->rp.kp.addr = addr;

- /* Set handler here for checking whether this probe is return or not. */
if (is_return)
- tp->rp.handler = kretprobe_trace_func;
+ tp->rp.handler = kretprobe_dispatcher;
else
- tp->rp.kp.pre_handler = kprobe_trace_func;
+ tp->rp.kp.pre_handler = kprobe_dispatcher;

if (!event)
goto error;
@@ -379,6 +383,7 @@ static int register_trace_probe(struct trace_probe *tp)
goto end;
}

+ tp->flags = TP_FLAG_TRACE;
if (probe_is_return(tp))
ret = register_kretprobe(&tp->rp);
else
@@ -987,23 +992,24 @@ static int probe_event_enable(struct ftrace_event_call *call)
{
struct trace_probe *tp = (struct trace_probe *)call->data;

- if (probe_is_return(tp)) {
- tp->rp.handler = kretprobe_trace_func;
+ tp->flags |= TP_FLAG_TRACE;
+ if (probe_is_return(tp))
return enable_kretprobe(&tp->rp);
- } else {
- tp->rp.kp.pre_handler = kprobe_trace_func;
+ else
return enable_kprobe(&tp->rp.kp);
- }
}

static void probe_event_disable(struct ftrace_event_call *call)
{
struct trace_probe *tp = (struct trace_probe *)call->data;

- if (probe_is_return(tp))
- disable_kretprobe(&tp->rp);
- else
- disable_kprobe(&tp->rp.kp);
+ tp->flags &= ~TP_FLAG_TRACE;
+ if (!(tp->flags & (TP_FLAG_TRACE | TP_FLAG_PROFILE))) {
+ if (probe_is_return(tp))
+ disable_kretprobe(&tp->rp);
+ else
+ disable_kprobe(&tp->rp.kp);
+ }
}

static int probe_event_raw_init(struct ftrace_event_call *event_call)
@@ -1212,22 +1218,57 @@ static int probe_profile_enable(struct ftrace_event_call *call)
if (atomic_inc_return(&call->profile_count))
return 0;

- if (probe_is_return(tp)) {
- tp->rp.handler = kretprobe_profile_func;
+ tp->flags |= TP_FLAG_PROFILE;
+ if (probe_is_return(tp))
return enable_kretprobe(&tp->rp);
- } else {
- tp->rp.kp.pre_handler = kprobe_profile_func;
+ else
return enable_kprobe(&tp->rp.kp);
- }
}

static void probe_profile_disable(struct ftrace_event_call *call)
{
+ struct trace_probe *tp = (struct trace_probe *)call->data;
+
if (atomic_add_negative(-1, &call->profile_count))
- probe_event_disable(call);
+ tp->flags &= ~TP_FLAG_PROFILE;
+
+ if (!(tp->flags & (TP_FLAG_TRACE | TP_FLAG_PROFILE))) {
+ if (probe_is_return(tp))
+ disable_kretprobe(&tp->rp);
+ else
+ disable_kprobe(&tp->rp.kp);
+ }
}
+#endif /* CONFIG_EVENT_PROFILE */
+
+
+static __kprobes
+int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs)
+{
+ struct trace_probe *tp = container_of(kp, struct trace_probe, rp.kp);

+ if (tp->flags & TP_FLAG_TRACE)
+ kprobe_trace_func(kp, regs);
+#ifdef CONFIG_EVENT_PROFILE
+ if (tp->flags & TP_FLAG_PROFILE)
+ kprobe_profile_func(kp, regs);
#endif /* CONFIG_EVENT_PROFILE */
+ return 0; /* We don't tweek kernel, so just return 0 */
+}
+
+static __kprobes
+int kretprobe_dispatcher(struct kretprobe_instance *ri, struct pt_regs *regs)
+{
+ struct trace_probe *tp = container_of(ri->rp, struct trace_probe, rp);
+
+ if (tp->flags & TP_FLAG_TRACE)
+ kretprobe_trace_func(ri, regs);
+#ifdef CONFIG_EVENT_PROFILE
+ if (tp->flags & TP_FLAG_PROFILE)
+ kretprobe_profile_func(ri, regs);
+#endif /* CONFIG_EVENT_PROFILE */
+ return 0; /* We don't tweek kernel, so just return 0 */
+}

static int register_probe_event(struct trace_probe *tp)
{
--
1.6.2.3

2009-09-22 19:40:12

by Frederic Weisbecker

Subject: [PATCH 22/24] tracing/kprobes: Fix profiling alignment for perf_counter buffer

From: Masami Hiramatsu <[email protected]>

Fix *probe_profile_func() to align the buffer size, since perf_counter
requires its buffer entries to be 8-byte aligned.
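The rounding can be sketched standalone. perf stores a u32 size header in front of each record, so the payload is sized such that header plus payload is a multiple of 8; `ALIGN()` is reimplemented here for a user-space build (in the kernel it comes from the kernel headers):

```c
#include <stddef.h>
#include <stdint.h>

/* Round x up to a multiple of a (a must be a power of two). */
#define ALIGN(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Mirror of the patch's computation:
 *   __size = SIZEOF_KPROBE_TRACE_ENTRY(tp->nr_args);
 *   size = ALIGN(__size + sizeof(u32), sizeof(u64));
 *   size -= sizeof(u32); */
static size_t profile_buf_size(size_t raw_size)
{
	return ALIGN(raw_size + sizeof(uint32_t), sizeof(uint64_t))
		- sizeof(uint32_t);
}
```

The extra bytes introduced by the rounding are why the patch also zeroes the tail of `raw_data`: uninitialized padding would otherwise leak stack contents to user space.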

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/trace/trace_kprobe.c | 17 ++++++++++++-----
1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 70b632c..d8db935 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1149,18 +1149,23 @@ static __kprobes int kprobe_profile_func(struct kprobe *kp,
struct trace_probe *tp = container_of(kp, struct trace_probe, rp.kp);
struct ftrace_event_call *call = &tp->call;
struct kprobe_trace_entry *entry;
- int size, i, pc;
+ int size, __size, i, pc;
unsigned long irq_flags;

local_save_flags(irq_flags);
pc = preempt_count();

- size = SIZEOF_KPROBE_TRACE_ENTRY(tp->nr_args);
+ __size = SIZEOF_KPROBE_TRACE_ENTRY(tp->nr_args);
+ size = ALIGN(__size + sizeof(u32), sizeof(u64));
+ size -= sizeof(u32);

do {
char raw_data[size];
struct trace_entry *ent;
-
+ /*
+ * Zero dead bytes from alignment to avoid stack leak
+ * to userspace
+ */
*(u64 *)(&raw_data[size - sizeof(u64)]) = 0ULL;
entry = (struct kprobe_trace_entry *)raw_data;
ent = &entry->ent;
@@ -1183,13 +1188,15 @@ static __kprobes int kretprobe_profile_func(struct kretprobe_instance *ri,
struct trace_probe *tp = container_of(ri->rp, struct trace_probe, rp);
struct ftrace_event_call *call = &tp->call;
struct kretprobe_trace_entry *entry;
- int size, i, pc;
+ int size, __size, i, pc;
unsigned long irq_flags;

local_save_flags(irq_flags);
pc = preempt_count();

- size = SIZEOF_KRETPROBE_TRACE_ENTRY(tp->nr_args);
+ __size = SIZEOF_KRETPROBE_TRACE_ENTRY(tp->nr_args);
+ size = ALIGN(__size + sizeof(u32), sizeof(u64));
+ size -= sizeof(u32);

do {
char raw_data[size];
--
1.6.2.3

2009-09-22 19:39:40

by Frederic Weisbecker

Subject: [PATCH 23/24] tracing/kprobes: Disable kprobe events by default after creation

From: Masami Hiramatsu <[email protected]>

Disable newly created kprobe events by default, so they don't
disturb other ftrace users. "Disturb" means that if someone is
using ftrace and another user defines a new kprobe event via
perf tools (in the near future), the new events would mess up
the ftrace buffer. Fix this to allow proper and transparent
concurrent use of kprobe events between ftrace and perf users.

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
Documentation/trace/kprobetrace.txt | 11 +++++++++--
kernel/trace/trace_kprobe.c | 4 ++--
2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index 6521681..9b8f7c6 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -122,8 +122,15 @@ print fmt: "(%lx) dfd=%lx filename=%lx flags=%lx mode=%lx", REC->ip, REC->dfd, R

echo > /sys/kernel/debug/tracing/kprobe_events

- This clears all probe points. and you can see the traced information via
-/sys/kernel/debug/tracing/trace.
+ This clears all probe points.
+
+ Right after definition, each event is disabled by default. To trace these
+events, you need to enable them.
+
+ echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
+ echo 1 > /sys/kernel/debug/tracing/events/kprobes/myretprobe/enable
+
+ And you can see the traced information via /sys/kernel/debug/tracing/trace.

cat /sys/kernel/debug/tracing/trace
# tracer: nop
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index d8db935..f6821f1 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -383,7 +383,7 @@ static int register_trace_probe(struct trace_probe *tp)
goto end;
}

- tp->flags = TP_FLAG_TRACE;
+ tp->rp.kp.flags |= KPROBE_FLAG_DISABLED;
if (probe_is_return(tp))
ret = register_kretprobe(&tp->rp);
else
@@ -1298,7 +1298,7 @@ static int register_probe_event(struct trace_probe *tp)
call->id = register_ftrace_event(&tp->event);
if (!call->id)
return -ENODEV;
- call->enabled = 1;
+ call->enabled = 0;
call->regfunc = probe_event_enable;
call->unregfunc = probe_event_disable;

--
1.6.2.3

2009-09-22 19:39:52

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 24/24] kprobes: Prevent re-registration of the same kprobe

From: Ananth N Mavinakayanahalli <[email protected]>

Prevent re-registration of the same kprobe. This situation, though
unlikely, needs to be flagged since it can lead to a system crash if
it's not handled.

The core change itself is small, but the helper routine needed to be
moved around a bit; hence the diffstat.

Signed-off-by: Ananth N Mavinakayanahalli <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Tom Zanussi <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/kprobes.c | 58 +++++++++++++++++++++++++++++++++++------------------
1 files changed, 38 insertions(+), 20 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 00d01b0..b946761 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -676,6 +676,40 @@ static kprobe_opcode_t __kprobes *kprobe_addr(struct kprobe *p)
return (kprobe_opcode_t *)(((char *)addr) + p->offset);
}

+/* Check passed kprobe is valid and return kprobe in kprobe_table. */
+static struct kprobe * __kprobes __get_valid_kprobe(struct kprobe *p)
+{
+ struct kprobe *old_p, *list_p;
+
+ old_p = get_kprobe(p->addr);
+ if (unlikely(!old_p))
+ return NULL;
+
+ if (p != old_p) {
+ list_for_each_entry_rcu(list_p, &old_p->list, list)
+ if (list_p == p)
+ /* kprobe p is a valid probe */
+ goto valid;
+ return NULL;
+ }
+valid:
+ return old_p;
+}
+
+/* Return error if the kprobe is being re-registered */
+static inline int check_kprobe_rereg(struct kprobe *p)
+{
+ int ret = 0;
+ struct kprobe *old_p;
+
+ mutex_lock(&kprobe_mutex);
+ old_p = __get_valid_kprobe(p);
+ if (old_p)
+ ret = -EINVAL;
+ mutex_unlock(&kprobe_mutex);
+ return ret;
+}
+
int __kprobes register_kprobe(struct kprobe *p)
{
int ret = 0;
@@ -688,6 +722,10 @@ int __kprobes register_kprobe(struct kprobe *p)
return -EINVAL;
p->addr = addr;

+ ret = check_kprobe_rereg(p);
+ if (ret)
+ return ret;
+
preempt_disable();
if (!kernel_text_address((unsigned long) p->addr) ||
in_kprobes_functions((unsigned long) p->addr)) {
@@ -757,26 +795,6 @@ out:
}
EXPORT_SYMBOL_GPL(register_kprobe);

-/* Check passed kprobe is valid and return kprobe in kprobe_table. */
-static struct kprobe * __kprobes __get_valid_kprobe(struct kprobe *p)
-{
- struct kprobe *old_p, *list_p;
-
- old_p = get_kprobe(p->addr);
- if (unlikely(!old_p))
- return NULL;
-
- if (p != old_p) {
- list_for_each_entry_rcu(list_p, &old_p->list, list)
- if (list_p == p)
- /* kprobe p is a valid probe */
- goto valid;
- return NULL;
- }
-valid:
- return old_p;
-}
-
/*
* Unregister a kprobe without a scheduler synchronization.
*/
--
1.6.2.3

2009-09-23 01:13:13

by Li Zefan

[permalink] [raw]
Subject: Re: [PATCH 19/24] ftrace: Fix trace_add_event_call() to initialize list

> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index ba34920..83cc2c0 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -1010,9 +1010,12 @@ static int __trace_add_event_call(struct ftrace_event_call *call)
> return -ENOENT;
>
> list_add(&call->list, &ftrace_events);
> - return event_create_dir(call, d_events, &ftrace_event_id_fops,
> + ret = event_create_dir(call, d_events, &ftrace_event_id_fops,
> &ftrace_enable_fops, &ftrace_event_filter_fops,
> &ftrace_event_format_fops);
> + if (ret < 0)
> + list_del(&call->list);
> + return ret;

seems it's a bit better to call list_add() after event_create_dir()
returns 0.

> }
>
> /* Add an additional event_call dynamically */

2009-09-23 08:17:33

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH 19/24] ftrace: Fix trace_add_event_call() to initialize list

Li Zefan wrote:
>> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
>> index ba34920..83cc2c0 100644
>> --- a/kernel/trace/trace_events.c
>> +++ b/kernel/trace/trace_events.c
>> @@ -1010,9 +1010,12 @@ static int __trace_add_event_call(struct ftrace_event_call *call)
>> return -ENOENT;
>>
>> list_add(&call->list, &ftrace_events);
>> - return event_create_dir(call, d_events, &ftrace_event_id_fops,
>> + ret = event_create_dir(call, d_events, &ftrace_event_id_fops,
>> &ftrace_enable_fops, &ftrace_event_filter_fops,
>> &ftrace_event_format_fops);
>> + if (ret < 0)
>> + list_del(&call->list);
>> + return ret;
>
> seems it's a bit better to call list_add() after event_create_dir()
> returns 0.

Sure, that's another way to do it. But I'm afraid that would make it
diverge from the trace_module_add_events() path.

---
call->mod = mod;
list_add(&call->list, &ftrace_events);
event_create_dir(call, d_events,
&file_ops->id, &file_ops->enable,
&file_ops->filter, &file_ops->format);
---
Anyway, this also needs to check the result of event_create_dir().

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]

2009-09-23 10:53:07

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] tracing/kprobes: Kprobes core/tracing/profiling updates


* Frederic Weisbecker <[email protected]> wrote:

> Hi Ingo,
>
> Kprobes has been nicely improved lately. The x86 instruction decoder
> has been fixed to support cross builds and mmx instruction set,
> besides of a lot of various kprobes core fixes.
>
> The tracing part has evolved too, we can define human names for
> arguments and custom subsystem names for dynamic tracepoints.
>
> And also kprobes profiling and raw dynamic tracepoint samples are now
> supported through perf. Looks like most of the kernel parts are now in
> place for a perf support. Things are going to be focused on a perf
> kprobes tool to exploit that.

Nice progress. What's the expected timeline of exhaustive tools/perf/
support?

> Concerning this git tree, based on tip:/tracing/kprobes, I had to
> merge tracing/core inside few weeks ago because it needed build fixes
> that were in tracing/core (the merge commit provides the details). The
> tree is self contained but it's already async with recent upstream
> tracing updates. It means that merging upstream tree or tracing/core
> inside may result in non-trivial conflicts. I can handle them, or
> rebase the whole, as you prefer.
>
> The tree can be found at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
> tracing/kprobes

Would be nice to merge latest -git into this tree and resolve the
conflicts:

kernel/trace/Makefile
kernel/trace/trace.h
kernel/trace/trace_event_types.h
kernel/trace/trace_export.c

Then i could pull it into tip:tracing/kprobes for more testing.

Ingo

2009-09-23 12:04:24

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [GIT PULL] tracing/kprobes: Kprobes core/tracing/profiling updates

2009/9/23 Ingo Molnar <[email protected]>:
>
> * Frederic Weisbecker <[email protected]> wrote:
>
>> Hi Ingo,
>>
>> Kprobes has been nicely improved lately. The x86 instruction decoder
>> has been fixed to support cross builds and mmx instruction set,
>> besides of a lot of various kprobes core fixes.
>>
>> The tracing part has evolved too, we can define human names for
>> arguments and custom subsystem names for dynamic tracepoints.
>>
>> And also kprobes profiling and raw dynamic tracepoint samples are now
>> supported through perf. Looks like most of the kernel parts are now in
>> place for a perf support. Things are going to be focused on a perf
>> kprobes tool to exploit that.
>
> Nice progress. What's the expected timeline of exhaustive tools/perf/
> support?


Masami is better suited to answer that, so I'll let him respond.


>> Concerning this git tree, based on tip:/tracing/kprobes, I had to
>> merge tracing/core inside few weeks ago because it needed build fixes
>> that were in tracing/core (the merge commit provides the details). The
>> tree is self contained but it's already async with recent upstream
>> tracing updates. It means that merging upstream tree or tracing/core
>> inside may result in non-trivial conflicts. I can handle them, or
>> rebase the whole, as you prefer.
>>
>> The tree can be found at:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
>>       tracing/kprobes
>
> Would be nice to merge latest -git into this tree and resolve the
> conflicts:
>
>  kernel/trace/Makefile
>  kernel/trace/trace.h
>  kernel/trace/trace_event_types.h
>  kernel/trace/trace_export.c
>
> Then i could pull it into tip:tracing/kprobes for more testing.


Sure, I'll do that soon.

Frederic.

2009-09-23 16:41:01

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [GIT PULL] tracing/kprobes: Kprobes core/tracing/profiling updates

Ingo Molnar wrote:
>
> * Frederic Weisbecker <[email protected]> wrote:
>
>> Hi Ingo,
>>
>> Kprobes has been nicely improved lately. The x86 instruction decoder
>> has been fixed to support cross builds and mmx instruction set,
>> besides of a lot of various kprobes core fixes.
>>
>> The tracing part has evolved too, we can define human names for
>> arguments and custom subsystem names for dynamic tracepoints.
>>
>> And also kprobes profiling and raw dynamic tracepoint samples are now
>> supported through perf. Looks like most of the kernel parts are now in
>> place for a perf support. Things are going to be focused on a perf
>> kprobes tool to exploit that.
>
> Nice progress. What's the expected timeline of exhaustive tools/perf/
> support?

Hi Ingo,

That's under review; I'll post it as soon as possible. :-)
Now I'm considering its interface.
In the 1st release, I'll use the -P "probe-and-arg-definition" option
which I suggested previously, but it is also possible to use the
-p "event" -a "arg" option which Frederic suggested.

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]

2009-09-23 21:24:41

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [GIT PULL] tracing/kprobes: Kprobes core/tracing/profiling updates

On Wed, Sep 23, 2009 at 12:52:52PM +0200, Ingo Molnar wrote:
> Would be nice to merge latest -git into this tree and resolve the
> conflicts:
>
> kernel/trace/Makefile
> kernel/trace/trace.h
> kernel/trace/trace_event_types.h
> kernel/trace/trace_export.c
>
> Then i could pull it into tip:tracing/kprobes for more testing.
>
> Ingo


I've just merged latest upstream tree into it and pushed
that in a new branch:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
tracing/kprobes-v2

There were several conflicts, not trivial. I hope I haven't missed
something. But it boots well, I've tested a simple kprobe creation
and fetched its events through perf without any problem.

Thanks,
Frederic.

PS: I'd recommend defining a name when you create a kprobe.
For example, if you want to get the first argument of sys_open,
don't create it using:

p sys_open a0

but rather:

p:my_probe_name sys_open a0

Otherwise you will get a default kprobe name that doesn't seem
to make perf trace happy (it's on my TODO list).

2009-09-23 21:46:53

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] tracing/kprobes: Kprobes core/tracing/profiling updates


* Frederic Weisbecker <[email protected]> wrote:

> On Wed, Sep 23, 2009 at 12:52:52PM +0200, Ingo Molnar wrote:
> > Would be nice to merge latest -git into this tree and resolve the
> > conflicts:
> >
> > kernel/trace/Makefile
> > kernel/trace/trace.h
> > kernel/trace/trace_event_types.h
> > kernel/trace/trace_export.c
> >
> > Then i could pull it into tip:tracing/kprobes for more testing.
> >
> > Ingo
>
>
> I've just merged latest upstream tree into it and pushed
> that in a new branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
> tracing/kprobes-v2

Pulled into tip:tracing/kprobes, thanks Frederic!

> There were several conflicts, not trivial. I hope I haven't missed
> something. But it boots well, I've tested a simple kprobe creation
> and fetched its events through perf without any problem.
>
> Thanks,
> Frederic.
>
> PS: I'd recommend you to define a name when you define a kprobe.
> For example if you want to get the first argument of sys_open,
> don't create it using:
>
> p sys_open a0
>
> but rather:
>
> p:my_probe_name sys_open a0
>
> Otherwise you will get a default kprobe name that doesn't seem
> to make perf trace happy (put in my TODO list).

ok. Right now it's in a cooking branch, tracing/kprobes. I merged it to
tip:master - we can propagate it to tracing/core once it's ready with
all known bugs and quirks fixed and with significant perf functionality
for it.

Ingo

2009-09-23 22:14:14

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] tracing/kprobes: Kprobes core/tracing/profiling updates


got this post-test failure with the new kprobes bits:

make[2]: `scripts/unifdef' is up to date.
TEST posttest
Error: c145cf71: f3 0f a6 d0 repz xsha256
Error: objdump says 4 bytes, but insn_get_length() says 3 (attr:0)
make[1]: *** [posttest] Error 2
make: *** [bzImage] Error 2

Config attached.

GNU objdump version 2.18.50.0.6-7.fc9 20080403
gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC)

Ingo


Attachments:
config (70.83 kB)

2009-09-23 22:23:32

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [GIT PULL] tracing/kprobes: Kprobes core/tracing/profiling updates

On Thu, Sep 24, 2009 at 12:13:56AM +0200, Ingo Molnar wrote:
>
> got this post-test failure with the new kprobes bits:
>
> make[2]: `scripts/unifdef' is up to date.
> TEST posttest
> Error: c145cf71: f3 0f a6 d0 repz xsha256



Ah, xsha256 does not seem to be in the instruction table of the
decoder.

2009-09-24 00:05:12

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [GIT PULL] tracing/kprobes: Kprobes core/tracing/profiling updates

Ingo Molnar wrote:
>
> got this post-test failure with the new kprobes bits:
>
> make[2]: `scripts/unifdef' is up to date.
> TEST posttest
> Error: c145cf71: f3 0f a6 d0 repz xsha256
> Error: objdump says 4 bytes, but insn_get_length() says 3 (attr:0)
> make[1]: *** [posttest] Error 2
> make: *** [bzImage] Error 2
>
> Config attached.
>
> GNU objdump version 2.18.50.0.6-7.fc9 20080403
> gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC)

Oh, thank you for reporting!
I'll update the instruction maps.

Thank you again!

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]