2013-03-29 18:17:46

by Oleg Nesterov

Subject: [PATCH 0/4] uprobes/tracing: uretprobes, initial preparations

Hello.

The uretprobes code is almost ready; now we need to teach trace_uprobe.c
to support them.

This looks simple, but there is a nasty complication. We do not
want to copy-and-paste the code like trace_kprobe.c does. Just look
at kprobe_event_define_fields() and kretprobe_event_define_fields().
They are non-trivial but almost identical. And there are a lot more
examples.

So I'd like to send 4/4 for review before I do the other changes.
On its own the patch doesn't make much sense and complicates the source
code a bit. But note how easily we can then change, say, uprobe_event_define_fields(),

- DEFINE_FIELD(vaddr[0], FIELD_STRING_IP);
- size = SIZEOF_TRACE_ENTRY(1);
+ if (!trace_probe_is_return(tu)) {
+ DEFINE_FIELD(vaddr[0], FIELD_STRING_IP);
+ size = SIZEOF_TRACE_ENTRY(1);
+ } else {
+ DEFINE_FIELD(vaddr[0], FIELD_STRING_FUNC);
+ DEFINE_FIELD(vaddr[1], FIELD_STRING_RETIP);
+ size = SIZEOF_TRACE_ENTRY(2);
+ }

without copy-and-paste. The same simple change is possible for other
helpers playing with uprobe_trace_entry_head.

In fact, I personally think that trace_kprobe.c should be cleaned up
the same way.



Off-topic question... Why does uprobe_perf_func() pass "addr = ip" to
perf_trace_buf_submit()? This just sets perf_sample_data->addr for
PERF_SAMPLE_ADDR; do we really need it, given that perf_sample_data->ip
is already initialized by perf? Just curious.

Oleg.

kernel/trace/trace.h | 5 ---
kernel/trace/trace_uprobe.c | 77 +++++++++++++++++++++----------------------
2 files changed, 38 insertions(+), 44 deletions(-)


2013-03-29 18:18:09

by Oleg Nesterov

Subject: [PATCH 2/4] uprobes/tracing: Kill the pointless seq_print_ip_sym() call

seq_print_ip_sym(ip) in print_uprobe_event() is pointless:
kallsyms_lookup(ip) cannot resolve a user-space address.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_uprobe.c | 8 +-------
1 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index af5b773..8e00901 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -531,13 +531,7 @@ print_uprobe_event(struct trace_iterator *iter, int flags, struct trace_event *e
field = (struct uprobe_trace_entry_head *)iter->ent;
tu = container_of(event, struct trace_uprobe, call.event);

- if (!trace_seq_printf(s, "%s: (", tu->call.name))
- goto partial;
-
- if (!seq_print_ip_sym(s, field->ip, flags | TRACE_ITER_SYM_OFFSET))
- goto partial;
-
- if (!trace_seq_puts(s, ")"))
+ if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name, field->ip))
goto partial;

data = (u8 *)&field[1];
--
1.5.5.1

2013-03-29 18:18:16

by Oleg Nesterov

Subject: [PATCH 4/4] uprobes/tracing: generalize struct uprobe_trace_entry_head

struct uprobe_trace_entry_head has a single member for reporting,
"unsigned long ip". If we want to support uretprobes we need to
create another struct which has "func" and "ret_ip" and duplicate
a lot of functions, like trace_kprobe.c does.

To avoid this copy-and-paste horror we turn ->ip into ->vaddr[]
and add a couple of trivial helpers to calculate sizeof/data. This
uglifies the code a bit, but it allows us to avoid a lot more
complications later, when we add support for ret-probes.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace.h | 5 ---
kernel/trace/trace_uprobe.c | 61 ++++++++++++++++++++++++------------------
2 files changed, 35 insertions(+), 31 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 57d7e53..6ca57cf 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -103,11 +103,6 @@ struct kretprobe_trace_entry_head {
unsigned long ret_ip;
};

-struct uprobe_trace_entry_head {
- struct trace_entry ent;
- unsigned long ip;
-};
-
/*
* trace_flag_type is an enumeration that holds different
* states when a trace occurs. These are:
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 43d258d..92a4b08 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -28,6 +28,17 @@

#define UPROBE_EVENT_SYSTEM "uprobes"

+struct uprobe_trace_entry_head {
+ struct trace_entry ent;
+ unsigned long vaddr[];
+};
+
+#define SIZEOF_TRACE_ENTRY(nr) \
+ (sizeof(struct uprobe_trace_entry_head) + sizeof(unsigned long) * (nr))
+
+#define DATAOF_TRACE_ENTRY(entry, nr) \
+ ((void*)&(entry)->vaddr[nr])
+
struct trace_uprobe_filter {
rwlock_t rwlock;
int nr_systemwide;
@@ -491,20 +502,19 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
struct uprobe_trace_entry_head *entry;
struct ring_buffer_event *event;
struct ring_buffer *buffer;
- u8 *data;
+ void *data;
int size, i;
struct ftrace_event_call *call = &tu->call;

- size = sizeof(*entry) + tu->size;
-
+ size = SIZEOF_TRACE_ENTRY(1) + tu->size;
event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
size, 0, 0);
if (!event)
return 0;

entry = ring_buffer_event_data(event);
- entry->ip = instruction_pointer(regs);
- data = (u8 *)&entry[1];
+ entry->vaddr[0] = instruction_pointer(regs);
+ data = DATAOF_TRACE_ENTRY(entry, 1);
for (i = 0; i < tu->nr_args; i++)
call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);

@@ -518,22 +528,22 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
static enum print_line_t
print_uprobe_event(struct trace_iterator *iter, int flags, struct trace_event *event)
{
- struct uprobe_trace_entry_head *field;
+ struct uprobe_trace_entry_head *entry;
struct trace_seq *s = &iter->seq;
struct trace_uprobe *tu;
u8 *data;
int i;

- field = (struct uprobe_trace_entry_head *)iter->ent;
+ entry = (struct uprobe_trace_entry_head *)iter->ent;
tu = container_of(event, struct trace_uprobe, call.event);

- if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name, field->ip))
+ if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name, entry->vaddr[0]))
goto partial;

- data = (u8 *)&field[1];
+ data = DATAOF_TRACE_ENTRY(entry, 1);
for (i = 0; i < tu->nr_args; i++) {
if (!tu->args[i].type->print(s, tu->args[i].name,
- data + tu->args[i].offset, field))
+ data + tu->args[i].offset, entry))
goto partial;
}

@@ -585,16 +595,17 @@ static void probe_event_disable(struct trace_uprobe *tu, int flag)

static int uprobe_event_define_fields(struct ftrace_event_call *event_call)
{
- int ret, i;
+ int ret, i, size;
struct uprobe_trace_entry_head field;
- struct trace_uprobe *tu = (struct trace_uprobe *)event_call->data;
+ struct trace_uprobe *tu = event_call->data;

- DEFINE_FIELD(unsigned long, ip, FIELD_STRING_IP, 0);
+ DEFINE_FIELD(unsigned long, vaddr[0], FIELD_STRING_IP, 0);
+ size = SIZEOF_TRACE_ENTRY(1);
/* Set argument names as fields */
for (i = 0; i < tu->nr_args; i++) {
ret = trace_define_field(event_call, tu->args[i].type->fmttype,
tu->args[i].name,
- sizeof(field) + tu->args[i].offset,
+ size + tu->args[i].offset,
tu->args[i].type->size,
tu->args[i].type->is_signed,
FILTER_OTHER);
@@ -748,33 +759,31 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
struct ftrace_event_call *call = &tu->call;
struct uprobe_trace_entry_head *entry;
struct hlist_head *head;
- u8 *data;
- int size, __size, i;
- int rctx;
+ unsigned long ip;
+ void *data;
+ int size, rctx, i;

if (!uprobe_perf_filter(&tu->consumer, 0, current->mm))
return UPROBE_HANDLER_REMOVE;

- __size = sizeof(*entry) + tu->size;
- size = ALIGN(__size + sizeof(u32), sizeof(u64));
- size -= sizeof(u32);
+ size = SIZEOF_TRACE_ENTRY(1);
+ size = ALIGN(size + tu->size + sizeof(u32), sizeof(u64)) - sizeof(u32);
if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
return 0;

preempt_disable();
-
entry = perf_trace_buf_prepare(size, call->event.type, regs, &rctx);
if (!entry)
goto out;

- entry->ip = instruction_pointer(regs);
- data = (u8 *)&entry[1];
+ ip = instruction_pointer(regs);
+ entry->vaddr[0] = ip;
+ data = DATAOF_TRACE_ENTRY(entry, 1);
for (i = 0; i < tu->nr_args; i++)
call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);

head = this_cpu_ptr(call->perf_events);
- perf_trace_buf_submit(entry, size, rctx, entry->ip, 1, regs, head, NULL);
-
+ perf_trace_buf_submit(entry, size, rctx, ip, 1, regs, head, NULL);
out:
preempt_enable();
return 0;
@@ -784,7 +793,7 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
static
int trace_uprobe_register(struct ftrace_event_call *event, enum trace_reg type, void *data)
{
- struct trace_uprobe *tu = (struct trace_uprobe *)event->data;
+ struct trace_uprobe *tu = event->data;

switch (type) {
case TRACE_REG_REGISTER:
--
1.5.5.1

2013-03-29 18:18:07

by Oleg Nesterov

Subject: [PATCH 1/4] uprobes/tracing: Kill the pointless task_pt_regs() calls

uprobe_trace_func() and uprobe_perf_func() do not need task_pt_regs();
we already have "struct pt_regs *regs".

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_uprobe.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 8dad2a9..af5b773 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -507,7 +507,7 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
return 0;

entry = ring_buffer_event_data(event);
- entry->ip = instruction_pointer(task_pt_regs(current));
+ entry->ip = instruction_pointer(regs);
data = (u8 *)&entry[1];
for (i = 0; i < tu->nr_args; i++)
call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
@@ -777,7 +777,7 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
if (!entry)
goto out;

- entry->ip = instruction_pointer(task_pt_regs(current));
+ entry->ip = instruction_pointer(regs);
data = (u8 *)&entry[1];
for (i = 0; i < tu->nr_args; i++)
call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
--
1.5.5.1

2013-03-29 18:19:01

by Oleg Nesterov

Subject: [PATCH 3/4] uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls

uprobe_trace_func() is never called with irqs or preemption
disabled; there is no need to ask preempt_count() or local_save_flags().

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_uprobe.c | 10 +++-------
1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 8e00901..43d258d 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -492,17 +492,13 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
struct ring_buffer_event *event;
struct ring_buffer *buffer;
u8 *data;
- int size, i, pc;
- unsigned long irq_flags;
+ int size, i;
struct ftrace_event_call *call = &tu->call;

- local_save_flags(irq_flags);
- pc = preempt_count();
-
size = sizeof(*entry) + tu->size;

event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
- size, irq_flags, pc);
+ size, 0, 0);
if (!event)
return 0;

@@ -513,7 +509,7 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);

if (!filter_current_check_discard(buffer, call, entry, event))
- trace_buffer_unlock_commit(buffer, event, irq_flags, pc);
+ trace_buffer_unlock_commit(buffer, event, 0, 0);

return 0;
}
--
1.5.5.1

2013-04-01 16:11:04

by Oleg Nesterov

Subject: [PATCH 0/6] uprobes/tracing: uretprobes

On 03/29, Oleg Nesterov wrote:
>
> The uretprobes code is almost ready; now we need to teach trace_uprobe.c
> to support them.
>
> This looks simple, but there is a nasty complication. We do not
> want to copy-and-paste the code like trace_kprobe.c does. Just look
> at kprobe_event_define_fields() and kretprobe_event_define_fields().
> They are non-trivial but almost identical. And there are a lot more
> examples.
>
> So I'd like to send 4/4 for review before I do the other changes.
> On its own the patch doesn't make much sense and complicates the source
> code a bit. But note how easily we can then change, say, uprobe_event_define_fields(),
>
> - DEFINE_FIELD(vaddr[0], FIELD_STRING_IP);
> - size = SIZEOF_TRACE_ENTRY(1);
> + if (!trace_probe_is_return(tu)) {
> + DEFINE_FIELD(vaddr[0], FIELD_STRING_IP);
> + size = SIZEOF_TRACE_ENTRY(1);
> + } else {
> + DEFINE_FIELD(vaddr[0], FIELD_STRING_FUNC);
> + DEFINE_FIELD(vaddr[1], FIELD_STRING_RETIP);
> + size = SIZEOF_TRACE_ENTRY(2);
> + }
>
> without copy-and-paste. The same simple change is possible for other
> helpers playing with uprobe_trace_entry_head.

And the rest of the necessary changes to support uretprobes.

As a reminder, this is on top of Anton's uretprobes implementation, which
is not finished yet. This series can't even be compiled, but I think it
can already be reviewed. All it needs to compile is the new
uprobe_consumer->ret_handler().

Oleg.

kernel/trace/trace_uprobe.c | 152 +++++++++++++++++++++++++++++++++++--------
1 files changed, 124 insertions(+), 28 deletions(-)

2013-04-01 16:11:45

by Oleg Nesterov

Subject: [PATCH 1/6] uprobes/tracing: Introduce uprobe_{trace,perf}_print() helpers

Extract the output code from uprobe_trace_func() and uprobe_perf_func()
into new helpers; they will be used by ->ret_handler() too. We also add
the (as yet unused) "unsigned long func" argument in advance, to simplify
the next changes.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_uprobe.c | 29 ++++++++++++++++++++---------
1 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 92a4b08..2ea9961 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -496,8 +496,8 @@ static const struct file_operations uprobe_profile_ops = {
.release = seq_release,
};

-/* uprobe handler */
-static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
+static void uprobe_trace_print(struct trace_uprobe *tu,
+ unsigned long func, struct pt_regs *regs)
{
struct uprobe_trace_entry_head *entry;
struct ring_buffer_event *event;
@@ -510,7 +510,7 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
size, 0, 0);
if (!event)
- return 0;
+ return;

entry = ring_buffer_event_data(event);
entry->vaddr[0] = instruction_pointer(regs);
@@ -520,7 +520,12 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)

if (!filter_current_check_discard(buffer, call, entry, event))
trace_buffer_unlock_commit(buffer, event, 0, 0);
+}

+/* uprobe handler */
+static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
+{
+ uprobe_trace_print(tu, 0, regs);
return 0;
}

@@ -753,8 +758,8 @@ static bool uprobe_perf_filter(struct uprobe_consumer *uc,
return ret;
}

-/* uprobe profile handler */
-static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
+static void uprobe_perf_print(struct trace_uprobe *tu,
+ unsigned long func, struct pt_regs *regs)
{
struct ftrace_event_call *call = &tu->call;
struct uprobe_trace_entry_head *entry;
@@ -763,13 +768,10 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
void *data;
int size, rctx, i;

- if (!uprobe_perf_filter(&tu->consumer, 0, current->mm))
- return UPROBE_HANDLER_REMOVE;
-
size = SIZEOF_TRACE_ENTRY(1);
size = ALIGN(size + tu->size + sizeof(u32), sizeof(u64)) - sizeof(u32);
if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
- return 0;
+ return;

preempt_disable();
entry = perf_trace_buf_prepare(size, call->event.type, regs, &rctx);
@@ -786,6 +788,15 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
perf_trace_buf_submit(entry, size, rctx, ip, 1, regs, head, NULL);
out:
preempt_enable();
+}
+
+/* uprobe profile handler */
+static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
+{
+ if (!uprobe_perf_filter(&tu->consumer, 0, current->mm))
+ return UPROBE_HANDLER_REMOVE;
+
+ uprobe_perf_print(tu, 0, regs);
return 0;
}
#endif /* CONFIG_PERF_EVENTS */
--
1.5.5.1

2013-04-01 16:11:50

by Oleg Nesterov

Subject: [PATCH 3/6] uprobes/tracing: Make uprobe_{trace,perf}_print() uretprobe-friendly

Change uprobe_trace_print() and uprobe_perf_print() to check
is_ret_probe() and fill ring_buffer_event accordingly.

Also change uprobe_trace_func() and uprobe_perf_func() to not call
_print() if is_ret_probe() is true. Note that we keep ->handler()
nontrivial even for uretprobes; we need this for filtering and for
other potential extensions.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_uprobe.c | 42 +++++++++++++++++++++++++++++++++---------
1 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index e91a354..db2718a 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -515,15 +515,26 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
int size, i;
struct ftrace_event_call *call = &tu->call;

- size = SIZEOF_TRACE_ENTRY(1) + tu->size;
+ if (is_ret_probe(tu))
+ size = SIZEOF_TRACE_ENTRY(2);
+ else
+ size = SIZEOF_TRACE_ENTRY(1);
+
event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
- size, 0, 0);
+ size + tu->size, 0, 0);
if (!event)
return;

entry = ring_buffer_event_data(event);
- entry->vaddr[0] = instruction_pointer(regs);
- data = DATAOF_TRACE_ENTRY(entry, 1);
+ if (is_ret_probe(tu)) {
+ entry->vaddr[0] = func;
+ entry->vaddr[1] = instruction_pointer(regs);
+ data = DATAOF_TRACE_ENTRY(entry, 2);
+ } else {
+ entry->vaddr[0] = instruction_pointer(regs);
+ data = DATAOF_TRACE_ENTRY(entry, 1);
+ }
+
for (i = 0; i < tu->nr_args; i++)
call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);

@@ -534,7 +545,8 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
/* uprobe handler */
static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
{
- uprobe_trace_print(tu, 0, regs);
+ if (!is_ret_probe(tu))
+ uprobe_trace_print(tu, 0, regs);
return 0;
}

@@ -783,7 +795,11 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
void *data;
int size, rctx, i;

- size = SIZEOF_TRACE_ENTRY(1);
+ if (is_ret_probe(tu))
+ size = SIZEOF_TRACE_ENTRY(2);
+ else
+ size = SIZEOF_TRACE_ENTRY(1);
+
size = ALIGN(size + tu->size + sizeof(u32), sizeof(u64)) - sizeof(u32);
if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
return;
@@ -794,8 +810,15 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
goto out;

ip = instruction_pointer(regs);
- entry->vaddr[0] = ip;
- data = DATAOF_TRACE_ENTRY(entry, 1);
+ if (is_ret_probe(tu)) {
+ entry->vaddr[0] = func;
+ entry->vaddr[1] = ip;
+ data = DATAOF_TRACE_ENTRY(entry, 2);
+ } else {
+ entry->vaddr[0] = ip;
+ data = DATAOF_TRACE_ENTRY(entry, 1);
+ }
+
for (i = 0; i < tu->nr_args; i++)
call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);

@@ -811,7 +834,8 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
if (!uprobe_perf_filter(&tu->consumer, 0, current->mm))
return UPROBE_HANDLER_REMOVE;

- uprobe_perf_print(tu, 0, regs);
+ if (!is_ret_probe(tu))
+ uprobe_perf_print(tu, 0, regs);
return 0;
}

--
1.5.5.1

2013-04-01 16:12:05

by Oleg Nesterov

Subject: [PATCH 5/6] uprobes/tracing: Make seq_printf() code uretprobe-friendly

Change probes_seq_show() and print_uprobe_event() to check
is_ret_probe() and print the correct data.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_uprobe.c | 17 +++++++++++++----
1 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index f75e52d..1b3622a 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -434,9 +434,10 @@ static void probes_seq_stop(struct seq_file *m, void *v)
static int probes_seq_show(struct seq_file *m, void *v)
{
struct trace_uprobe *tu = v;
+ char c = is_ret_probe(tu) ? 'r' : 'p';
int i;

- seq_printf(m, "p:%s/%s", tu->call.class->system, tu->call.name);
+ seq_printf(m, "%c:%s/%s", c, tu->call.class->system, tu->call.name);
seq_printf(m, " %s:0x%p", tu->filename, (void *)tu->offset);

for (i = 0; i < tu->nr_args; i++)
@@ -569,10 +570,18 @@ print_uprobe_event(struct trace_iterator *iter, int flags, struct trace_event *e
entry = (struct uprobe_trace_entry_head *)iter->ent;
tu = container_of(event, struct trace_uprobe, call.event);

- if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name, entry->vaddr[0]))
- goto partial;
+ if (is_ret_probe(tu)) {
+ if (!trace_seq_printf(s, "%s: (0x%lx <- 0x%lx)", tu->call.name,
+ entry->vaddr[1], entry->vaddr[0]))
+ goto partial;
+ data = DATAOF_TRACE_ENTRY(entry, 2);
+ } else {
+ if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name,
+ entry->vaddr[0]))
+ goto partial;
+ data = DATAOF_TRACE_ENTRY(entry, 1);
+ }

- data = DATAOF_TRACE_ENTRY(entry, 1);
for (i = 0; i < tu->nr_args; i++) {
if (!tu->args[i].type->print(s, tu->args[i].name,
data + tu->args[i].offset, entry))
--
1.5.5.1

2013-04-01 16:12:31

by Oleg Nesterov

Subject: [PATCH 4/6] uprobes/tracing: Make register_uprobe_event() paths uretprobe-friendly

Change uprobe_event_define_fields() and __set_print_fmt() to check
is_ret_probe() and use the appropriate format/fields.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_uprobe.c | 19 +++++++++++++++----
1 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index db2718a..f75e52d 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -631,8 +631,14 @@ static int uprobe_event_define_fields(struct ftrace_event_call *event_call)
struct uprobe_trace_entry_head field;
struct trace_uprobe *tu = event_call->data;

- DEFINE_FIELD(unsigned long, vaddr[0], FIELD_STRING_IP, 0);
- size = SIZEOF_TRACE_ENTRY(1);
+ if (is_ret_probe(tu)) {
+ DEFINE_FIELD(unsigned long, vaddr[0], FIELD_STRING_FUNC, 0);
+ DEFINE_FIELD(unsigned long, vaddr[1], FIELD_STRING_RETIP, 0);
+ size = SIZEOF_TRACE_ENTRY(2);
+ } else {
+ DEFINE_FIELD(unsigned long, vaddr[0], FIELD_STRING_IP, 0);
+ size = SIZEOF_TRACE_ENTRY(1);
+ }
/* Set argument names as fields */
for (i = 0; i < tu->nr_args; i++) {
ret = trace_define_field(event_call, tu->args[i].type->fmttype,
@@ -655,8 +661,13 @@ static int __set_print_fmt(struct trace_uprobe *tu, char *buf, int len)
int i;
int pos = 0;

- fmt = "(%lx)";
- arg = "REC->" FIELD_STRING_IP;
+ if (is_ret_probe(tu)) {
+ fmt = "(%lx <- %lx)";
+ arg = "REC->" FIELD_STRING_FUNC ", REC->" FIELD_STRING_RETIP;
+ } else {
+ fmt = "(%lx)";
+ arg = "REC->" FIELD_STRING_IP;
+ }

/* When len=0, we just calculate the needed length */

--
1.5.5.1

2013-04-01 16:13:00

by Oleg Nesterov

Subject: [PATCH 2/6] uprobes/tracing: Introduce is_ret_probe() and uretprobe_dispatcher()

Create the new functions we need to support uretprobes, and change
alloc_trace_uprobe() to initialize consumer.ret_handler if the new
"is_ret" argument is true. Currently this argument is always false,
so the new code is never called and is_ret_probe(tu) is false too.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_uprobe.c | 42 ++++++++++++++++++++++++++++++++++++++++--
1 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 2ea9961..e91a354 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -75,6 +75,8 @@ static DEFINE_MUTEX(uprobe_lock);
static LIST_HEAD(uprobe_list);

static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs);
+static int uretprobe_dispatcher(struct uprobe_consumer *con,
+ unsigned long func, struct pt_regs *regs);

static inline void init_trace_uprobe_filter(struct trace_uprobe_filter *filter)
{
@@ -88,11 +90,16 @@ static inline bool uprobe_filter_is_empty(struct trace_uprobe_filter *filter)
return !filter->nr_systemwide && list_empty(&filter->perf_events);
}

+static inline bool is_ret_probe(struct trace_uprobe *tu)
+{
+ return tu->consumer.ret_handler != NULL;
+}
+
/*
* Allocate new trace_uprobe and initialize it (including uprobes).
*/
static struct trace_uprobe *
-alloc_trace_uprobe(const char *group, const char *event, int nargs)
+alloc_trace_uprobe(const char *group, const char *event, int nargs, bool is_ret)
{
struct trace_uprobe *tu;

@@ -117,6 +124,8 @@ alloc_trace_uprobe(const char *group, const char *event, int nargs)

INIT_LIST_HEAD(&tu->list);
tu->consumer.handler = uprobe_dispatcher;
+ if (is_ret)
+ tu->consumer.ret_handler = uretprobe_dispatcher;
init_trace_uprobe_filter(&tu->filter);
return tu;

@@ -314,7 +323,7 @@ static int create_trace_uprobe(int argc, char **argv)
kfree(tail);
}

- tu = alloc_trace_uprobe(group, event, argc);
+ tu = alloc_trace_uprobe(group, event, argc, false);
if (IS_ERR(tu)) {
pr_info("Failed to allocate trace_uprobe.(%d)\n", (int)PTR_ERR(tu));
ret = PTR_ERR(tu);
@@ -529,6 +538,12 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
return 0;
}

+static void uretprobe_trace_func(struct trace_uprobe *tu, unsigned long func,
+ struct pt_regs *regs)
+{
+ uprobe_trace_print(tu, func, regs);
+}
+
/* Event entry printers */
static enum print_line_t
print_uprobe_event(struct trace_iterator *iter, int flags, struct trace_event *event)
@@ -799,6 +814,12 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
uprobe_perf_print(tu, 0, regs);
return 0;
}
+
+static void uretprobe_perf_func(struct trace_uprobe *tu, unsigned long func,
+ struct pt_regs *regs)
+{
+ uprobe_perf_print(tu, func, regs);
+}
#endif /* CONFIG_PERF_EVENTS */

static
@@ -853,6 +874,23 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
return ret;
}

+static int uretprobe_dispatcher(struct uprobe_consumer *con,
+ unsigned long func, struct pt_regs *regs)
+{
+ struct trace_uprobe *tu;
+
+ tu = container_of(con, struct trace_uprobe, consumer);
+
+ if (tu->flags & TP_FLAG_TRACE)
+ uretprobe_trace_func(tu, func, regs);
+
+#ifdef CONFIG_PERF_EVENTS
+ if (tu->flags & TP_FLAG_PROFILE)
+ uretprobe_perf_func(tu, func, regs);
+#endif
+ return 0;
+}
+
static struct trace_event_functions uprobe_funcs = {
.trace = print_uprobe_event
};
--
1.5.5.1

2013-04-01 16:13:09

by Oleg Nesterov

Subject: [PATCH 6/6] uprobes/tracing: Change create_trace_uprobe() to support uretprobes

Finally change create_trace_uprobe() to check if argv[0][0] == 'r'
and pass the correct "is_ret" to alloc_trace_uprobe().

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_uprobe.c | 11 +++++++----
1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 1b3622a..2773d2a 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -200,7 +200,7 @@ end:

/*
* Argument syntax:
- * - Add uprobe: p[:[GRP/]EVENT] PATH:SYMBOL[+offs] [FETCHARGS]
+ * - Add uprobe: p|r[:[GRP/]EVENT] PATH:SYMBOL [FETCHARGS]
*
* - Remove uprobe: -:[GRP/]EVENT
*/
@@ -212,20 +212,23 @@ static int create_trace_uprobe(int argc, char **argv)
char buf[MAX_EVENT_NAME_LEN];
struct path path;
unsigned long offset;
- bool is_delete;
+ bool is_delete, is_return;
int i, ret;

inode = NULL;
ret = 0;
is_delete = false;
+ is_return = false;
event = NULL;
group = NULL;

/* argc must be >= 1 */
if (argv[0][0] == '-')
is_delete = true;
+ else if (argv[0][0] == 'r')
+ is_return = true;
else if (argv[0][0] != 'p') {
- pr_info("Probe definition must be started with 'p' or '-'.\n");
+ pr_info("Probe definition must be started with 'p', 'r' or '-'.\n");
return -EINVAL;
}

@@ -323,7 +326,7 @@ static int create_trace_uprobe(int argc, char **argv)
kfree(tail);
}

- tu = alloc_trace_uprobe(group, event, argc, false);
+ tu = alloc_trace_uprobe(group, event, argc, is_return);
if (IS_ERR(tu)) {
pr_info("Failed to allocate trace_uprobe.(%d)\n", (int)PTR_ERR(tu));
ret = PTR_ERR(tu);
--
1.5.5.1

2013-04-02 08:58:39

by Anton Arapov

Subject: Re: [PATCH 3/4] uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls

On Fri, Mar 29, 2013 at 07:15:45PM +0100, Oleg Nesterov wrote:
> uprobe_trace_func() is never called with irqs or preemption
> disabled; there is no need to ask preempt_count() or local_save_flags().
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/trace/trace_uprobe.c | 10 +++-------
> 1 files changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 8e00901..43d258d 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -492,17 +492,13 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> struct ring_buffer_event *event;
> struct ring_buffer *buffer;
> u8 *data;
> - int size, i, pc;
> - unsigned long irq_flags;
> + int size, i;
> struct ftrace_event_call *call = &tu->call;
>
> - local_save_flags(irq_flags);
> - pc = preempt_count();
> -
> size = sizeof(*entry) + tu->size;
>
> event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
> - size, irq_flags, pc);
> + size, 0, 0);
> if (!event)
> return 0;
>
> @@ -513,7 +509,7 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>
> if (!filter_current_check_discard(buffer, call, entry, event))
> - trace_buffer_unlock_commit(buffer, event, irq_flags, pc);
> + trace_buffer_unlock_commit(buffer, event, 0, 0);
>
> return 0;
> }
> --
> 1.5.5.1
>
Acked-by: Anton Arapov <[email protected]>

2013-04-02 08:59:07

by Anton Arapov

Subject: Re: [PATCH 2/4] uprobes/tracing: Kill the pointless seq_print_ip_sym() call

On Fri, Mar 29, 2013 at 07:15:43PM +0100, Oleg Nesterov wrote:
> seq_print_ip_sym(ip) in print_uprobe_event() is pointless:
> kallsyms_lookup(ip) cannot resolve a user-space address.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/trace/trace_uprobe.c | 8 +-------
> 1 files changed, 1 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index af5b773..8e00901 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -531,13 +531,7 @@ print_uprobe_event(struct trace_iterator *iter, int flags, struct trace_event *e
> field = (struct uprobe_trace_entry_head *)iter->ent;
> tu = container_of(event, struct trace_uprobe, call.event);
>
> - if (!trace_seq_printf(s, "%s: (", tu->call.name))
> - goto partial;
> -
> - if (!seq_print_ip_sym(s, field->ip, flags | TRACE_ITER_SYM_OFFSET))
> - goto partial;
> -
> - if (!trace_seq_puts(s, ")"))
> + if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name, field->ip))
> goto partial;
>
> data = (u8 *)&field[1];
> --
> 1.5.5.1

Acked-by: Anton Arapov <[email protected]>

2013-04-02 08:59:05

by Anton Arapov

Subject: Re: [PATCH 1/4] uprobes/tracing: Kill the pointless task_pt_regs() calls

On Fri, Mar 29, 2013 at 07:15:40PM +0100, Oleg Nesterov wrote:
> uprobe_trace_func() and uprobe_perf_func() do not need task_pt_regs();
> we already have "struct pt_regs *regs".
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/trace/trace_uprobe.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 8dad2a9..af5b773 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -507,7 +507,7 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> return 0;
>
> entry = ring_buffer_event_data(event);
> - entry->ip = instruction_pointer(task_pt_regs(current));
> + entry->ip = instruction_pointer(regs);
> data = (u8 *)&entry[1];
> for (i = 0; i < tu->nr_args; i++)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
> @@ -777,7 +777,7 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
> if (!entry)
> goto out;
>
> - entry->ip = instruction_pointer(task_pt_regs(current));
> + entry->ip = instruction_pointer(regs);
> data = (u8 *)&entry[1];
> for (i = 0; i < tu->nr_args; i++)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
> --
> 1.5.5.1
>
Acked-by: Anton Arapov <[email protected]>

2013-04-02 08:59:42

by Anton Arapov

Subject: Re: [PATCH 4/4] uprobes/tracing: generalize struct uprobe_trace_entry_head

On Fri, Mar 29, 2013 at 07:15:48PM +0100, Oleg Nesterov wrote:
> struct uprobe_trace_entry_head has a single member for reporting,
> "unsigned long ip". If we want to support uretprobes we need to
> create another struct which has "func" and "ret_ip" and duplicate
> a lot of functions, like trace_kprobe.c does.
>
> To avoid this copy-and-paste horror we turn ->ip into ->vaddr[]
> and add couple of trivial helpers to calculate sizeof/data. This
> uglifies the code a bit, but this allows us to avoid a lot more
> complications later, when we add the support for ret-probes.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/trace/trace.h | 5 ---
> kernel/trace/trace_uprobe.c | 61 ++++++++++++++++++++++++------------------
> 2 files changed, 35 insertions(+), 31 deletions(-)
>
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 57d7e53..6ca57cf 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -103,11 +103,6 @@ struct kretprobe_trace_entry_head {
> unsigned long ret_ip;
> };
>
> -struct uprobe_trace_entry_head {
> - struct trace_entry ent;
> - unsigned long ip;
> -};
> -
> /*
> * trace_flag_type is an enumeration that holds different
> * states when a trace occurs. These are:
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 43d258d..92a4b08 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -28,6 +28,17 @@
>
> #define UPROBE_EVENT_SYSTEM "uprobes"
>
> +struct uprobe_trace_entry_head {
> + struct trace_entry ent;
> + unsigned long vaddr[];
> +};
> +
> +#define SIZEOF_TRACE_ENTRY(nr) \
> + (sizeof(struct uprobe_trace_entry_head) + sizeof(unsigned long) * (nr))
> +
> +#define DATAOF_TRACE_ENTRY(entry, nr) \
> + ((void*)&(entry)->vaddr[nr])
> +
> struct trace_uprobe_filter {
> rwlock_t rwlock;
> int nr_systemwide;
> @@ -491,20 +502,19 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> struct uprobe_trace_entry_head *entry;
> struct ring_buffer_event *event;
> struct ring_buffer *buffer;
> - u8 *data;
> + void *data;
> int size, i;
> struct ftrace_event_call *call = &tu->call;
>
> - size = sizeof(*entry) + tu->size;
> -
> + size = SIZEOF_TRACE_ENTRY(1) + tu->size;
> event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
> size, 0, 0);
> if (!event)
> return 0;
>
> entry = ring_buffer_event_data(event);
> - entry->ip = instruction_pointer(regs);
> - data = (u8 *)&entry[1];
> + entry->vaddr[0] = instruction_pointer(regs);
> + data = DATAOF_TRACE_ENTRY(entry, 1);
> for (i = 0; i < tu->nr_args; i++)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>
> @@ -518,22 +528,22 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> static enum print_line_t
> print_uprobe_event(struct trace_iterator *iter, int flags, struct trace_event *event)
> {
> - struct uprobe_trace_entry_head *field;
> + struct uprobe_trace_entry_head *entry;
> struct trace_seq *s = &iter->seq;
> struct trace_uprobe *tu;
> u8 *data;
> int i;
>
> - field = (struct uprobe_trace_entry_head *)iter->ent;
> + entry = (struct uprobe_trace_entry_head *)iter->ent;
> tu = container_of(event, struct trace_uprobe, call.event);
>
> - if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name, field->ip))
> + if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name, entry->vaddr[0]))
> goto partial;
>
> - data = (u8 *)&field[1];
> + data = DATAOF_TRACE_ENTRY(entry, 1);
> for (i = 0; i < tu->nr_args; i++) {
> if (!tu->args[i].type->print(s, tu->args[i].name,
> - data + tu->args[i].offset, field))
> + data + tu->args[i].offset, entry))
> goto partial;
> }
>
> @@ -585,16 +595,17 @@ static void probe_event_disable(struct trace_uprobe *tu, int flag)
>
> static int uprobe_event_define_fields(struct ftrace_event_call *event_call)
> {
> - int ret, i;
> + int ret, i, size;
> struct uprobe_trace_entry_head field;
> - struct trace_uprobe *tu = (struct trace_uprobe *)event_call->data;
> + struct trace_uprobe *tu = event_call->data;
>
> - DEFINE_FIELD(unsigned long, ip, FIELD_STRING_IP, 0);
> + DEFINE_FIELD(unsigned long, vaddr[0], FIELD_STRING_IP, 0);
> + size = SIZEOF_TRACE_ENTRY(1);
> /* Set argument names as fields */
> for (i = 0; i < tu->nr_args; i++) {
> ret = trace_define_field(event_call, tu->args[i].type->fmttype,
> tu->args[i].name,
> - sizeof(field) + tu->args[i].offset,
> + size + tu->args[i].offset,
> tu->args[i].type->size,
> tu->args[i].type->is_signed,
> FILTER_OTHER);
> @@ -748,33 +759,31 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
> struct ftrace_event_call *call = &tu->call;
> struct uprobe_trace_entry_head *entry;
> struct hlist_head *head;
> - u8 *data;
> - int size, __size, i;
> - int rctx;
> + unsigned long ip;
> + void *data;
> + int size, rctx, i;
>
> if (!uprobe_perf_filter(&tu->consumer, 0, current->mm))
> return UPROBE_HANDLER_REMOVE;
>
> - __size = sizeof(*entry) + tu->size;
> - size = ALIGN(__size + sizeof(u32), sizeof(u64));
> - size -= sizeof(u32);
> + size = SIZEOF_TRACE_ENTRY(1);
> + size = ALIGN(size + tu->size + sizeof(u32), sizeof(u64)) - sizeof(u32);
> if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
> return 0;
>
> preempt_disable();
> -
> entry = perf_trace_buf_prepare(size, call->event.type, regs, &rctx);
> if (!entry)
> goto out;
>
> - entry->ip = instruction_pointer(regs);
> - data = (u8 *)&entry[1];
> + ip = instruction_pointer(regs);
> + entry->vaddr[0] = ip;
> + data = DATAOF_TRACE_ENTRY(entry, 1);
> for (i = 0; i < tu->nr_args; i++)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>
> head = this_cpu_ptr(call->perf_events);
> - perf_trace_buf_submit(entry, size, rctx, entry->ip, 1, regs, head, NULL);
> -
> + perf_trace_buf_submit(entry, size, rctx, ip, 1, regs, head, NULL);
> out:
> preempt_enable();
> return 0;
> @@ -784,7 +793,7 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
> static
> int trace_uprobe_register(struct ftrace_event_call *event, enum trace_reg type, void *data)
> {
> - struct trace_uprobe *tu = (struct trace_uprobe *)event->data;
> + struct trace_uprobe *tu = event->data;
>
> switch (type) {
> case TRACE_REG_REGISTER:
> --
> 1.5.5.1
>
Acked-by: Anton Arapov <[email protected]>

2013-04-02 13:25:30

by Anton Arapov

Subject: Re: [PATCH 0/6] uprobes/tracing: uretprobes

On Mon, Apr 01, 2013 at 06:08:27PM +0200, Oleg Nesterov wrote:
> On 03/29, Oleg Nesterov wrote:
> >
> > uretprobes code is almost ready, we need to teach trace_uprobe.c
> > to support them.
> >
> > This looks simple, but there is a nasty complication. We do not
> > want to copy-and-paste the code like trace_kprobe.c does. Just look
> > at kprobe_event_define_fields() and kretprobe_event_define_fields().
> > They are non-trivial but almost identical. And there are a lot more
> > examples.
> >
> > So I'd like to send 4/4 for review before I'll do other changes.
> > The patch itself doesn't make sense and complicates the source code a
> > bit. But note how easy we can change, say, uprobe_event_define_fields(),
> >
> > - DEFINE_FIELD(vaddr[0], FIELD_STRING_IP);
> > - size = SIZEOF_TRACE_ENTRY(1);
> > + if (!trace_probe_is_return(tu)) {
> > + DEFINE_FIELD(vaddr[0], FIELD_STRING_IP);
> > + size = SIZEOF_TRACE_ENTRY(1);
> > + } else {
> > + DEFINE_FIELD(vaddr[0], FIELD_STRING_FUNC);
> > + DEFINE_FIELD(vaddr[1], FIELD_STRING_RETIP);
> > + size = SIZEOF_TRACE_ENTRY(2);
> > + }
> >
> > without copy-and-paste. The same simple change is possible for other
> > helpers playing with uprobe_trace_entry_head.
>
> And the rest of the necessary changes to support uretprobes.
>
> To remind, this is on top of Anton's uretprobes implementation which is
> not finished yet. This series can't be even compiled, but I think it can
> be already reviewed. All it needs to compile is the new
> uprobe_consumer->ret_handler().
>
> Oleg.
>
> kernel/trace/trace_uprobe.c | 152 +++++++++++++++++++++++++++++++++++--------
> 1 files changed, 124 insertions(+), 28 deletions(-)
>

I've played with this patch-set and it works.

Tested-by: Anton Arapov <[email protected]>

2013-04-04 14:30:09

by Srikar Dronamraju

Subject: Re: [PATCH 1/4] uprobes/tracing: Kill the pointless task_pt_regs() calls

* Oleg Nesterov <[email protected]> [2013-03-29 19:15:40]:

> uprobe_trace_func() and uprobe_perf_func() do not need task_pt_regs(),
> we already have "struct pt_regs *regs".
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Srikar Dronamraju <[email protected]>

Adding Masami in the cc.

> ---
> kernel/trace/trace_uprobe.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 8dad2a9..af5b773 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -507,7 +507,7 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> return 0;
>
> entry = ring_buffer_event_data(event);
> - entry->ip = instruction_pointer(task_pt_regs(current));
> + entry->ip = instruction_pointer(regs);
> data = (u8 *)&entry[1];
> for (i = 0; i < tu->nr_args; i++)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
> @@ -777,7 +777,7 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
> if (!entry)
> goto out;
>
> - entry->ip = instruction_pointer(task_pt_regs(current));
> + entry->ip = instruction_pointer(regs);
> data = (u8 *)&entry[1];
> for (i = 0; i < tu->nr_args; i++)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
> --
> 1.5.5.1
>

--
Thanks and Regards
Srikar Dronamraju

2013-04-04 14:30:38

by Srikar Dronamraju

Subject: Re: [PATCH 2/4] uprobes/tracing: Kill the pointless seq_print_ip_sym() call

* Oleg Nesterov <[email protected]> [2013-03-29 19:15:43]:

> seq_print_ip_sym(ip) in print_uprobe_event() is pointless,
> kallsyms_lookup(ip) can not resolve a user-space address.
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Srikar Dronamraju <[email protected]>

> ---
> kernel/trace/trace_uprobe.c | 8 +-------
> 1 files changed, 1 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index af5b773..8e00901 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -531,13 +531,7 @@ print_uprobe_event(struct trace_iterator *iter, int flags, struct trace_event *e
> field = (struct uprobe_trace_entry_head *)iter->ent;
> tu = container_of(event, struct trace_uprobe, call.event);
>
> - if (!trace_seq_printf(s, "%s: (", tu->call.name))
> - goto partial;
> -
> - if (!seq_print_ip_sym(s, field->ip, flags | TRACE_ITER_SYM_OFFSET))
> - goto partial;
> -
> - if (!trace_seq_puts(s, ")"))
> + if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name, field->ip))
> goto partial;
>
> data = (u8 *)&field[1];
> --
> 1.5.5.1
>

--
Thanks and Regards
Srikar Dronamraju

2013-04-04 14:30:59

by Srikar Dronamraju

Subject: Re: [PATCH 3/4] uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls

* Oleg Nesterov <[email protected]> [2013-03-29 19:15:45]:

> uprobe_trace_func() is never called with irqs or preemption
> disabled, no need to ask preempt_count() or local_save_flags().
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Srikar Dronamraju <[email protected]>

Adding Masami in the Cc.

> ---
> kernel/trace/trace_uprobe.c | 10 +++-------
> 1 files changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 8e00901..43d258d 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -492,17 +492,13 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> struct ring_buffer_event *event;
> struct ring_buffer *buffer;
> u8 *data;
> - int size, i, pc;
> - unsigned long irq_flags;
> + int size, i;
> struct ftrace_event_call *call = &tu->call;
>
> - local_save_flags(irq_flags);
> - pc = preempt_count();
> -
> size = sizeof(*entry) + tu->size;
>
> event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
> - size, irq_flags, pc);
> + size, 0, 0);
> if (!event)
> return 0;
>
> @@ -513,7 +509,7 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>
> if (!filter_current_check_discard(buffer, call, entry, event))
> - trace_buffer_unlock_commit(buffer, event, irq_flags, pc);
> + trace_buffer_unlock_commit(buffer, event, 0, 0);
>
> return 0;
> }
> --
> 1.5.5.1
>

--
Thanks and Regards
Srikar Dronamraju

2013-04-04 14:31:26

by Srikar Dronamraju

Subject: Re: [PATCH 4/4] uprobes/tracing: generalize struct uprobe_trace_entry_head

* Oleg Nesterov <[email protected]> [2013-03-29 19:15:48]:

> struct uprobe_trace_entry_head has a single member for reporting,
> "unsigned long ip". If we want to support uretprobes we need to
> create another struct which has "func" and "ret_ip" and duplicate
> a lot of functions, like trace_kprobe.c does.
>
> To avoid this copy-and-paste horror we turn ->ip into ->vaddr[]
> and add couple of trivial helpers to calculate sizeof/data. This
> uglifies the code a bit, but this allows us to avoid a lot more
> complications later, when we add the support for ret-probes.
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Srikar Dronamraju <[email protected]>

Also copying Masami.

> ---
> kernel/trace/trace.h | 5 ---
> kernel/trace/trace_uprobe.c | 61 ++++++++++++++++++++++++------------------
> 2 files changed, 35 insertions(+), 31 deletions(-)
>
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 57d7e53..6ca57cf 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -103,11 +103,6 @@ struct kretprobe_trace_entry_head {
> unsigned long ret_ip;
> };
>
> -struct uprobe_trace_entry_head {
> - struct trace_entry ent;
> - unsigned long ip;
> -};
> -
> /*
> * trace_flag_type is an enumeration that holds different
> * states when a trace occurs. These are:
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 43d258d..92a4b08 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -28,6 +28,17 @@
>
> #define UPROBE_EVENT_SYSTEM "uprobes"
>
> +struct uprobe_trace_entry_head {
> + struct trace_entry ent;
> + unsigned long vaddr[];
> +};
> +
> +#define SIZEOF_TRACE_ENTRY(nr) \
> + (sizeof(struct uprobe_trace_entry_head) + sizeof(unsigned long) * (nr))
> +
> +#define DATAOF_TRACE_ENTRY(entry, nr) \
> + ((void*)&(entry)->vaddr[nr])
> +
> struct trace_uprobe_filter {
> rwlock_t rwlock;
> int nr_systemwide;
> @@ -491,20 +502,19 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> struct uprobe_trace_entry_head *entry;
> struct ring_buffer_event *event;
> struct ring_buffer *buffer;
> - u8 *data;
> + void *data;
> int size, i;
> struct ftrace_event_call *call = &tu->call;
>
> - size = sizeof(*entry) + tu->size;
> -
> + size = SIZEOF_TRACE_ENTRY(1) + tu->size;
> event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
> size, 0, 0);
> if (!event)
> return 0;
>
> entry = ring_buffer_event_data(event);
> - entry->ip = instruction_pointer(regs);
> - data = (u8 *)&entry[1];
> + entry->vaddr[0] = instruction_pointer(regs);
> + data = DATAOF_TRACE_ENTRY(entry, 1);
> for (i = 0; i < tu->nr_args; i++)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>
> @@ -518,22 +528,22 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> static enum print_line_t
> print_uprobe_event(struct trace_iterator *iter, int flags, struct trace_event *event)
> {
> - struct uprobe_trace_entry_head *field;
> + struct uprobe_trace_entry_head *entry;
> struct trace_seq *s = &iter->seq;
> struct trace_uprobe *tu;
> u8 *data;
> int i;
>
> - field = (struct uprobe_trace_entry_head *)iter->ent;
> + entry = (struct uprobe_trace_entry_head *)iter->ent;
> tu = container_of(event, struct trace_uprobe, call.event);
>
> - if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name, field->ip))
> + if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name, entry->vaddr[0]))
> goto partial;
>
> - data = (u8 *)&field[1];
> + data = DATAOF_TRACE_ENTRY(entry, 1);
> for (i = 0; i < tu->nr_args; i++) {
> if (!tu->args[i].type->print(s, tu->args[i].name,
> - data + tu->args[i].offset, field))
> + data + tu->args[i].offset, entry))
> goto partial;
> }
>
> @@ -585,16 +595,17 @@ static void probe_event_disable(struct trace_uprobe *tu, int flag)
>
> static int uprobe_event_define_fields(struct ftrace_event_call *event_call)
> {
> - int ret, i;
> + int ret, i, size;
> struct uprobe_trace_entry_head field;
> - struct trace_uprobe *tu = (struct trace_uprobe *)event_call->data;
> + struct trace_uprobe *tu = event_call->data;
>
> - DEFINE_FIELD(unsigned long, ip, FIELD_STRING_IP, 0);
> + DEFINE_FIELD(unsigned long, vaddr[0], FIELD_STRING_IP, 0);
> + size = SIZEOF_TRACE_ENTRY(1);
> /* Set argument names as fields */
> for (i = 0; i < tu->nr_args; i++) {
> ret = trace_define_field(event_call, tu->args[i].type->fmttype,
> tu->args[i].name,
> - sizeof(field) + tu->args[i].offset,
> + size + tu->args[i].offset,
> tu->args[i].type->size,
> tu->args[i].type->is_signed,
> FILTER_OTHER);
> @@ -748,33 +759,31 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
> struct ftrace_event_call *call = &tu->call;
> struct uprobe_trace_entry_head *entry;
> struct hlist_head *head;
> - u8 *data;
> - int size, __size, i;
> - int rctx;
> + unsigned long ip;
> + void *data;
> + int size, rctx, i;
>
> if (!uprobe_perf_filter(&tu->consumer, 0, current->mm))
> return UPROBE_HANDLER_REMOVE;
>
> - __size = sizeof(*entry) + tu->size;
> - size = ALIGN(__size + sizeof(u32), sizeof(u64));
> - size -= sizeof(u32);
> + size = SIZEOF_TRACE_ENTRY(1);
> + size = ALIGN(size + tu->size + sizeof(u32), sizeof(u64)) - sizeof(u32);
> if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
> return 0;
>
> preempt_disable();
> -
> entry = perf_trace_buf_prepare(size, call->event.type, regs, &rctx);
> if (!entry)
> goto out;
>
> - entry->ip = instruction_pointer(regs);
> - data = (u8 *)&entry[1];
> + ip = instruction_pointer(regs);
> + entry->vaddr[0] = ip;
> + data = DATAOF_TRACE_ENTRY(entry, 1);
> for (i = 0; i < tu->nr_args; i++)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>
> head = this_cpu_ptr(call->perf_events);
> - perf_trace_buf_submit(entry, size, rctx, entry->ip, 1, regs, head, NULL);
> -
> + perf_trace_buf_submit(entry, size, rctx, ip, 1, regs, head, NULL);
> out:
> preempt_enable();
> return 0;
> @@ -784,7 +793,7 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
> static
> int trace_uprobe_register(struct ftrace_event_call *event, enum trace_reg type, void *data)
> {
> - struct trace_uprobe *tu = (struct trace_uprobe *)event->data;
> + struct trace_uprobe *tu = event->data;
>
> switch (type) {
> case TRACE_REG_REGISTER:
> --
> 1.5.5.1
>

--
Thanks and Regards
Srikar Dronamraju

Subject: Re: Re: [PATCH 3/4] uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls

(2013/04/04 23:25), Srikar Dronamraju wrote:
> * Oleg Nesterov <[email protected]> [2013-03-29 19:15:45]:
>
>> uprobe_trace_func() is never called with irqs or preemption
>> disabled, no need to ask preempt_count() or local_save_flags().
>>
>> Signed-off-by: Oleg Nesterov <[email protected]>
>
> Acked-by: Srikar Dronamraju <[email protected]>
>
> Adding Masami in the Cc.

Thanks :). Since the running context of the uprobe handler is
different from the kprobe one, this change is only needed for uprobes.

Acked-by: Masami Hiramatsu <[email protected]>

Thank you,

>
>> ---
>> kernel/trace/trace_uprobe.c | 10 +++-------
>> 1 files changed, 3 insertions(+), 7 deletions(-)
>>
>> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
>> index 8e00901..43d258d 100644
>> --- a/kernel/trace/trace_uprobe.c
>> +++ b/kernel/trace/trace_uprobe.c
>> @@ -492,17 +492,13 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
>> struct ring_buffer_event *event;
>> struct ring_buffer *buffer;
>> u8 *data;
>> - int size, i, pc;
>> - unsigned long irq_flags;
>> + int size, i;
>> struct ftrace_event_call *call = &tu->call;
>>
>> - local_save_flags(irq_flags);
>> - pc = preempt_count();
>> -
>> size = sizeof(*entry) + tu->size;
>>
>> event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
>> - size, irq_flags, pc);
>> + size, 0, 0);
>> if (!event)
>> return 0;
>>
>> @@ -513,7 +509,7 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
>> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>>
>> if (!filter_current_check_discard(buffer, call, entry, event))
>> - trace_buffer_unlock_commit(buffer, event, irq_flags, pc);
>> + trace_buffer_unlock_commit(buffer, event, 0, 0);
>>
>> return 0;
>> }
>> --
>> 1.5.5.1
>>
>


--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-04-05 15:05:15

by Oleg Nesterov

Subject: Re: Re: [PATCH 3/4] uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls

On 04/05, Masami Hiramatsu wrote:
>
> Acked-by: Masami Hiramatsu <[email protected]>

Thanks!

Masami, perhaps you can also answer the question I asked in 0/4
marc.info/?l=linux-kernel&m=136458107403835 ?

Off-topic question... Why does uprobe_perf_func() pass "addr = ip" to
perf_trace_buf_submit()? This just sets perf_sample_data->addr for
PERF_SAMPLE_ADDR; do we really need it? And we already have
perf_sample_data->ip initialized by perf.

kprobe_perf_func() and kretprobe_perf_func() do the same.

Once again, I am just curious and this is completely offtopic.

Oleg.

2013-04-07 10:32:19

by Srikar Dronamraju

Subject: Re: [PATCH 3/6] uprobes/tracing: Make uprobe_{trace,perf}_print() uretprobe-friendly

* Oleg Nesterov <[email protected]> [2013-04-01 18:08:51]:

> Change uprobe_trace_print() and uprobe_perf_print() to check
> is_ret_probe() and fill ring_buffer_event accordingly.
>
> Also change uprobe_trace_func() and uprobe_perf_func() to not
> _print() if is_ret_probe() is true. Note that we keep ->handler()
> nontrivial even for uretprobe, we need this for filtering and for
> other potential extensions.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/trace/trace_uprobe.c | 42 +++++++++++++++++++++++++++++++++---------
> 1 files changed, 33 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index e91a354..db2718a 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -515,15 +515,26 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
> int size, i;
> struct ftrace_event_call *call = &tu->call;
>
> - size = SIZEOF_TRACE_ENTRY(1) + tu->size;
> + if (is_ret_probe(tu))

One nit:
Here and in a couple of places below, we could check func instead of
is_ret_probe(), right?
Or is there an advantage to checking is_ret_probe() over func?

> + size = SIZEOF_TRACE_ENTRY(2);
> + else
> + size = SIZEOF_TRACE_ENTRY(1);
> +
> event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
> - size, 0, 0);
> + size + tu->size, 0, 0);
> if (!event)
> return;
>
> entry = ring_buffer_event_data(event);
> - entry->vaddr[0] = instruction_pointer(regs);
> - data = DATAOF_TRACE_ENTRY(entry, 1);
> + if (is_ret_probe(tu)) {
> + entry->vaddr[0] = func;
> + entry->vaddr[1] = instruction_pointer(regs);
> + data = DATAOF_TRACE_ENTRY(entry, 2);
> + } else {
> + entry->vaddr[0] = instruction_pointer(regs);
> + data = DATAOF_TRACE_ENTRY(entry, 1);
> + }
> +
> for (i = 0; i < tu->nr_args; i++)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>
> @@ -534,7 +545,8 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
> /* uprobe handler */
> static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> {
> - uprobe_trace_print(tu, 0, regs);
> + if (!is_ret_probe(tu))
> + uprobe_trace_print(tu, 0, regs);

Should this hunk be in the previous patch?

Also something for the future:
Most of the time a user who sets a return probe probably wants to probe
the function entry too. So should we extend the ABI from p+r to
p+r+..<something else> to mean it traces both function entry and return?
Especially given that uretprobe has elegantly been designed to make this
a possibility.


> return 0;
> }
>
> @@ -783,7 +795,11 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
> void *data;
> int size, rctx, i;
>
> - size = SIZEOF_TRACE_ENTRY(1);
> + if (is_ret_probe(tu))
> + size = SIZEOF_TRACE_ENTRY(2);
> + else
> + size = SIZEOF_TRACE_ENTRY(1);
> +
> size = ALIGN(size + tu->size + sizeof(u32), sizeof(u64)) - sizeof(u32);
> if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
> return;
> @@ -794,8 +810,15 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
> goto out;
>
> ip = instruction_pointer(regs);
> - entry->vaddr[0] = ip;
> - data = DATAOF_TRACE_ENTRY(entry, 1);
> + if (is_ret_probe(tu)) {
> + entry->vaddr[0] = func;
> + entry->vaddr[1] = ip;
> + data = DATAOF_TRACE_ENTRY(entry, 2);
> + } else {
> + entry->vaddr[0] = ip;
> + data = DATAOF_TRACE_ENTRY(entry, 1);
> + }
> +
> for (i = 0; i < tu->nr_args; i++)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>
> @@ -811,7 +834,8 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
> if (!uprobe_perf_filter(&tu->consumer, 0, current->mm))
> return UPROBE_HANDLER_REMOVE;
>
> - uprobe_perf_print(tu, 0, regs);
> + if (!is_ret_probe(tu))
> + uprobe_perf_print(tu, 0, regs);
> return 0;
> }
>
> --
> 1.5.5.1
>

2013-04-07 14:04:11

by Srikar Dronamraju

Subject: Re: [PATCH 1/6] uprobes/tracing: Introduce uprobe_{trace,perf}_print() helpers

* Oleg Nesterov <[email protected]> [2013-04-01 18:08:45]:

> Extract the output code from uprobe_trace_func() and uprobe_perf_func()
> into the new helpers, they will be used by ->ret_handler() too. We also
> add the unused "unsigned long func" argument in advance, to simplify the
> next changes.
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Srikar Dronamraju <[email protected]>

> ---
> kernel/trace/trace_uprobe.c | 29 ++++++++++++++++++++---------
> 1 files changed, 20 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 92a4b08..2ea9961 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -496,8 +496,8 @@ static const struct file_operations uprobe_profile_ops = {
> .release = seq_release,
> };
>
> -/* uprobe handler */
> -static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> +static void uprobe_trace_print(struct trace_uprobe *tu,
> + unsigned long func, struct pt_regs *regs)
> {
> struct uprobe_trace_entry_head *entry;
> struct ring_buffer_event *event;
> @@ -510,7 +510,7 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
> size, 0, 0);
> if (!event)
> - return 0;
> + return;
>
> entry = ring_buffer_event_data(event);
> entry->vaddr[0] = instruction_pointer(regs);
> @@ -520,7 +520,12 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
>
> if (!filter_current_check_discard(buffer, call, entry, event))
> trace_buffer_unlock_commit(buffer, event, 0, 0);
> +}
>
> +/* uprobe handler */
> +static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> +{
> + uprobe_trace_print(tu, 0, regs);
> return 0;
> }
>
> @@ -753,8 +758,8 @@ static bool uprobe_perf_filter(struct uprobe_consumer *uc,
> return ret;
> }
>
> -/* uprobe profile handler */
> -static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
> +static void uprobe_perf_print(struct trace_uprobe *tu,
> + unsigned long func, struct pt_regs *regs)
> {
> struct ftrace_event_call *call = &tu->call;
> struct uprobe_trace_entry_head *entry;
> @@ -763,13 +768,10 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
> void *data;
> int size, rctx, i;
>
> - if (!uprobe_perf_filter(&tu->consumer, 0, current->mm))
> - return UPROBE_HANDLER_REMOVE;
> -
> size = SIZEOF_TRACE_ENTRY(1);
> size = ALIGN(size + tu->size + sizeof(u32), sizeof(u64)) - sizeof(u32);
> if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
> - return 0;
> + return;
>
> preempt_disable();
> entry = perf_trace_buf_prepare(size, call->event.type, regs, &rctx);
> @@ -786,6 +788,15 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
> perf_trace_buf_submit(entry, size, rctx, ip, 1, regs, head, NULL);
> out:
> preempt_enable();
> +}
> +
> +/* uprobe profile handler */
> +static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
> +{
> + if (!uprobe_perf_filter(&tu->consumer, 0, current->mm))
> + return UPROBE_HANDLER_REMOVE;
> +
> + uprobe_perf_print(tu, 0, regs);
> return 0;
> }
> #endif /* CONFIG_PERF_EVENTS */
> --
> 1.5.5.1
>

--
Thanks and Regards
Srikar Dronamraju

2013-04-07 14:18:36

by Srikar Dronamraju

Subject: Re: [PATCH 2/6] uprobes/tracing: Introduce is_ret_probe() and uretprobe_dispatcher()

* Oleg Nesterov <[email protected]> [2013-04-01 18:08:48]:

> Create the new functions we need to support uretprobes, and change
> alloc_trace_uprobe() to initialize consumer.ret_handler if the new
> "is_ret" argument is true. Curently this argument is always false,
> so the new code is never called and is_ret_probe(tu) is false too.
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Srikar Dronamraju <[email protected]>

> ---
> kernel/trace/trace_uprobe.c | 42 ++++++++++++++++++++++++++++++++++++++++--
> 1 files changed, 40 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 2ea9961..e91a354 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -75,6 +75,8 @@ static DEFINE_MUTEX(uprobe_lock);
> static LIST_HEAD(uprobe_list);
>
> static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs);
> +static int uretprobe_dispatcher(struct uprobe_consumer *con,
> + unsigned long func, struct pt_regs *regs);
>
> static inline void init_trace_uprobe_filter(struct trace_uprobe_filter *filter)
> {
> @@ -88,11 +90,16 @@ static inline bool uprobe_filter_is_empty(struct trace_uprobe_filter *filter)
> return !filter->nr_systemwide && list_empty(&filter->perf_events);
> }
>
> +static inline bool is_ret_probe(struct trace_uprobe *tu)
> +{
> + return tu->consumer.ret_handler != NULL;
> +}
> +
> /*
> * Allocate new trace_uprobe and initialize it (including uprobes).
> */
> static struct trace_uprobe *
> -alloc_trace_uprobe(const char *group, const char *event, int nargs)
> +alloc_trace_uprobe(const char *group, const char *event, int nargs, bool is_ret)
> {
> struct trace_uprobe *tu;
>
> @@ -117,6 +124,8 @@ alloc_trace_uprobe(const char *group, const char *event, int nargs)
>
> INIT_LIST_HEAD(&tu->list);
> tu->consumer.handler = uprobe_dispatcher;
> + if (is_ret)
> + tu->consumer.ret_handler = uretprobe_dispatcher;
> init_trace_uprobe_filter(&tu->filter);
> return tu;
>
> @@ -314,7 +323,7 @@ static int create_trace_uprobe(int argc, char **argv)
> kfree(tail);
> }
>
> - tu = alloc_trace_uprobe(group, event, argc);
> + tu = alloc_trace_uprobe(group, event, argc, false);
> if (IS_ERR(tu)) {
> pr_info("Failed to allocate trace_uprobe.(%d)\n", (int)PTR_ERR(tu));
> ret = PTR_ERR(tu);
> @@ -529,6 +538,12 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> return 0;
> }
>
> +static void uretprobe_trace_func(struct trace_uprobe *tu, unsigned long func,
> + struct pt_regs *regs)
> +{
> + uprobe_trace_print(tu, func, regs);
> +}
> +
> /* Event entry printers */
> static enum print_line_t
> print_uprobe_event(struct trace_iterator *iter, int flags, struct trace_event *event)
> @@ -799,6 +814,12 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
> uprobe_perf_print(tu, 0, regs);
> return 0;
> }
> +
> +static void uretprobe_perf_func(struct trace_uprobe *tu, unsigned long func,
> + struct pt_regs *regs)
> +{
> + uprobe_perf_print(tu, func, regs);
> +}
> #endif /* CONFIG_PERF_EVENTS */
>
> static
> @@ -853,6 +874,23 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
> return ret;
> }
>
> +static int uretprobe_dispatcher(struct uprobe_consumer *con,
> + unsigned long func, struct pt_regs *regs)
> +{
> + struct trace_uprobe *tu;
> +
> + tu = container_of(con, struct trace_uprobe, consumer);
> +
> + if (tu->flags & TP_FLAG_TRACE)
> + uretprobe_trace_func(tu, func, regs);
> +
> +#ifdef CONFIG_PERF_EVENTS
> + if (tu->flags & TP_FLAG_PROFILE)
> + uretprobe_perf_func(tu, func, regs);
> +#endif
> + return 0;
> +}
> +
> static struct trace_event_functions uprobe_funcs = {
> .trace = print_uprobe_event
> };
> --
> 1.5.5.1
>

--
Thanks and Regards
Srikar Dronamraju

2013-04-07 14:20:20

by Srikar Dronamraju

Subject: Re: [PATCH 4/6] uprobes/tracing: Make register_uprobe_event() paths uretprobe-friendly

* Oleg Nesterov <[email protected]> [2013-04-01 18:08:54]:

> Change uprobe_event_define_fields(), and __set_print_fmt() to check
> is_ret_probe() and use the appropriate format/fields.
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Srikar Dronamraju <[email protected]>

> ---
> kernel/trace/trace_uprobe.c | 19 +++++++++++++++----
> 1 files changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index db2718a..f75e52d 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -631,8 +631,14 @@ static int uprobe_event_define_fields(struct ftrace_event_call *event_call)
> struct uprobe_trace_entry_head field;
> struct trace_uprobe *tu = event_call->data;
>
> - DEFINE_FIELD(unsigned long, vaddr[0], FIELD_STRING_IP, 0);
> - size = SIZEOF_TRACE_ENTRY(1);
> + if (is_ret_probe(tu)) {
> + DEFINE_FIELD(unsigned long, vaddr[0], FIELD_STRING_FUNC, 0);
> + DEFINE_FIELD(unsigned long, vaddr[1], FIELD_STRING_RETIP, 0);
> + size = SIZEOF_TRACE_ENTRY(2);
> + } else {
> + DEFINE_FIELD(unsigned long, vaddr[0], FIELD_STRING_IP, 0);
> + size = SIZEOF_TRACE_ENTRY(1);
> + }
> /* Set argument names as fields */
> for (i = 0; i < tu->nr_args; i++) {
> ret = trace_define_field(event_call, tu->args[i].type->fmttype,
> @@ -655,8 +661,13 @@ static int __set_print_fmt(struct trace_uprobe *tu, char *buf, int len)
> int i;
> int pos = 0;
>
> - fmt = "(%lx)";
> - arg = "REC->" FIELD_STRING_IP;
> + if (is_ret_probe(tu)) {
> + fmt = "(%lx <- %lx)";
> + arg = "REC->" FIELD_STRING_FUNC ", REC->" FIELD_STRING_RETIP;
> + } else {
> + fmt = "(%lx)";
> + arg = "REC->" FIELD_STRING_IP;
> + }
>
> /* When len=0, we just calculate the needed length */
>
> --
> 1.5.5.1
>

--
Thanks and Regards
Srikar Dronamraju

2013-04-07 14:21:31

by Srikar Dronamraju

Subject: Re: [PATCH 5/6] uprobes/tracing: Make seq_printf() code uretprobe-friendly

* Oleg Nesterov <[email protected]> [2013-04-01 18:08:57]:

> Change probes_seq_show() and print_uprobe_event() to check
> is_ret_probe() and print the correct data.
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Srikar Dronamraju <[email protected]>

> ---
> kernel/trace/trace_uprobe.c | 17 +++++++++++++----
> 1 files changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index f75e52d..1b3622a 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -434,9 +434,10 @@ static void probes_seq_stop(struct seq_file *m, void *v)
> static int probes_seq_show(struct seq_file *m, void *v)
> {
> struct trace_uprobe *tu = v;
> + char c = is_ret_probe(tu) ? 'r' : 'p';
> int i;
>
> - seq_printf(m, "p:%s/%s", tu->call.class->system, tu->call.name);
> + seq_printf(m, "%c:%s/%s", c, tu->call.class->system, tu->call.name);
> seq_printf(m, " %s:0x%p", tu->filename, (void *)tu->offset);
>
> for (i = 0; i < tu->nr_args; i++)
> @@ -569,10 +570,18 @@ print_uprobe_event(struct trace_iterator *iter, int flags, struct trace_event *e
> entry = (struct uprobe_trace_entry_head *)iter->ent;
> tu = container_of(event, struct trace_uprobe, call.event);
>
> - if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name, entry->vaddr[0]))
> - goto partial;
> + if (is_ret_probe(tu)) {
> + if (!trace_seq_printf(s, "%s: (0x%lx <- 0x%lx)", tu->call.name,
> + entry->vaddr[1], entry->vaddr[0]))
> + goto partial;
> + data = DATAOF_TRACE_ENTRY(entry, 2);
> + } else {
> + if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name,
> + entry->vaddr[0]))
> + goto partial;
> + data = DATAOF_TRACE_ENTRY(entry, 1);
> + }
>
> - data = DATAOF_TRACE_ENTRY(entry, 1);
> for (i = 0; i < tu->nr_args; i++) {
> if (!tu->args[i].type->print(s, tu->args[i].name,
> data + tu->args[i].offset, entry))
> --
> 1.5.5.1
>

--
Thanks and Regards
Srikar Dronamraju

2013-04-07 14:23:09

by Srikar Dronamraju

Subject: Re: [PATCH 6/6] uprobes/tracing: Change create_trace_uprobe() to support uretprobes

* Oleg Nesterov <[email protected]> [2013-04-01 18:09:00]:

> Finally change create_trace_uprobe() to check if argv[0][0] == 'r'
> and pass the correct "is_ret" to alloc_trace_uprobe().
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Srikar Dronamraju <[email protected]>

> ---
> kernel/trace/trace_uprobe.c | 11 +++++++----
> 1 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 1b3622a..2773d2a 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -200,7 +200,7 @@ end:
>
> /*
> * Argument syntax:
> - * - Add uprobe: p[:[GRP/]EVENT] PATH:SYMBOL[+offs] [FETCHARGS]
> + * - Add uprobe: p|r[:[GRP/]EVENT] PATH:SYMBOL [FETCHARGS]
> *
> * - Remove uprobe: -:[GRP/]EVENT
> */
> @@ -212,20 +212,23 @@ static int create_trace_uprobe(int argc, char **argv)
> char buf[MAX_EVENT_NAME_LEN];
> struct path path;
> unsigned long offset;
> - bool is_delete;
> + bool is_delete, is_return;
> int i, ret;
>
> inode = NULL;
> ret = 0;
> is_delete = false;
> + is_return = false;
> event = NULL;
> group = NULL;
>
> /* argc must be >= 1 */
> if (argv[0][0] == '-')
> is_delete = true;
> + else if (argv[0][0] == 'r')
> + is_return = true;
> else if (argv[0][0] != 'p') {
> - pr_info("Probe definition must be started with 'p' or '-'.\n");
> + pr_info("Probe definition must be started with 'p', 'r' or '-'.\n");
> return -EINVAL;
> }
>
> @@ -323,7 +326,7 @@ static int create_trace_uprobe(int argc, char **argv)
> kfree(tail);
> }
>
> - tu = alloc_trace_uprobe(group, event, argc, false);
> + tu = alloc_trace_uprobe(group, event, argc, is_return);
> if (IS_ERR(tu)) {
> pr_info("Failed to allocate trace_uprobe.(%d)\n", (int)PTR_ERR(tu));
> ret = PTR_ERR(tu);
> --
> 1.5.5.1
>

--
Thanks and Regards
Srikar Dronamraju

Subject: Re: [PATCH 3/4] uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls

(2013/04/06 0:01), Oleg Nesterov wrote:
> On 04/05, Masami Hiramatsu wrote:
>>
>> Acked-by: Masami Hiramatsu <[email protected]>
>
> Thanks!
>
> Masami, perhaps you can also answer the question I asked in 0/4
> marc.info/?l=linux-kernel&m=136458107403835 ?
>
> Off-topic question... Why uprobe_perf_func() passes "addr = ip" to
> perf_trace_buf_submit() ? This just sets perf_sample_data->addr for
> PERF_SAMPLE_ADDR, do we really need this? and we already have
> perf_sample_data->ip initialized by perf.
>
> kprobe_perf_func() and kretprobe_perf_func() do the same.
>

Good catch! I guess that I might have misunderstood it as being used
for sampling the execution address. It should be replaced with (u64)0,
as perf_trace_##call() does.

> Once again, I am just curious and this is completely offtopic.

Thank you :)

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-04-08 15:55:38

by Steven Rostedt

Subject: Re: [PATCH 4/4] uprobes/tracing: generalize struct uprobe_trace_entry_head

On Fri, 2013-03-29 at 19:15 +0100, Oleg Nesterov wrote:
> struct uprobe_trace_entry_head has a single member for reporting,
> "unsigned long ip". If we want to support uretprobes we need to
> create another struct which has "func" and "ret_ip" and duplicate
> a lot of functions, like trace_kprobe.c does.
>
> To avoid this copy-and-paste horror we turn ->ip into ->vaddr[]
> and add a couple of trivial helpers to calculate sizeof/data. This
> uglifies the code a bit, but this allows us to avoid a lot more
> complications later, when we add the support for ret-probes.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/trace/trace.h | 5 ---
> kernel/trace/trace_uprobe.c | 61 ++++++++++++++++++++++++------------------
> 2 files changed, 35 insertions(+), 31 deletions(-)
>
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 57d7e53..6ca57cf 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -103,11 +103,6 @@ struct kretprobe_trace_entry_head {
> unsigned long ret_ip;
> };
>
> -struct uprobe_trace_entry_head {
> - struct trace_entry ent;
> - unsigned long ip;
> -};
> -
> /*
> * trace_flag_type is an enumeration that holds different
> * states when a trace occurs. These are:
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 43d258d..92a4b08 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -28,6 +28,17 @@
>
> #define UPROBE_EVENT_SYSTEM "uprobes"
>
> +struct uprobe_trace_entry_head {
> + struct trace_entry ent;
> + unsigned long vaddr[];
> +};
> +
> +#define SIZEOF_TRACE_ENTRY(nr) \
> + (sizeof(struct uprobe_trace_entry_head) + sizeof(unsigned long) * (nr))
> +
> +#define DATAOF_TRACE_ENTRY(entry, nr) \
> + ((void*)&(entry)->vaddr[nr])
> +
> struct trace_uprobe_filter {
> rwlock_t rwlock;
> int nr_systemwide;
> @@ -491,20 +502,19 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> struct uprobe_trace_entry_head *entry;
> struct ring_buffer_event *event;
> struct ring_buffer *buffer;
> - u8 *data;
> + void *data;
> int size, i;
> struct ftrace_event_call *call = &tu->call;
>
> - size = sizeof(*entry) + tu->size;
> -
> + size = SIZEOF_TRACE_ENTRY(1) + tu->size;

That '1' is confusing. When I first looked at this code, I thought it
was a bug as it should have been '0' (thinking of arrays). And then I
realized that you want the entry *after* the element.

Instead of '1' and I assume '2' for ret probes, how about defining an
enum:

enum {
	UPROBE_NORM = 1,
	UPROBE_RET = 2,
};

and then you can have:

size = SIZEOF_TRACE_ENTRY(UPROBE_NORM);

and later:

size = SIZEOF_TRACE_ENTRY(UPROBE_RET);

Same goes for DATAOF_TRACE_ENTRY().

This would make it a lot easier to understand and review, and much less
bug-prone when we have to deal with two different types of
uprobe_trace_entry_head's.

-- Steve

> event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
> size, 0, 0);
> if (!event)
> return 0;
>
> entry = ring_buffer_event_data(event);
> - entry->ip = instruction_pointer(regs);
> - data = (u8 *)&entry[1];
> + entry->vaddr[0] = instruction_pointer(regs);
> + data = DATAOF_TRACE_ENTRY(entry, 1);
> for (i = 0; i < tu->nr_args; i++)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>
> @@ -518,22 +528,22 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> static enum print_line_t
> print_uprobe_event(struct trace_iterator *iter, int flags, struct trace_event *event)
> {
> - struct uprobe_trace_entry_head *field;
> + struct uprobe_trace_entry_head *entry;
> struct trace_seq *s = &iter->seq;
> struct trace_uprobe *tu;
> u8 *data;
> int i;
>
> - field = (struct uprobe_trace_entry_head *)iter->ent;
> + entry = (struct uprobe_trace_entry_head *)iter->ent;
> tu = container_of(event, struct trace_uprobe, call.event);
>
> - if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name, field->ip))
> + if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name, entry->vaddr[0]))
> goto partial;
>
> - data = (u8 *)&field[1];
> + data = DATAOF_TRACE_ENTRY(entry, 1);
> for (i = 0; i < tu->nr_args; i++) {
> if (!tu->args[i].type->print(s, tu->args[i].name,
> - data + tu->args[i].offset, field))
> + data + tu->args[i].offset, entry))
> goto partial;
> }
>
> @@ -585,16 +595,17 @@ static void probe_event_disable(struct trace_uprobe *tu, int flag)
>
> static int uprobe_event_define_fields(struct ftrace_event_call *event_call)
> {
> - int ret, i;
> + int ret, i, size;
> struct uprobe_trace_entry_head field;
> - struct trace_uprobe *tu = (struct trace_uprobe *)event_call->data;
> + struct trace_uprobe *tu = event_call->data;
>
> - DEFINE_FIELD(unsigned long, ip, FIELD_STRING_IP, 0);
> + DEFINE_FIELD(unsigned long, vaddr[0], FIELD_STRING_IP, 0);
> + size = SIZEOF_TRACE_ENTRY(1);
> /* Set argument names as fields */
> for (i = 0; i < tu->nr_args; i++) {
> ret = trace_define_field(event_call, tu->args[i].type->fmttype,
> tu->args[i].name,
> - sizeof(field) + tu->args[i].offset,
> + size + tu->args[i].offset,
> tu->args[i].type->size,
> tu->args[i].type->is_signed,
> FILTER_OTHER);
> @@ -748,33 +759,31 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
> struct ftrace_event_call *call = &tu->call;
> struct uprobe_trace_entry_head *entry;
> struct hlist_head *head;
> - u8 *data;
> - int size, __size, i;
> - int rctx;
> + unsigned long ip;
> + void *data;
> + int size, rctx, i;
>
> if (!uprobe_perf_filter(&tu->consumer, 0, current->mm))
> return UPROBE_HANDLER_REMOVE;
>
> - __size = sizeof(*entry) + tu->size;
> - size = ALIGN(__size + sizeof(u32), sizeof(u64));
> - size -= sizeof(u32);
> + size = SIZEOF_TRACE_ENTRY(1);
> + size = ALIGN(size + tu->size + sizeof(u32), sizeof(u64)) - sizeof(u32);
> if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
> return 0;
>
> preempt_disable();
> -
> entry = perf_trace_buf_prepare(size, call->event.type, regs, &rctx);
> if (!entry)
> goto out;
>
> - entry->ip = instruction_pointer(regs);
> - data = (u8 *)&entry[1];
> + ip = instruction_pointer(regs);
> + entry->vaddr[0] = ip;
> + data = DATAOF_TRACE_ENTRY(entry, 1);
> for (i = 0; i < tu->nr_args; i++)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>
> head = this_cpu_ptr(call->perf_events);
> - perf_trace_buf_submit(entry, size, rctx, entry->ip, 1, regs, head, NULL);
> -
> + perf_trace_buf_submit(entry, size, rctx, ip, 1, regs, head, NULL);
> out:
> preempt_enable();
> return 0;
> @@ -784,7 +793,7 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
> static
> int trace_uprobe_register(struct ftrace_event_call *event, enum trace_reg type, void *data)
> {
> - struct trace_uprobe *tu = (struct trace_uprobe *)event->data;
> + struct trace_uprobe *tu = event->data;
>
> switch (type) {
> case TRACE_REG_REGISTER:

2013-04-08 15:58:48

by Steven Rostedt

Subject: Re: [PATCH 3/4] uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls

On Fri, 2013-03-29 at 19:15 +0100, Oleg Nesterov wrote:
> uprobe_trace_func() is never called with irqs or preemption
> disabled, no need to ask preempt_count() or local_save_flags().
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/trace/trace_uprobe.c | 10 +++-------
> 1 files changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 8e00901..43d258d 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -492,17 +492,13 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> struct ring_buffer_event *event;
> struct ring_buffer *buffer;
> u8 *data;
> - int size, i, pc;
> - unsigned long irq_flags;
> + int size, i;
> struct ftrace_event_call *call = &tu->call;
>
> - local_save_flags(irq_flags);
> - pc = preempt_count();

How about instead, just change the above two and have:

	/* uprobes are never called with preemption disabled */
	pc = 0;
	irq_flags = 0;

and leave the rest the same. This will save future reviewers of the
code from having to look up what that "0, 0" is for, and then wonder if
it should be that way. gcc should optimize it to be exactly the same as
this patch.

-- Steve

> -
> size = sizeof(*entry) + tu->size;
>
> event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
> - size, irq_flags, pc);
> + size, 0, 0);
> if (!event)
> return 0;
>
> @@ -513,7 +509,7 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>
> if (!filter_current_check_discard(buffer, call, entry, event))
> - trace_buffer_unlock_commit(buffer, event, irq_flags, pc);
> + trace_buffer_unlock_commit(buffer, event, 0, 0);
>
> return 0;
> }

2013-04-08 17:08:18

by Steven Rostedt

Subject: Re: [PATCH 3/6] uprobes/tracing: Make uprobe_{trace,perf}_print() uretprobe-friendly

On Mon, 2013-04-01 at 18:08 +0200, Oleg Nesterov wrote:
> Change uprobe_trace_print() and uprobe_perf_print() to check
> is_ret_probe() and fill ring_buffer_event accordingly.
>
> Also change uprobe_trace_func() and uprobe_perf_func() to not
> _print() if is_ret_probe() is true. Note that we keep ->handler()
> nontrivial even for uretprobe, we need this for filtering and for
> other potential extensions.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/trace/trace_uprobe.c | 42 +++++++++++++++++++++++++++++++++---------
> 1 files changed, 33 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index e91a354..db2718a 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -515,15 +515,26 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
> int size, i;
> struct ftrace_event_call *call = &tu->call;
>
> - size = SIZEOF_TRACE_ENTRY(1) + tu->size;
> + if (is_ret_probe(tu))
> + size = SIZEOF_TRACE_ENTRY(2);
> + else
> + size = SIZEOF_TRACE_ENTRY(1);

Again, having an enum for 1 and 2 would make this much more readable:

	if (is_ret_probe(tu))
		size = SIZEOF_TRACE_ENTRY(UPROBE_ENTRY_RETPROBE);
	else
		size = SIZEOF_TRACE_ENTRY(UPROBE_ENTRY_NORMAL);


> +
> event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
> - size, 0, 0);
> + size + tu->size, 0, 0);
> if (!event)
> return;
>
> entry = ring_buffer_event_data(event);
> - entry->vaddr[0] = instruction_pointer(regs);
> - data = DATAOF_TRACE_ENTRY(entry, 1);
> + if (is_ret_probe(tu)) {
> + entry->vaddr[0] = func;
> + entry->vaddr[1] = instruction_pointer(regs);
> + data = DATAOF_TRACE_ENTRY(entry, 2);

data = DATAOF_TRACE_ENTRY(entry, UPROBE_ENTRY_RETPROBE);

> + } else {
> + entry->vaddr[0] = instruction_pointer(regs);
> + data = DATAOF_TRACE_ENTRY(entry, 1);

data = DATAOF_TRACE_ENTRY(entry, UPROBE_ENTRY_NORMAL);

etc,

-- Steve

> + }
> +
> for (i = 0; i < tu->nr_args; i++)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>
> @@ -534,7 +545,8 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
> /* uprobe handler */
> static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> {
> - uprobe_trace_print(tu, 0, regs);
> + if (!is_ret_probe(tu))
> + uprobe_trace_print(tu, 0, regs);
> return 0;
> }
>
> @@ -783,7 +795,11 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
> void *data;
> int size, rctx, i;
>
> - size = SIZEOF_TRACE_ENTRY(1);
> + if (is_ret_probe(tu))
> + size = SIZEOF_TRACE_ENTRY(2);
> + else
> + size = SIZEOF_TRACE_ENTRY(1);
> +
> size = ALIGN(size + tu->size + sizeof(u32), sizeof(u64)) - sizeof(u32);
> if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
> return;
> @@ -794,8 +810,15 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
> goto out;
>
> ip = instruction_pointer(regs);
> - entry->vaddr[0] = ip;
> - data = DATAOF_TRACE_ENTRY(entry, 1);
> + if (is_ret_probe(tu)) {
> + entry->vaddr[0] = func;
> + entry->vaddr[1] = ip;
> + data = DATAOF_TRACE_ENTRY(entry, 2);
> + } else {
> + entry->vaddr[0] = ip;
> + data = DATAOF_TRACE_ENTRY(entry, 1);
> + }
> +
> for (i = 0; i < tu->nr_args; i++)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>
> @@ -811,7 +834,8 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
> if (!uprobe_perf_filter(&tu->consumer, 0, current->mm))
> return UPROBE_HANDLER_REMOVE;
>
> - uprobe_perf_print(tu, 0, regs);
> + if (!is_ret_probe(tu))
> + uprobe_perf_print(tu, 0, regs);
> return 0;
> }
>

2013-04-09 13:41:32

by Oleg Nesterov

Subject: Re: [PATCH 3/6] uprobes/tracing: Make uprobe_{trace,perf}_print() uretprobe-friendly

On 04/07, Srikar Dronamraju wrote:
>
> * Oleg Nesterov <[email protected]> [2013-04-01 18:08:51]:
>
> > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > index e91a354..db2718a 100644
> > --- a/kernel/trace/trace_uprobe.c
> > +++ b/kernel/trace/trace_uprobe.c
> > @@ -515,15 +515,26 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
> > int size, i;
> > struct ftrace_event_call *call = &tu->call;
> >
> > - size = SIZEOF_TRACE_ENTRY(1) + tu->size;
> > + if (is_ret_probe(tu))
>
One nit:
Here and in a couple of places below, we could check "func" instead of
is_ret_probe(), right?

Yes, we could. And note that we do not really need both uprobe_trace_func()
and uretprobe_trace_func(); we could use a single function and check "func".

But:

> Or is there an advantage of checking is_ret_probe() over func?

I believe yes. Firstly, we can't use 0ul as an "invalid func address" to detect
the !is_ret_probe() case; we need, say, -1ul, which probably needs a symbolic
name. In fact, I'd prefer to add another "is_return" argument if we want to
avoid is_ret_probe() and unify the two functions.

But more importantly, I think that is_ret_probe() is much more grep-friendly,
and thus more understandable and consistent with other checks which cannot
rely on "func".

> > static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> > {
> > - uprobe_trace_print(tu, 0, regs);
> > + if (!is_ret_probe(tu))
> > + uprobe_trace_print(tu, 0, regs);
>
> Should this hunk be in the previous patch?

Well, I dunno. Even if this hunk goes into the previous patch, it won't
make the "print" logic correct until we change uprobe_trace_print(); in
other words, to me this logically belongs with the uprobe_trace_print()
change in this patch.

And correctness-wise this doesn't matter: until 6/6 makes is_ret_probe() == T
possible, we should not worry about the "missed" is_ret_probe() checks.

> Also something for the future:
> Most times a user uses a return probe, the user probably wants to probe
> the function entry too. So should we extend the ABI from p+r to
> p+r+..<something else> to mean it traces both function entry and return?
> Especially given that uretprobe has elegantly been designed to make this a
> possibility.

Oh, perhaps, but this is really for the future. In particular, it is not
clear how we can specify normal-fetchargs + ret-fetchargs.

Oleg.

2013-04-09 14:54:34

by Oleg Nesterov

Subject: Re: [PATCH 4/4] uprobes/tracing: generalize struct uprobe_trace_entry_head

On 04/08, Steven Rostedt wrote:
>
> On Fri, 2013-03-29 at 19:15 +0100, Oleg Nesterov wrote:
> >
> > @@ -491,20 +502,19 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> > struct uprobe_trace_entry_head *entry;
> > struct ring_buffer_event *event;
> > struct ring_buffer *buffer;
> > - u8 *data;
> > + void *data;
> > int size, i;
> > struct ftrace_event_call *call = &tu->call;
> >
> > - size = sizeof(*entry) + tu->size;
> > -
> > + size = SIZEOF_TRACE_ENTRY(1) + tu->size;
>
> That '1' is confusing. When I first looked at this code, I thought it
> was a bug as it should have been '0' (thinking of arrays). And then I
> realized that you want the entry *after* the element.
>
> Instead of '1' and I assume '2' for ret probes, how about defining an
> enum:
>
> enum {
> UPROBE_NORM = 1,
> UPROBE_RET = 2,
> };
>
> and then you can have;
>
> size = SIZEOF_TRACE_ENTRY(UPROBE_NORM);
>
> and later:
>
> size = SIZEOF_TRACE_ENTRY(UPROBE_RET);
>
> Same goes for DATAOF_TRACE_ENTRY().
>
> This would make it a lot easier to understand and review, and much less
> bug prone when we have to deal with two different types of
> uprobe_trace_entry_head's.

OK, will do.

Or, instead of the enum, we can use "bool is_return". So, instead of

	if (is_ret_probe(tu))
		size = SIZEOF_TRACE_ENTRY(UPROBE_ENTRY_RETPROBE);
	else
		size = SIZEOF_TRACE_ENTRY(UPROBE_ENTRY_NORMAL);

we can do

	size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));

What do you like more?

Oleg.

2013-04-09 15:02:32

by Oleg Nesterov

Subject: Re: [PATCH 3/4] uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls

On 04/08, Steven Rostedt wrote:
>
> On Fri, 2013-03-29 at 19:15 +0100, Oleg Nesterov wrote:
> > --- a/kernel/trace/trace_uprobe.c
> > +++ b/kernel/trace/trace_uprobe.c
> > @@ -492,17 +492,13 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> > struct ring_buffer_event *event;
> > struct ring_buffer *buffer;
> > u8 *data;
> > - int size, i, pc;
> > - unsigned long irq_flags;
> > + int size, i;
> > struct ftrace_event_call *call = &tu->call;
> >
> > - local_save_flags(irq_flags);
> > - pc = preempt_count();
>
> How about instead, just change the above two and have:
>
> /* uprobes are never called with preemption disabled */
> pc = 0;
> irq_flags = 0;
>
> and leave the rest the same. This will help in future reviewers of the
> code to not have to look up what that "0, 0" is for, and then wonder if
> it should be that way. gcc should optimize it to be exactly the same as
> this patch.

Hmm, just to remind the reader which arguments trace_current_buffer_*() takes?

Personally I disagree. And, for example, ftrace_syscall_enter/exit just
use 0,0 for the same reason.

So please tell me if you really want the dummy variables/arguments; in
that case I'll change this code even though I do not like it.

Oleg.

2013-04-09 15:07:28

by Steven Rostedt

Subject: Re: [PATCH 4/4] uprobes/tracing: generalize struct uprobe_trace_entry_head

On Tue, 2013-04-09 at 16:50 +0200, Oleg Nesterov wrote:
> On 04/08, Steven Rostedt wrote:

> OK, will do.
>
> Or. Instead of enum we can use "bool is_return". So, instead of
>
> if (is_ret_probe(tu))
> size = SIZEOF_TRACE_ENTRY(UPROBE_ENTRY_RETPROBE);
> else
> size = SIZEOF_TRACE_ENTRY(UPROBE_ENTRY_NORMAL);
>
> we can do
>
> size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
>
> What do you like more?

Whichever is easier ;-)

I just hated the magic "1" and "2". As long as I (or any reviewer) don't
need to go searching for numbers, and can easily figure out what is
going on by looking at the code at hand, I'm happy.

Both the above satisfy that requirement.

Your "is_ret_probe(tu)" may have the added bonus of being less
error-prone.

Thanks,

-- Steve

2013-04-09 15:12:09

by Steven Rostedt

Subject: Re: [PATCH 3/4] uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls

On Tue, 2013-04-09 at 16:58 +0200, Oleg Nesterov wrote:

> Hmm, just to remind which arguments trace_current_buffer_*() has?
>
> Personally I disagree. And, for example, ftrace_syscall_enter/exit just
> use 0,0 for the same reason.
>
> So please tell me if you really want the dummy variables/arguments, in
> this case I'll change this code even if I do not like it.

I'm not attached to it, so if you really don't like it, then sure, go
ahead and scrap it. I was just so fueled by hatred of the "1" and "2" in
your other patch that it carried over to this one ;-)

-- Steve

2013-04-09 19:35:08

by Oleg Nesterov

Subject: [PATCH v2 1/7] uprobes/tracing: Generalize struct uprobe_trace_entry_head

struct uprobe_trace_entry_head has a single member for reporting,
"unsigned long ip". If we want to support uretprobes we need to
create another struct which has "func" and "ret_ip" and duplicate
a lot of functions, like trace_kprobe.c does.

To avoid this copy-and-paste horror we turn ->ip into ->vaddr[]
and add a couple of trivial helpers to calculate sizeof/data. This
uglifies the code a bit, but this allows us to avoid a lot more
complications later, when we add the support for ret-probes.

Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
Tested-by: Anton Arapov <[email protected]>
---
kernel/trace/trace.h | 5 ---
kernel/trace/trace_uprobe.c | 62 +++++++++++++++++++++++++------------------
2 files changed, 36 insertions(+), 31 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 57d7e53..6ca57cf 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -103,11 +103,6 @@ struct kretprobe_trace_entry_head {
unsigned long ret_ip;
};

-struct uprobe_trace_entry_head {
- struct trace_entry ent;
- unsigned long ip;
-};
-
/*
* trace_flag_type is an enumeration that holds different
* states when a trace occurs. These are:
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 43d258d..49b4003 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -28,6 +28,18 @@

#define UPROBE_EVENT_SYSTEM "uprobes"

+struct uprobe_trace_entry_head {
+ struct trace_entry ent;
+ unsigned long vaddr[];
+};
+
+#define SIZEOF_TRACE_ENTRY(is_return) \
+ (sizeof(struct uprobe_trace_entry_head) + \
+ sizeof(unsigned long) * (is_return ? 2 : 1))
+
+#define DATAOF_TRACE_ENTRY(entry, is_return) \
+ ((void*)(entry) + SIZEOF_TRACE_ENTRY(is_return))
+
struct trace_uprobe_filter {
rwlock_t rwlock;
int nr_systemwide;
@@ -491,20 +503,19 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
struct uprobe_trace_entry_head *entry;
struct ring_buffer_event *event;
struct ring_buffer *buffer;
- u8 *data;
+ void *data;
int size, i;
struct ftrace_event_call *call = &tu->call;

- size = sizeof(*entry) + tu->size;
-
+ size = SIZEOF_TRACE_ENTRY(false) + tu->size;
event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
size, 0, 0);
if (!event)
return 0;

entry = ring_buffer_event_data(event);
- entry->ip = instruction_pointer(regs);
- data = (u8 *)&entry[1];
+ entry->vaddr[0] = instruction_pointer(regs);
+ data = DATAOF_TRACE_ENTRY(entry, false);
for (i = 0; i < tu->nr_args; i++)
call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);

@@ -518,22 +529,22 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
static enum print_line_t
print_uprobe_event(struct trace_iterator *iter, int flags, struct trace_event *event)
{
- struct uprobe_trace_entry_head *field;
+ struct uprobe_trace_entry_head *entry;
struct trace_seq *s = &iter->seq;
struct trace_uprobe *tu;
u8 *data;
int i;

- field = (struct uprobe_trace_entry_head *)iter->ent;
+ entry = (struct uprobe_trace_entry_head *)iter->ent;
tu = container_of(event, struct trace_uprobe, call.event);

- if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name, field->ip))
+ if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name, entry->vaddr[0]))
goto partial;

- data = (u8 *)&field[1];
+ data = DATAOF_TRACE_ENTRY(entry, false);
for (i = 0; i < tu->nr_args; i++) {
if (!tu->args[i].type->print(s, tu->args[i].name,
- data + tu->args[i].offset, field))
+ data + tu->args[i].offset, entry))
goto partial;
}

@@ -585,16 +596,17 @@ static void probe_event_disable(struct trace_uprobe *tu, int flag)

static int uprobe_event_define_fields(struct ftrace_event_call *event_call)
{
- int ret, i;
+ int ret, i, size;
struct uprobe_trace_entry_head field;
- struct trace_uprobe *tu = (struct trace_uprobe *)event_call->data;
+ struct trace_uprobe *tu = event_call->data;

- DEFINE_FIELD(unsigned long, ip, FIELD_STRING_IP, 0);
+ DEFINE_FIELD(unsigned long, vaddr[0], FIELD_STRING_IP, 0);
+ size = SIZEOF_TRACE_ENTRY(false);
/* Set argument names as fields */
for (i = 0; i < tu->nr_args; i++) {
ret = trace_define_field(event_call, tu->args[i].type->fmttype,
tu->args[i].name,
- sizeof(field) + tu->args[i].offset,
+ size + tu->args[i].offset,
tu->args[i].type->size,
tu->args[i].type->is_signed,
FILTER_OTHER);
@@ -748,33 +760,31 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
struct ftrace_event_call *call = &tu->call;
struct uprobe_trace_entry_head *entry;
struct hlist_head *head;
- u8 *data;
- int size, __size, i;
- int rctx;
+ unsigned long ip;
+ void *data;
+ int size, rctx, i;

if (!uprobe_perf_filter(&tu->consumer, 0, current->mm))
return UPROBE_HANDLER_REMOVE;

- __size = sizeof(*entry) + tu->size;
- size = ALIGN(__size + sizeof(u32), sizeof(u64));
- size -= sizeof(u32);
+ size = SIZEOF_TRACE_ENTRY(false);
+ size = ALIGN(size + tu->size + sizeof(u32), sizeof(u64)) - sizeof(u32);
if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
return 0;

preempt_disable();
-
entry = perf_trace_buf_prepare(size, call->event.type, regs, &rctx);
if (!entry)
goto out;

- entry->ip = instruction_pointer(regs);
- data = (u8 *)&entry[1];
+ ip = instruction_pointer(regs);
+ entry->vaddr[0] = ip;
+ data = DATAOF_TRACE_ENTRY(entry, false);
for (i = 0; i < tu->nr_args; i++)
call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);

head = this_cpu_ptr(call->perf_events);
- perf_trace_buf_submit(entry, size, rctx, entry->ip, 1, regs, head, NULL);
-
+ perf_trace_buf_submit(entry, size, rctx, ip, 1, regs, head, NULL);
out:
preempt_enable();
return 0;
@@ -784,7 +794,7 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
static
int trace_uprobe_register(struct ftrace_event_call *event, enum trace_reg type, void *data)
{
- struct trace_uprobe *tu = (struct trace_uprobe *)event->data;
+ struct trace_uprobe *tu = event->data;

switch (type) {
case TRACE_REG_REGISTER:
--
1.5.5.1

2013-04-09 19:35:18

by Oleg Nesterov

Subject: [PATCH v2 3/7] uprobes/tracing: Introduce is_ret_probe() and uretprobe_dispatcher()

Create the new functions we need to support uretprobes, and change
alloc_trace_uprobe() to initialize consumer.ret_handler if the new
"is_ret" argument is true. Curently this argument is always false,
so the new code is never called and is_ret_probe(tu) is false too.

Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
Tested-by: Anton Arapov <[email protected]>
---
kernel/trace/trace_uprobe.c | 42 ++++++++++++++++++++++++++++++++++++++++--
1 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 0c0f0a7..72aa45e 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -76,6 +76,8 @@ static DEFINE_MUTEX(uprobe_lock);
static LIST_HEAD(uprobe_list);

static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs);
+static int uretprobe_dispatcher(struct uprobe_consumer *con,
+ unsigned long func, struct pt_regs *regs);

static inline void init_trace_uprobe_filter(struct trace_uprobe_filter *filter)
{
@@ -89,11 +91,16 @@ static inline bool uprobe_filter_is_empty(struct trace_uprobe_filter *filter)
return !filter->nr_systemwide && list_empty(&filter->perf_events);
}

+static inline bool is_ret_probe(struct trace_uprobe *tu)
+{
+ return tu->consumer.ret_handler != NULL;
+}
+
/*
* Allocate new trace_uprobe and initialize it (including uprobes).
*/
static struct trace_uprobe *
-alloc_trace_uprobe(const char *group, const char *event, int nargs)
+alloc_trace_uprobe(const char *group, const char *event, int nargs, bool is_ret)
{
struct trace_uprobe *tu;

@@ -118,6 +125,8 @@ alloc_trace_uprobe(const char *group, const char *event, int nargs)

INIT_LIST_HEAD(&tu->list);
tu->consumer.handler = uprobe_dispatcher;
+ if (is_ret)
+ tu->consumer.ret_handler = uretprobe_dispatcher;
init_trace_uprobe_filter(&tu->filter);
return tu;

@@ -315,7 +324,7 @@ static int create_trace_uprobe(int argc, char **argv)
kfree(tail);
}

- tu = alloc_trace_uprobe(group, event, argc);
+ tu = alloc_trace_uprobe(group, event, argc, false);
if (IS_ERR(tu)) {
pr_info("Failed to allocate trace_uprobe.(%d)\n", (int)PTR_ERR(tu));
ret = PTR_ERR(tu);
@@ -530,6 +539,12 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
return 0;
}

+static void uretprobe_trace_func(struct trace_uprobe *tu, unsigned long func,
+ struct pt_regs *regs)
+{
+ uprobe_trace_print(tu, func, regs);
+}
+
/* Event entry printers */
static enum print_line_t
print_uprobe_event(struct trace_iterator *iter, int flags, struct trace_event *event)
@@ -800,6 +815,12 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
uprobe_perf_print(tu, 0, regs);
return 0;
}
+
+static void uretprobe_perf_func(struct trace_uprobe *tu, unsigned long func,
+ struct pt_regs *regs)
+{
+ uprobe_perf_print(tu, func, regs);
+}
#endif /* CONFIG_PERF_EVENTS */

static
@@ -854,6 +875,23 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
return ret;
}

+static int uretprobe_dispatcher(struct uprobe_consumer *con,
+ unsigned long func, struct pt_regs *regs)
+{
+ struct trace_uprobe *tu;
+
+ tu = container_of(con, struct trace_uprobe, consumer);
+
+ if (tu->flags & TP_FLAG_TRACE)
+ uretprobe_trace_func(tu, func, regs);
+
+#ifdef CONFIG_PERF_EVENTS
+ if (tu->flags & TP_FLAG_PROFILE)
+ uretprobe_perf_func(tu, func, regs);
+#endif
+ return 0;
+}
+
static struct trace_event_functions uprobe_funcs = {
.trace = print_uprobe_event
};
--
1.5.5.1

2013-04-09 19:35:33

by Oleg Nesterov

Subject: [PATCH v2 5/7] uprobes/tracing: Make register_uprobe_event() paths uretprobe-friendly

Change uprobe_event_define_fields(), and __set_print_fmt() to check
is_ret_probe() and use the appropriate format/fields.

Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
Tested-by: Anton Arapov <[email protected]>
---
kernel/trace/trace_uprobe.c | 19 +++++++++++++++----
1 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 0ed99a2..4575762 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -628,8 +628,14 @@ static int uprobe_event_define_fields(struct ftrace_event_call *event_call)
struct uprobe_trace_entry_head field;
struct trace_uprobe *tu = event_call->data;

- DEFINE_FIELD(unsigned long, vaddr[0], FIELD_STRING_IP, 0);
- size = SIZEOF_TRACE_ENTRY(false);
+ if (is_ret_probe(tu)) {
+ DEFINE_FIELD(unsigned long, vaddr[0], FIELD_STRING_FUNC, 0);
+ DEFINE_FIELD(unsigned long, vaddr[1], FIELD_STRING_RETIP, 0);
+ size = SIZEOF_TRACE_ENTRY(true);
+ } else {
+ DEFINE_FIELD(unsigned long, vaddr[0], FIELD_STRING_IP, 0);
+ size = SIZEOF_TRACE_ENTRY(false);
+ }
/* Set argument names as fields */
for (i = 0; i < tu->nr_args; i++) {
ret = trace_define_field(event_call, tu->args[i].type->fmttype,
@@ -652,8 +658,13 @@ static int __set_print_fmt(struct trace_uprobe *tu, char *buf, int len)
int i;
int pos = 0;

- fmt = "(%lx)";
- arg = "REC->" FIELD_STRING_IP;
+ if (is_ret_probe(tu)) {
+ fmt = "(%lx <- %lx)";
+ arg = "REC->" FIELD_STRING_FUNC ", REC->" FIELD_STRING_RETIP;
+ } else {
+ fmt = "(%lx)";
+ arg = "REC->" FIELD_STRING_IP;
+ }

/* When len=0, we just calculate the needed length */

--
1.5.5.1

2013-04-09 19:35:53

by Oleg Nesterov

Subject: [PATCH v2 4/7] uprobes/tracing: Make uprobe_{trace,perf}_print() uretprobe-friendly

Change uprobe_trace_print() and uprobe_perf_print() to check
is_ret_probe() and fill ring_buffer_event accordingly.

Also change uprobe_trace_func() and uprobe_perf_func() to not
_print() if is_ret_probe() is true. Note that we keep ->handler()
nontrivial even for uretprobe, we need this for filtering and for
other potential extensions.

Signed-off-by: Oleg Nesterov <[email protected]>
Tested-by: Anton Arapov <[email protected]>
---
kernel/trace/trace_uprobe.c | 34 +++++++++++++++++++++++++---------
1 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 72aa45e..0ed99a2 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -516,15 +516,22 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
int size, i;
struct ftrace_event_call *call = &tu->call;

- size = SIZEOF_TRACE_ENTRY(false) + tu->size;
+ size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
- size, 0, 0);
+ size + tu->size, 0, 0);
if (!event)
return;

entry = ring_buffer_event_data(event);
- entry->vaddr[0] = instruction_pointer(regs);
- data = DATAOF_TRACE_ENTRY(entry, false);
+ if (is_ret_probe(tu)) {
+ entry->vaddr[0] = func;
+ entry->vaddr[1] = instruction_pointer(regs);
+ data = DATAOF_TRACE_ENTRY(entry, true);
+ } else {
+ entry->vaddr[0] = instruction_pointer(regs);
+ data = DATAOF_TRACE_ENTRY(entry, false);
+ }
+
for (i = 0; i < tu->nr_args; i++)
call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);

@@ -535,7 +542,8 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
/* uprobe handler */
static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
{
- uprobe_trace_print(tu, 0, regs);
+ if (!is_ret_probe(tu))
+ uprobe_trace_print(tu, 0, regs);
return 0;
}

@@ -784,7 +792,7 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
void *data;
int size, rctx, i;

- size = SIZEOF_TRACE_ENTRY(false);
+ size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
size = ALIGN(size + tu->size + sizeof(u32), sizeof(u64)) - sizeof(u32);
if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
return;
@@ -795,8 +803,15 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
goto out;

ip = instruction_pointer(regs);
- entry->vaddr[0] = ip;
- data = DATAOF_TRACE_ENTRY(entry, false);
+ if (is_ret_probe(tu)) {
+ entry->vaddr[0] = func;
+ entry->vaddr[1] = ip;
+ data = DATAOF_TRACE_ENTRY(entry, true);
+ } else {
+ entry->vaddr[0] = ip;
+ data = DATAOF_TRACE_ENTRY(entry, false);
+ }
+
for (i = 0; i < tu->nr_args; i++)
call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);

@@ -812,7 +827,8 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
if (!uprobe_perf_filter(&tu->consumer, 0, current->mm))
return UPROBE_HANDLER_REMOVE;

- uprobe_perf_print(tu, 0, regs);
+ if (!is_ret_probe(tu))
+ uprobe_perf_print(tu, 0, regs);
return 0;
}

--
1.5.5.1

2013-04-09 19:36:07

by Oleg Nesterov

Subject: [PATCH v2 7/7] uprobes/tracing: Change create_trace_uprobe() to support uretprobes

Finally change create_trace_uprobe() to check if argv[0][0] == 'r'
and pass the correct "is_ret" to alloc_trace_uprobe().

Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
Tested-by: Anton Arapov <[email protected]>
---
kernel/trace/trace_uprobe.c | 11 +++++++----
1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 8c9f489..2d08bea 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -201,7 +201,7 @@ end:

/*
* Argument syntax:
- * - Add uprobe: p[:[GRP/]EVENT] PATH:SYMBOL[+offs] [FETCHARGS]
+ * - Add uprobe: p|r[:[GRP/]EVENT] PATH:SYMBOL [FETCHARGS]
*
* - Remove uprobe: -:[GRP/]EVENT
*/
@@ -213,20 +213,23 @@ static int create_trace_uprobe(int argc, char **argv)
char buf[MAX_EVENT_NAME_LEN];
struct path path;
unsigned long offset;
- bool is_delete;
+ bool is_delete, is_return;
int i, ret;

inode = NULL;
ret = 0;
is_delete = false;
+ is_return = false;
event = NULL;
group = NULL;

/* argc must be >= 1 */
if (argv[0][0] == '-')
is_delete = true;
+ else if (argv[0][0] == 'r')
+ is_return = true;
else if (argv[0][0] != 'p') {
- pr_info("Probe definition must be started with 'p' or '-'.\n");
+ pr_info("Probe definition must be started with 'p', 'r' or '-'.\n");
return -EINVAL;
}

@@ -324,7 +327,7 @@ static int create_trace_uprobe(int argc, char **argv)
kfree(tail);
}

- tu = alloc_trace_uprobe(group, event, argc, false);
+ tu = alloc_trace_uprobe(group, event, argc, is_return);
if (IS_ERR(tu)) {
pr_info("Failed to allocate trace_uprobe.(%d)\n", (int)PTR_ERR(tu));
ret = PTR_ERR(tu);
--
1.5.5.1
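With the full series applied, a return probe can be registered through the usual uprobe_events file using the new 'r' prefix. The commands below are an illustrative sketch only: they assume the common debugfs mount point, and the binary path, event name, and file offset are placeholders you would obtain yourself (e.g. with nm/objdump):

```shell
# Register a uretprobe (the 0x4245c0 offset is a made-up example):
echo 'r:bash_ret /bin/bash:0x4245c0' >> /sys/kernel/debug/tracing/uprobe_events

# Enable it and watch the "FUNC <- RETIP" style output:
echo 1 > /sys/kernel/debug/tracing/events/uprobes/bash_ret/enable
cat /sys/kernel/debug/tracing/trace_pipe

# Disable and remove it when done:
echo 0 > /sys/kernel/debug/tracing/events/uprobes/bash_ret/enable
echo '-:bash_ret' >> /sys/kernel/debug/tracing/uprobe_events
```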

2013-04-09 19:35:29

by Oleg Nesterov

Subject: [PATCH v2 0/7] uprobes/tracing: uretprobes

On 04/09, Steven Rostedt wrote:
>
> On Tue, 2013-04-09 at 16:50 +0200, Oleg Nesterov wrote:
> > On 04/08, Steven Rostedt wrote:
>
> > OK, will do.
> >
> > Or. Instead of enum we can use "bool is_return". So, instead of
> >
> > if (is_ret_probe(tu))
> > size = SIZEOF_TRACE_ENTRY(UPROBE_ENTRY_RETPROBE);
> > else
> > size = SIZEOF_TRACE_ENTRY(UPROBE_ENTRY_NORMAL);
> >
> > we can do
> >
> > size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
> >
> > What do you like more?
>
> Which ever is easier ;-)
>
> I just hated the magic "1" and "2". As long as I (or any reviewer) does
> not need to go searching for numbers, and can easily figure out what is
> going on by looking at the code at hand, I'm happy.
>
> Both the above satisfy that requirement.
>
> Your "is_ret_probe(tu)" may have the added bonus of being less error
> prone.

OK, please see v2.

Change SIZEOF_TRACE_ENTRY/DATAOF_TRACE_ENTRY to accept "bool is_return"
rather than "int nr".

Srikar, I preserved your acks, hopefully this is fine. But 4/7 still
doesn't have your ack.

Oleg.

2013-04-09 19:35:27

by Oleg Nesterov

Subject: [PATCH v2 6/7] uprobes/tracing: Make seq_printf() code uretprobe-friendly

Change probes_seq_show() and print_uprobe_event() to check
is_ret_probe() and print the correct data.

Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
Tested-by: Anton Arapov <[email protected]>
---
kernel/trace/trace_uprobe.c | 17 +++++++++++++----
1 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 4575762..8c9f489 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -435,9 +435,10 @@ static void probes_seq_stop(struct seq_file *m, void *v)
static int probes_seq_show(struct seq_file *m, void *v)
{
struct trace_uprobe *tu = v;
+ char c = is_ret_probe(tu) ? 'r' : 'p';
int i;

- seq_printf(m, "p:%s/%s", tu->call.class->system, tu->call.name);
+ seq_printf(m, "%c:%s/%s", c, tu->call.class->system, tu->call.name);
seq_printf(m, " %s:0x%p", tu->filename, (void *)tu->offset);

for (i = 0; i < tu->nr_args; i++)
@@ -566,10 +567,18 @@ print_uprobe_event(struct trace_iterator *iter, int flags, struct trace_event *e
entry = (struct uprobe_trace_entry_head *)iter->ent;
tu = container_of(event, struct trace_uprobe, call.event);

- if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name, entry->vaddr[0]))
- goto partial;
+ if (is_ret_probe(tu)) {
+ if (!trace_seq_printf(s, "%s: (0x%lx <- 0x%lx)", tu->call.name,
+ entry->vaddr[1], entry->vaddr[0]))
+ goto partial;
+ data = DATAOF_TRACE_ENTRY(entry, true);
+ } else {
+ if (!trace_seq_printf(s, "%s: (0x%lx)", tu->call.name,
+ entry->vaddr[0]))
+ goto partial;
+ data = DATAOF_TRACE_ENTRY(entry, false);
+ }

- data = DATAOF_TRACE_ENTRY(entry, false);
for (i = 0; i < tu->nr_args; i++) {
if (!tu->args[i].type->print(s, tu->args[i].name,
data + tu->args[i].offset, entry))
--
1.5.5.1

2013-04-09 19:35:16

by Oleg Nesterov

Subject: [PATCH v2 2/7] uprobes/tracing: Introduce uprobe_{trace,perf}_print() helpers

Extract the output code from uprobe_trace_func() and uprobe_perf_func()
into the new helpers, they will be used by ->ret_handler() too. We also
add the unused "unsigned long func" argument in advance, to simplify the
next changes.

Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
Tested-by: Anton Arapov <[email protected]>
---
kernel/trace/trace_uprobe.c | 29 ++++++++++++++++++++---------
1 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 49b4003..0c0f0a7 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -497,8 +497,8 @@ static const struct file_operations uprobe_profile_ops = {
.release = seq_release,
};

-/* uprobe handler */
-static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
+static void uprobe_trace_print(struct trace_uprobe *tu,
+ unsigned long func, struct pt_regs *regs)
{
struct uprobe_trace_entry_head *entry;
struct ring_buffer_event *event;
@@ -511,7 +511,7 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
size, 0, 0);
if (!event)
- return 0;
+ return;

entry = ring_buffer_event_data(event);
entry->vaddr[0] = instruction_pointer(regs);
@@ -521,7 +521,12 @@ static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)

if (!filter_current_check_discard(buffer, call, entry, event))
trace_buffer_unlock_commit(buffer, event, 0, 0);
+}

+/* uprobe handler */
+static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
+{
+ uprobe_trace_print(tu, 0, regs);
return 0;
}

@@ -754,8 +759,8 @@ static bool uprobe_perf_filter(struct uprobe_consumer *uc,
return ret;
}

-/* uprobe profile handler */
-static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
+static void uprobe_perf_print(struct trace_uprobe *tu,
+ unsigned long func, struct pt_regs *regs)
{
struct ftrace_event_call *call = &tu->call;
struct uprobe_trace_entry_head *entry;
@@ -764,13 +769,10 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
void *data;
int size, rctx, i;

- if (!uprobe_perf_filter(&tu->consumer, 0, current->mm))
- return UPROBE_HANDLER_REMOVE;
-
size = SIZEOF_TRACE_ENTRY(false);
size = ALIGN(size + tu->size + sizeof(u32), sizeof(u64)) - sizeof(u32);
if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
- return 0;
+ return;

preempt_disable();
entry = perf_trace_buf_prepare(size, call->event.type, regs, &rctx);
@@ -787,6 +789,15 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
perf_trace_buf_submit(entry, size, rctx, ip, 1, regs, head, NULL);
out:
preempt_enable();
+}
+
+/* uprobe profile handler */
+static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
+{
+ if (!uprobe_perf_filter(&tu->consumer, 0, current->mm))
+ return UPROBE_HANDLER_REMOVE;
+
+ uprobe_perf_print(tu, 0, regs);
return 0;
}
#endif /* CONFIG_PERF_EVENTS */
--
1.5.5.1

2013-04-10 15:06:27

by Oleg Nesterov

Subject: [PATCH 1/1] uprobes/tracing: Don't pass addr=ip to perf_trace_buf_submit()

uprobe_perf_print() passes addr=ip to perf_trace_buf_submit() for
no reason. This sets perf_sample_data->addr for PERF_SAMPLE_ADDR,
we already have perf_sample_data->ip initialized if PERF_SAMPLE_IP.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_uprobe.c | 8 +++-----
1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 2d08bea..37ccb72 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -811,7 +811,6 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
struct ftrace_event_call *call = &tu->call;
struct uprobe_trace_entry_head *entry;
struct hlist_head *head;
- unsigned long ip;
void *data;
int size, rctx, i;

@@ -825,13 +824,12 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
if (!entry)
goto out;

- ip = instruction_pointer(regs);
if (is_ret_probe(tu)) {
entry->vaddr[0] = func;
- entry->vaddr[1] = ip;
+ entry->vaddr[1] = instruction_pointer(regs);
data = DATAOF_TRACE_ENTRY(entry, true);
} else {
- entry->vaddr[0] = ip;
+ entry->vaddr[0] = instruction_pointer(regs);
data = DATAOF_TRACE_ENTRY(entry, false);
}

@@ -839,7 +837,7 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);

head = this_cpu_ptr(call->perf_events);
- perf_trace_buf_submit(entry, size, rctx, ip, 1, regs, head, NULL);
+ perf_trace_buf_submit(entry, size, rctx, 0, 1, regs, head, NULL);
out:
preempt_enable();
}
--
1.5.5.1

2013-04-10 15:08:13

by Oleg Nesterov

Subject: [PATCH 0/1] uprobes/tracing: Don't pass addr=ip to perf_trace_buf_submit()

On 04/08, Masami Hiramatsu wrote:
>
> (2013/04/06 0:01), Oleg Nesterov wrote:
> >
> > Masami, perhaps you can also answer the question I asked in 0/4
> > marc.info/?l=linux-kernel&m=136458107403835 ?
> >
> > Off-topic question... Why uprobe_perf_func() passes "addr = ip" to
> > perf_trace_buf_submit() ? This just sets perf_sample_data->addr for
> > PERF_SAMPLE_ADDR, do we really need this? and we already have
> > perf_sample_data->ip initialized by perf.
> >
> > kprobe_perf_func() and kretprobe_perf_func() do the same.
> >
>
> Good catch! I guess that I might have misunderstood that it was used
> for sampling execution address. It should be replaced with (u64)0,
> as perf_trace_##call() does.

Thanks!

How about this trivial cleanup then? If I have your ack I'll add this
patch to other pending changes.

And... Cough, another question ;) To simplify, let's discuss kprobe_perf_func()
only. Suppose that a task hits the kprobe but this task/cpu doesn't have
a counter. Can't we avoid perf_trace_buf_prepare/submit in this case?
IOW, what do you think about the change below?

Oleg.

--- x/kernel/trace/trace_kprobe.c
+++ x/kernel/trace/trace_kprobe.c
@@ -985,6 +985,10 @@ static __kprobes void kprobe_perf_func(s
int size, __size, dsize;
int rctx;

+ head = this_cpu_ptr(call->perf_events);
+ if (hlist_empty(head))
+ return;
+
dsize = __get_data_size(tp, regs);
__size = sizeof(*entry) + tp->size + dsize;
size = ALIGN(__size + sizeof(u32), sizeof(u64));
@@ -1001,7 +1005,6 @@ static __kprobes void kprobe_perf_func(s
memset(&entry[1], 0, dsize);
store_trace_args(sizeof(*entry), tp, regs, (u8 *)&entry[1], dsize);

- head = this_cpu_ptr(call->perf_events);
perf_trace_buf_submit(entry, size, rctx,
entry->ip, 1, regs, head, NULL);
}

Subject: Re: [PATCH 1/1] uprobes/tracing: Don't pass addr=ip to perf_trace_buf_submit()

(2013/04/10 23:58), Oleg Nesterov wrote:
> uprobe_perf_print() passes addr=ip to perf_trace_buf_submit() for
> no reason. This sets perf_sample_data->addr for PERF_SAMPLE_ADDR,
> we already have perf_sample_data->ip initialized if PERF_SAMPLE_IP.
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Masami Hiramatsu <[email protected]>

Thank you!

> ---
> kernel/trace/trace_uprobe.c | 8 +++-----
> 1 files changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 2d08bea..37ccb72 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -811,7 +811,6 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
> struct ftrace_event_call *call = &tu->call;
> struct uprobe_trace_entry_head *entry;
> struct hlist_head *head;
> - unsigned long ip;
> void *data;
> int size, rctx, i;
>
> @@ -825,13 +824,12 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
> if (!entry)
> goto out;
>
> - ip = instruction_pointer(regs);
> if (is_ret_probe(tu)) {
> entry->vaddr[0] = func;
> - entry->vaddr[1] = ip;
> + entry->vaddr[1] = instruction_pointer(regs);
> data = DATAOF_TRACE_ENTRY(entry, true);
> } else {
> - entry->vaddr[0] = ip;
> + entry->vaddr[0] = instruction_pointer(regs);
> data = DATAOF_TRACE_ENTRY(entry, false);
> }
>
> @@ -839,7 +837,7 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>
> head = this_cpu_ptr(call->perf_events);
> - perf_trace_buf_submit(entry, size, rctx, ip, 1, regs, head, NULL);
> + perf_trace_buf_submit(entry, size, rctx, 0, 1, regs, head, NULL);
> out:
> preempt_enable();
> }
>


--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: [PATCH 0/1] uprobes/tracing: Don't pass addr=ip to perf_trace_buf_submit()

(2013/04/10 23:58), Oleg Nesterov wrote:
> On 04/08, Masami Hiramatsu wrote:
>>
>> (2013/04/06 0:01), Oleg Nesterov wrote:
>>>
>>> Masami, perhaps you can also answer the question I asked in 0/4
>>> marc.info/?l=linux-kernel&m=136458107403835 ?
>>>
>>> Off-topic question... Why uprobe_perf_func() passes "addr = ip" to
>>> perf_trace_buf_submit() ? This just sets perf_sample_data->addr for
>>> PERF_SAMPLE_ADDR, do we really need this? and we already have
>>> perf_sample_data->ip initialized by perf.
>>>
>>> kprobe_perf_func() and kretprobe_perf_func() do the same.
>>>
>>
>> Good catch! I guess that I might have misunderstood that it was used
>> for sampling execution address. It should be replaced with (u64)0,
>> as perf_trace_##call() does.
>
> Thanks!
>
> How about this trivial cleanup then? If I have your ack I'll add this
> patch to other pending changes.

That is good for me :)

> And... Cough, another question ;) To simplify, lets discuss kprobe_perf_func()
> only. Suppose that a task hits the kprobe but this task/cpu doesn't have
> a counter. Can't we avoid perf_trace_buf_prepare/submit in this case?
> IOW, what do you think about the change below?

Hmm, I'm not so sure how frequently this happens. And is this the right way
to handle that case? If so, we can do the same thing on trace_events as well.
(perf_trace_##call in include/trace/ftrace.h)

Thank you,


--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-04-11 12:02:09

by Oleg Nesterov

Subject: Re: [PATCH 0/1] uprobes/tracing: Don't pass addr=ip to perf_trace_buf_submit()

On 04/11, Masami Hiramatsu wrote:
>
> (2013/04/10 23:58), Oleg Nesterov wrote:
>
> > And... Cough, another question ;) To simplify, lets discuss kprobe_perf_func()
> > only. Suppose that a task hits the kprobe but this task/cpu doesn't have
> > a counter. Can't we avoid perf_trace_buf_prepare/submit in this case?
> > IOW, what do you think about the change below?
>
> Hmm, I'm not so sure how frequently this happens.

Suppose that you do, say, "perf record -e probe:some_func workload". Only
"workload" will have the active counter, any other task which hits the
probed some_func() will do perf_trace_buf_prepare/perf_trace_buf_submit
just to realize that nobody wants perf_swevent_event().

Simple test-case:

#include <unistd.h>

int main(void)
{
	int n;

	for (n = 0; n < 1000 * 1000; ++n)
		getppid();

	return 0;
}

Without kprobe:

# time ./ppid

real 0m0.663s
user 0m0.163s
sys 0m0.500s

Activate the probe:

# perf probe sys_getppid

# perf record -e probe:sys_getppid sleep 1000 &
[1] 546

Test it again 3 times:

# time ./ppid

Before the patch:

real 0m9.727s
user 0m0.177s
sys 0m9.547s

real 0m9.752s
user 0m0.180s
sys 0m9.573s

real 0m9.761s
user 0m0.187s
sys 0m9.573s

After the patch:

real 0m9.605s
user 0m0.163s
sys 0m9.437s

real 0m9.592s
user 0m0.167s
sys 0m9.423s

real 0m9.613s
user 0m0.183s
sys 0m9.427s

So the difference looks measurable but small, and I did the testing
under qemu so I do not really know if we can trust the numbers.

> And, is this right way to
> handle that case?

If only I was sure ;) I am asking.

And, to clarify, it is not that I think this change can really
improve the performance. I am just trying to understand what I have
missed.

> If so, we can do same thing also on trace_events.
> (perf_trace_##call in include/trace/ftrace.h)

Yes, yes, this is not kprobe-specific. It seems that more users of
perf_trace_buf_submit() could be changed the same way.

Thanks,

Oleg.

2013-04-12 18:01:51

by Steven Rostedt

Subject: Re: [PATCH 0/1] uprobes/tracing: Don't pass addr=ip to perf_trace_buf_submit()

On Thu, 2013-04-11 at 13:59 +0200, Oleg Nesterov wrote:
> On 04/11, Masami Hiramatsu wrote:
> >
> > (2013/04/10 23:58), Oleg Nesterov wrote:
> >
> > > And... Cough, another question ;) To simplify, lets discuss kprobe_perf_func()
> > > only. Suppose that a task hits the kprobe but this task/cpu doesn't have
> > > a counter. Can't we avoid perf_trace_buf_prepare/submit in this case?
> > > IOW, what do you think about the change below?
> >
> > Hmm, I'm not so sure how frequently this happens.
>
> Suppose that you do, say, "perf record -e probe:some_func workload". Only
> "workload" will have the active counter, any other task which hits the
> probed some_func() will do perf_trace_buf_prepare/perf_trace_buf_submit
> just to realize that nobody wants perf_swevent_event().

Wow, you're right. Seems that perf goes through a lot of work for every
time a tracepoint is hit for *all tasks*.

>
> Simple test-case:
>
> #include <unistd.h>
>
> int main(void)
> {
> 	int n;
>
> 	for (n = 0; n < 1000 * 1000; ++n)
> 		getppid();
>
> 	return 0;
> }
>
> Without kprobe:
>
> # time ./ppid
>
> real 0m0.663s
> user 0m0.163s
> sys 0m0.500s
>
> Activate the probe:
>
> # perf probe sys_getppid
>
> # perf record -e probe:sys_getppid sleep 1000 &
> [1] 546
>
> Test it again 3 times:
>
> # time ./ppid
>
> Before the patch:
>
> real 0m9.727s
> user 0m0.177s
> sys 0m9.547s
>
> real 0m9.752s
> user 0m0.180s
> sys 0m9.573s
>
> real 0m9.761s
> user 0m0.187s
> sys 0m9.573s
>
> After the patch:
>
> real 0m9.605s
> user 0m0.163s
> sys 0m9.437s
>
> real 0m9.592s
> user 0m0.167s
> sys 0m9.423s
>
> real 0m9.613s
> user 0m0.183s
> sys 0m9.427s
>
> So the difference looks measurable but small, and I did the testing
> under qemu so I do not really know if we can trust the numbers.
>
> > And, is this right way to
> > handle that case?
>
> If only I was sure ;) I am asking.
>
> And, to clarify, it is not that I think this change can really
> improve the performance. I am just trying to understand what I have
> missed.
>
> > If so, we can do same thing also on trace_events.
> > (perf_trace_##call in include/trace/ftrace.h)
>
> Yes, yes, this is not kprobe-specific. It seems that more users of
> perf_trace_buf_submit() could be changed the same way.

Yeah, looks like include/trace/ftrace.h needs an update.

Frederic?

-- Steve

2013-04-12 21:19:51

by Steven Rostedt

Subject: Re: [PATCH 0/1] uprobes/tracing: Don't pass addr=ip to perf_trace_buf_submit()

On Thu, 2013-04-11 at 13:59 +0200, Oleg Nesterov wrote:

> > If so, we can do same thing also on trace_events.
> > (perf_trace_##call in include/trace/ftrace.h)
>
> Yes, yes, this is not kprobe-specific. It seems that more users of
> perf_trace_buf_submit() could be changed the same way.
>

Oleg,

Can you make the necessary changes elsewhere? I talked with Frederic on
IRC and he's a bit busy with other work. But he did say he would review
changes that you make.

Thanks,

-- Steve

2013-04-13 09:34:17

by Srikar Dronamraju

Subject: Re: [PATCH 1/1] uprobes/tracing: Don't pass addr=ip to perf_trace_buf_submit()

* Oleg Nesterov <[email protected]> [2013-04-10 16:58:44]:

> uprobe_perf_print() passes addr=ip to perf_trace_buf_submit() for
> no reason. This sets perf_sample_data->addr for PERF_SAMPLE_ADDR,
> we already have perf_sample_data->ip initialized if PERF_SAMPLE_IP.
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Srikar Dronamraju <[email protected]>

> ---
> kernel/trace/trace_uprobe.c | 8 +++-----
> 1 files changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 2d08bea..37ccb72 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -811,7 +811,6 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
> struct ftrace_event_call *call = &tu->call;
> struct uprobe_trace_entry_head *entry;
> struct hlist_head *head;
> - unsigned long ip;
> void *data;
> int size, rctx, i;
>
> @@ -825,13 +824,12 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
> if (!entry)
> goto out;
>
> - ip = instruction_pointer(regs);
> if (is_ret_probe(tu)) {
> entry->vaddr[0] = func;
> - entry->vaddr[1] = ip;
> + entry->vaddr[1] = instruction_pointer(regs);
> data = DATAOF_TRACE_ENTRY(entry, true);
> } else {
> - entry->vaddr[0] = ip;
> + entry->vaddr[0] = instruction_pointer(regs);
> data = DATAOF_TRACE_ENTRY(entry, false);
> }
>
> @@ -839,7 +837,7 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>
> head = this_cpu_ptr(call->perf_events);
> - perf_trace_buf_submit(entry, size, rctx, ip, 1, regs, head, NULL);
> + perf_trace_buf_submit(entry, size, rctx, 0, 1, regs, head, NULL);
> out:
> preempt_enable();
> }
> --
> 1.5.5.1
>
>

--
Thanks and Regards
Srikar Dronamraju

2013-04-13 09:38:00

by Srikar Dronamraju

Subject: Re: [PATCH 3/6] uprobes/tracing: Make uprobe_{trace,perf}_print() uretprobe-friendly

* Oleg Nesterov <[email protected]> [2013-04-09 15:33:33]:

> On 04/07, Srikar Dronamraju wrote:
> >
> > * Oleg Nesterov <[email protected]> [2013-04-01 18:08:51]:
> >
> > > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > > index e91a354..db2718a 100644
> > > --- a/kernel/trace/trace_uprobe.c
> > > +++ b/kernel/trace/trace_uprobe.c
> > > @@ -515,15 +515,26 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
> > > int size, i;
> > > struct ftrace_event_call *call = &tu->call;
> > >
> > > - size = SIZEOF_TRACE_ENTRY(1) + tu->size;
> > > + if (is_ret_probe(tu))
> >
> > One nit:
> > Here and couple of places below .. we could check for func instead of
> > is_ret_probe() right?
>
> Yes we could. And note that we do not really need both uprobe_trace_func()
> and uretprobe_perf_func(), we could use a single function and check "func".
>
> But:
>
> > Or is there an advantage of checking is_ret_probe() over func?
>
> I believe yes. Firstly, we can't use 0ul as "invalid func address" to detect
> the !is_ret_probe() case, we need, say, -1ul which probably needs a symbolic
> name. In fact, I'd prefer to add another "is_return" argument if we want to
> avoid is_ret_probe() and unify 2 functions.
>
> But more importantly, I think that is_ret_probe() is much more grep-friendly
> and thus more understandable and consistent with other checks which can not
> rely on "func".

Okay, agreed.

2013-04-13 09:39:08

by Srikar Dronamraju

Subject: Re: [PATCH v2 4/7] uprobes/tracing: Make uprobe_{trace,perf}_print() uretprobe-friendly

* Oleg Nesterov <[email protected]> [2013-04-09 21:32:35]:

> Change uprobe_trace_print() and uprobe_perf_print() to check
> is_ret_probe() and fill ring_buffer_event accordingly.
>
> Also change uprobe_trace_func() and uprobe_perf_func() to not
> _print() if is_ret_probe() is true. Note that we keep ->handler()
> nontrivial even for uretprobe, we need this for filtering and for
> other potential extensions.
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Srikar Dronamraju <[email protected]>

> Tested-by: Anton Arapov <[email protected]>
> ---
> kernel/trace/trace_uprobe.c | 34 +++++++++++++++++++++++++---------
> 1 files changed, 25 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 72aa45e..0ed99a2 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -516,15 +516,22 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
> int size, i;
> struct ftrace_event_call *call = &tu->call;
>
> - size = SIZEOF_TRACE_ENTRY(false) + tu->size;
> + size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
> event = trace_current_buffer_lock_reserve(&buffer, call->event.type,
> - size, 0, 0);
> + size + tu->size, 0, 0);
> if (!event)
> return;
>
> entry = ring_buffer_event_data(event);
> - entry->vaddr[0] = instruction_pointer(regs);
> - data = DATAOF_TRACE_ENTRY(entry, false);
> + if (is_ret_probe(tu)) {
> + entry->vaddr[0] = func;
> + entry->vaddr[1] = instruction_pointer(regs);
> + data = DATAOF_TRACE_ENTRY(entry, true);
> + } else {
> + entry->vaddr[0] = instruction_pointer(regs);
> + data = DATAOF_TRACE_ENTRY(entry, false);
> + }
> +
> for (i = 0; i < tu->nr_args; i++)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>
> @@ -535,7 +542,8 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
> /* uprobe handler */
> static int uprobe_trace_func(struct trace_uprobe *tu, struct pt_regs *regs)
> {
> - uprobe_trace_print(tu, 0, regs);
> + if (!is_ret_probe(tu))
> + uprobe_trace_print(tu, 0, regs);
> return 0;
> }
>
> @@ -784,7 +792,7 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
> void *data;
> int size, rctx, i;
>
> - size = SIZEOF_TRACE_ENTRY(false);
> + size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
> size = ALIGN(size + tu->size + sizeof(u32), sizeof(u64)) - sizeof(u32);
> if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
> return;
> @@ -795,8 +803,15 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
> goto out;
>
> ip = instruction_pointer(regs);
> - entry->vaddr[0] = ip;
> - data = DATAOF_TRACE_ENTRY(entry, false);
> + if (is_ret_probe(tu)) {
> + entry->vaddr[0] = func;
> + entry->vaddr[1] = ip;
> + data = DATAOF_TRACE_ENTRY(entry, true);
> + } else {
> + entry->vaddr[0] = ip;
> + data = DATAOF_TRACE_ENTRY(entry, false);
> + }
> +
> for (i = 0; i < tu->nr_args; i++)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>
> @@ -812,7 +827,8 @@ static int uprobe_perf_func(struct trace_uprobe *tu, struct pt_regs *regs)
> if (!uprobe_perf_filter(&tu->consumer, 0, current->mm))
> return UPROBE_HANDLER_REMOVE;
>
> - uprobe_perf_print(tu, 0, regs);
> + if (!is_ret_probe(tu))
> + uprobe_perf_print(tu, 0, regs);
> return 0;
> }
>
> --
> 1.5.5.1
>

--
Thanks and Regards
Srikar Dronamraju

2013-04-13 14:08:47

by Oleg Nesterov

Subject: [PATCH 1/1] uprobes/perf: Avoid perf_trace_buf_prepare/submit if ->perf_events is empty

perf_trace_buf_prepare() + perf_trace_buf_submit() make no sense
if this task/CPU has no active counters. Change uprobe_perf_print()
to return if hlist_empty(call->perf_events).

Note: this is not uprobe-specific, we can change other users too.

Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/trace/trace_uprobe.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 37ccb72..32494fb 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -820,6 +820,10 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
return;

preempt_disable();
+ head = this_cpu_ptr(call->perf_events);
+ if (hlist_empty(head))
+ goto out;
+
entry = perf_trace_buf_prepare(size, call->event.type, regs, &rctx);
if (!entry)
goto out;
@@ -836,7 +840,6 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
for (i = 0; i < tu->nr_args; i++)
call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);

- head = this_cpu_ptr(call->perf_events);
perf_trace_buf_submit(entry, size, rctx, 0, 1, regs, head, NULL);
out:
preempt_enable();
--
1.5.5.1

2013-04-13 14:19:31

by Oleg Nesterov

Subject: [PATCH 0/1] uprobes/perf: Avoid perf_trace_buf_prepare/submit if ->perf_events is empty

On 04/12, Steven Rostedt wrote:
>
> On Thu, 2013-04-11 at 13:59 +0200, Oleg Nesterov wrote:
>
> > > If so, we can do the same thing also on trace_events.
> > > (perf_trace_##call in include/trace/ftrace.h)
> >
> > Yes, yes, this is not kprobe-specific. It seems that more users of
> > perf_trace_buf_submit() could be changed the same way.
> >
>
> Oleg,
>
> Can you make the necessary changes elsewhere? I talked with Frederic on
> IRC and he's a bit busy with other work. But he did say he would review
> changes that you make.

Sure, will be happy to do.

But can I change uprobes/perf first? I have a lot more pending changes in
trace_uprobe.c which I am going to ask you to pull; I'd like to add this simple
patch to "oleg/misc uprobes/core" if I have your ack.

Oleg.

2013-04-13 18:25:48

by Oleg Nesterov

Subject: Re: [PATCH 0/1] uprobes/perf: Avoid perf_trace_buf_prepare/submit if ->perf_events is empty

On 04/13, Oleg Nesterov wrote:
>
> On 04/12, Steven Rostedt wrote:
> >
> > Can you make the necessary changes elsewhere? I talked with Frederic on
> > IRC and he's a bit busy with other work. But he did say he would review
> > changes that you make.
>
> Sure, will be happy to do.

Everything looks trivial except DECLARE_EVENT_CLASS()->perf_trace_*() which
should also check __task != NULL.

In fact this _looks_ simple too: we could move the TP_perf_assign() logic into
TP_ARGS(), but that is probably too ugly. I'll try to think more.

Oleg.