2008-11-22 11:29:10

by Török Edwin

[permalink] [raw]
Subject: [PATCH 0/2] tracing: userspace stacktraces

Hi,

I am resending my patches, differrences from previous send:
* fixed some style issues
* introduce seq_print_user_ip helper
* make mangle_path global and document
* use mangle_path instead of abusing seq_file in trace_seq_path
* fix subject line and commit messages

These patches apply on top of tip/master,
commit 50598e26c1360d4c0edcabc50035d084ee8d3d70

Török Edwin (2):
tracing: add support for userspace stacktraces in tracing/iter_ctrl
tracing: identify which executable object the userspace address
belongs to

Documentation/ftrace.txt | 16 ++++-
arch/x86/kernel/stacktrace.c | 57 +++++++++++++++
fs/seq_file.c | 14 ++++-
include/linux/seq_file.h | 1 +
include/linux/stacktrace.h | 8 ++
kernel/trace/trace.c | 165 ++++++++++++++++++++++++++++++++++++++++++
kernel/trace/trace.h | 10 +++
7 files changed, 269 insertions(+), 2 deletions(-)


2008-11-22 11:29:25

by Török Edwin

[permalink] [raw]
Subject: [PATCH 1/2] tracing: add support for userspace stacktraces in tracing/iter_ctrl

Impact: add new (default-off) tracing visualization feature

Usage example:

mount -t debugfs nodev /sys/kernel/debug
cd /sys/kernel/debug/tracing
echo userstacktrace >iter_ctrl
echo sched_switch >current_tracer
echo 1 >tracing_enabled
.... run application ...
echo 0 >tracing_enabled

Then read one of 'trace','latency_trace','trace_pipe'.

To get the best output you can compile your userspace programs with
frame pointers (at least glibc + the app you are tracing).

Signed-off-by: Török Edwin <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
Documentation/ftrace.txt | 5 ++-
arch/x86/kernel/stacktrace.c | 57 +++++++++++++++++++++++++
include/linux/stacktrace.h | 8 ++++
kernel/trace/trace.c | 93 ++++++++++++++++++++++++++++++++++++++++++
kernel/trace/trace.h | 9 ++++
5 files changed, 171 insertions(+), 1 deletions(-)

diff --git a/Documentation/ftrace.txt b/Documentation/ftrace.txt
index 753f4de..79a80f7 100644
--- a/Documentation/ftrace.txt
+++ b/Documentation/ftrace.txt
@@ -324,7 +324,7 @@ output. To see what is available, simply cat the file:

cat /debug/tracing/trace_options
print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \
- noblock nostacktrace nosched-tree
+ noblock nostacktrace nosched-tree nouserstacktrace

To disable one of the options, echo in the option prepended with "no".

@@ -378,6 +378,9 @@ Here are the available options:
When a trace is recorded, so is the stack of functions.
This allows for back traces of trace sites.

+ userstacktrace - This option changes the trace.
+ It records a stacktrace of the current userspace thread.
+
sched-tree - TBD (any users??)


diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index d1d850a..000a965 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -6,6 +6,7 @@
#include <linux/sched.h>
#include <linux/stacktrace.h>
#include <linux/module.h>
+#include <linux/uaccess.h>
#include <asm/stacktrace.h>

static void save_stack_warning(void *data, char *msg)
@@ -90,3 +91,59 @@ void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
trace->entries[trace->nr_entries++] = ULONG_MAX;
}
EXPORT_SYMBOL_GPL(save_stack_trace_tsk);
+
+/* Userspace stacktrace - based on kernel/trace/trace_sysprof.c */
+
+struct stack_frame {
+ const void __user *next_fp;
+ unsigned long return_address;
+};
+
+static int copy_stack_frame(const void __user *fp, struct stack_frame *frame)
+{
+ int ret;
+
+ if (!access_ok(VERIFY_READ, fp, sizeof(*frame)))
+ return 0;
+
+ ret = 1;
+ pagefault_disable();
+ if (__copy_from_user_inatomic(frame, fp, sizeof(*frame)))
+ ret = 0;
+ pagefault_enable();
+
+ return ret;
+}
+
+void save_stack_trace_user(struct stack_trace *trace)
+{
+ /*
+ * Trace user stack if we are not a kernel thread
+ */
+ if (current->mm) {
+ const struct pt_regs *regs = task_pt_regs(current);
+ const void __user *fp = (const void __user *)regs->bp;
+
+ if (trace->nr_entries < trace->max_entries)
+ trace->entries[trace->nr_entries++] = regs->ip;
+
+ while (trace->nr_entries < trace->max_entries) {
+ struct stack_frame frame;
+ frame.next_fp = NULL;
+ frame.return_address = 0;
+ if (!copy_stack_frame(fp, &frame))
+ break;
+ if ((unsigned long)fp < regs->sp)
+ break;
+ if (frame.return_address)
+ trace->entries[trace->nr_entries++] =
+ frame.return_address;
+ if (fp == frame.next_fp)
+ break;
+ fp = frame.next_fp;
+ }
+ }
+ if (trace->nr_entries < trace->max_entries)
+ trace->entries[trace->nr_entries++] = ULONG_MAX;
+}
+
diff --git a/include/linux/stacktrace.h b/include/linux/stacktrace.h
index 6b8e54a..46704a5 100644
--- a/include/linux/stacktrace.h
+++ b/include/linux/stacktrace.h
@@ -18,9 +18,17 @@ extern void save_stack_trace_tsk(struct task_struct *tsk,
struct stack_trace *trace);

extern void print_stack_trace(struct stack_trace *trace, int spaces);
+
+#ifdef CONFIG_X86
+extern void save_stack_trace_user(struct stack_trace *trace);
+#else
+# define save_stack_trace_user(trace) do { } while (0)
+#endif
+
#else
# define save_stack_trace(trace) do { } while (0)
# define save_stack_trace_tsk(tsk, trace) do { } while (0)
+# define save_stack_trace_user(trace) do { } while (0)
# define print_stack_trace(trace, spaces) do { } while (0)
#endif

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4ee6f03..ced8b4f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -275,6 +275,7 @@ static const char *trace_options[] = {
"ftrace_preempt",
"branch",
"annotate",
+ "userstacktrace",
NULL
};

@@ -918,6 +919,44 @@ void __trace_stack(struct trace_array *tr,
ftrace_trace_stack(tr, data, flags, skip, preempt_count());
}

+static void ftrace_trace_userstack(struct trace_array *tr,
+ struct trace_array_cpu *data,
+ unsigned long flags, int pc)
+{
+ struct userstack_entry *entry;
+ struct stack_trace trace;
+ struct ring_buffer_event *event;
+ unsigned long irq_flags;
+
+ if (!(trace_flags & TRACE_ITER_USERSTACKTRACE))
+ return;
+
+ event = ring_buffer_lock_reserve(tr->buffer, sizeof(*entry),
+ &irq_flags);
+ if (!event)
+ return;
+ entry = ring_buffer_event_data(event);
+ tracing_generic_entry_update(&entry->ent, flags, pc);
+ entry->ent.type = TRACE_USER_STACK;
+
+ memset(&entry->caller, 0, sizeof(entry->caller));
+
+ trace.nr_entries = 0;
+ trace.max_entries = FTRACE_STACK_ENTRIES;
+ trace.skip = 0;
+ trace.entries = entry->caller;
+
+ save_stack_trace_user(&trace);
+ ring_buffer_unlock_commit(tr->buffer, event, irq_flags);
+}
+
+void __trace_userstack(struct trace_array *tr,
+ struct trace_array_cpu *data,
+ unsigned long flags)
+{
+ ftrace_trace_userstack(tr, data, flags, preempt_count());
+}
+
static void
ftrace_trace_special(void *__tr, void *__data,
unsigned long arg1, unsigned long arg2, unsigned long arg3,
@@ -941,6 +980,7 @@ ftrace_trace_special(void *__tr, void *__data,
entry->arg3 = arg3;
ring_buffer_unlock_commit(tr->buffer, event, irq_flags);
ftrace_trace_stack(tr, data, irq_flags, 4, pc);
+ ftrace_trace_userstack(tr, data, irq_flags, pc);

trace_wake_up();
}
@@ -979,6 +1019,7 @@ tracing_sched_switch_trace(struct trace_array *tr,
entry->next_cpu = task_cpu(next);
ring_buffer_unlock_commit(tr->buffer, event, irq_flags);
ftrace_trace_stack(tr, data, flags, 5, pc);
+ ftrace_trace_userstack(tr, data, flags, pc);
}

void
@@ -1008,6 +1049,7 @@ tracing_sched_wakeup_trace(struct trace_array *tr,
entry->next_cpu = task_cpu(wakee);
ring_buffer_unlock_commit(tr->buffer, event, irq_flags);
ftrace_trace_stack(tr, data, flags, 6, pc);
+ ftrace_trace_userstack(tr, data, flags, pc);

trace_wake_up();
}
@@ -1387,6 +1429,31 @@ seq_print_ip_sym(struct trace_seq *s, unsigned long ip, unsigned long sym_flags)
return ret;
}

+static int
+seq_print_userip_objs(const struct userstack_entry *entry, struct trace_seq *s,
+ unsigned long sym_flags)
+{
+ int ret = 1;
+ unsigned i;
+
+ for (i = 0; i < FTRACE_STACK_ENTRIES; i++) {
+ unsigned long ip = entry->caller[i];
+
+ if (ip == ULONG_MAX || !ret)
+ break;
+ if (i)
+ ret = trace_seq_puts(s, " <- ");
+ if (!ip) {
+ ret = trace_seq_puts(s, "??");
+ continue;
+ }
+ if (ret /*&& (sym_flags & TRACE_ITER_SYM_ADDR)*/)
+ ret = trace_seq_printf(s, " <" IP_FMT ">", ip);
+ }
+
+ return ret;
+}
+
static void print_lat_help_header(struct seq_file *m)
{
seq_puts(m, "# _------=> CPU# \n");
@@ -1702,6 +1769,16 @@ print_lat_fmt(struct trace_iterator *iter, unsigned int trace_idx, int cpu)
field->line);
break;
}
+ case TRACE_USER_STACK: {
+ struct userstack_entry *field;
+
+ trace_assign_type(field, entry);
+
+ seq_print_userip_objs(field, s, sym_flags);
+ if (entry->flags & TRACE_FLAG_CONT)
+ trace_seq_print_cont(s, iter);
+ break;
+ }
default:
trace_seq_printf(s, "Unknown type %d\n", entry->type);
}
@@ -1853,6 +1930,19 @@ static enum print_line_t print_trace_fmt(struct trace_iterator *iter)
field->line);
break;
}
+ case TRACE_USER_STACK: {
+ struct userstack_entry *field;
+
+ trace_assign_type(field, entry);
+
+ ret = seq_print_userip_objs(field, s, sym_flags);
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+ ret = trace_seq_putc(s, '\n');
+ if (!ret)
+ return TRACE_TYPE_PARTIAL_LINE;
+ break;
+ }
}
return TRACE_TYPE_HANDLED;
}
@@ -1912,6 +2002,7 @@ static enum print_line_t print_raw_fmt(struct trace_iterator *iter)
break;
}
case TRACE_SPECIAL:
+ case TRACE_USER_STACK:
case TRACE_STACK: {
struct special_entry *field;

@@ -2000,6 +2091,7 @@ static enum print_line_t print_hex_fmt(struct trace_iterator *iter)
break;
}
case TRACE_SPECIAL:
+ case TRACE_USER_STACK:
case TRACE_STACK: {
struct special_entry *field;

@@ -2054,6 +2146,7 @@ static enum print_line_t print_bin_fmt(struct trace_iterator *iter)
break;
}
case TRACE_SPECIAL:
+ case TRACE_USER_STACK:
case TRACE_STACK: {
struct special_entry *field;

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 2cb12fd..17bb4c8 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -26,6 +26,7 @@ enum trace_type {
TRACE_BOOT_CALL,
TRACE_BOOT_RET,
TRACE_FN_RET,
+ TRACE_USER_STACK,

__TRACE_LAST_TYPE
};
@@ -42,6 +43,7 @@ struct trace_entry {
unsigned char flags;
unsigned char preempt_count;
int pid;
+ int tgid;
};

/*
@@ -99,6 +101,11 @@ struct stack_entry {
unsigned long caller[FTRACE_STACK_ENTRIES];
};

+struct userstack_entry {
+ struct trace_entry ent;
+ unsigned long caller[FTRACE_STACK_ENTRIES];
+};
+
/*
* ftrace_printk entry:
*/
@@ -240,6 +247,7 @@ extern void __ftrace_bad_type(void);
IF_ASSIGN(var, ent, struct ctx_switch_entry, 0); \
IF_ASSIGN(var, ent, struct trace_field_cont, TRACE_CONT); \
IF_ASSIGN(var, ent, struct stack_entry, TRACE_STACK); \
+ IF_ASSIGN(var, ent, struct userstack_entry, TRACE_USER_STACK);\
IF_ASSIGN(var, ent, struct print_entry, TRACE_PRINT); \
IF_ASSIGN(var, ent, struct special_entry, 0); \
IF_ASSIGN(var, ent, struct trace_mmiotrace_rw, \
@@ -500,6 +508,7 @@ enum trace_iterator_flags {
TRACE_ITER_PREEMPTONLY = 0x800,
TRACE_ITER_BRANCH = 0x1000,
TRACE_ITER_ANNOTATE = 0x2000,
+ TRACE_ITER_USERSTACKTRACE = 0x4000
};

/*
--
1.5.6.5

2008-11-22 11:29:41

by Török Edwin

[permalink] [raw]
Subject: [PATCH 2/2] tracing: identify which executable object the userspace address belongs to

Impact: modify+improve the userstacktrace tracing visualization feature

Store thread group leader id, and use it to lookup the address in the
process's map. We could have looked up the address on thread's map,
but the thread might not exist by the time we are called. The process
might not exist either, but if you are reading trace_pipe, that is
unlikely.

Example usage:

mount -t debugfs nodev /sys/kernel/debug
cd /sys/kernel/debug/tracing
echo userstacktrace >iter_ctrl
echo sym-userobj >iter_ctrl
echo sched_switch >current_tracer
echo 1 >tracing_enabled
cat trace_pipe >/tmp/trace&
.... run application ...
echo 0 >tracing_enabled
cat /tmp/trace

You'll see stack entries like:

/lib/libpthread-2.7.so[+0xd370]

You can convert them to function/line using:

addr2line -fie /lib/libpthread-2.7.so 0xd370

Or:

addr2line -fie /usr/lib/debug/libpthread-2.7.so 0xd370

For non-PIC/PIE executables this won't work:

a.out[+0x73b]

You need to run the following: addr2line -fie a.out 0x40073b
(where 0x400000 is the default load address of a.out)

Signed-off-by: Török Edwin <[email protected]>
---
Documentation/ftrace.txt | 13 ++++++-
fs/seq_file.c | 14 +++++++-
include/linux/seq_file.h | 1 +
kernel/trace/trace.c | 86 ++++++++++++++++++++++++++++++++++++++++++----
kernel/trace/trace.h | 3 +-
5 files changed, 107 insertions(+), 10 deletions(-)

diff --git a/Documentation/ftrace.txt b/Documentation/ftrace.txt
index 79a80f7..35a78bc 100644
--- a/Documentation/ftrace.txt
+++ b/Documentation/ftrace.txt
@@ -324,7 +324,7 @@ output. To see what is available, simply cat the file:

cat /debug/tracing/trace_options
print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \
- noblock nostacktrace nosched-tree nouserstacktrace
+ noblock nostacktrace nosched-tree nouserstacktrace nosym-userobj

To disable one of the options, echo in the option prepended with "no".

@@ -381,6 +381,17 @@ Here are the available options:
userstacktrace - This option changes the trace.
It records a stacktrace of the current userspace thread.

+ sym-userobj - when user stacktrace are enabled, look up which object the
+ address belongs to, and print a relative address
+ This is especially useful when ASLR is on, otherwise you don't
+ get a chance to resolve the address to object/file/line after the app is no
+ longer running
+
+ The lookup is performed when you read trace,trace_pipe,latency_trace. Example:
+
+ a.out-1623 [000] 40874.465068: /root/a.out[+0x480] <-/root/a.out[+0
+x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
+
sched-tree - TBD (any users??)


diff --git a/fs/seq_file.c b/fs/seq_file.c
index eba2eab..e063077 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -357,7 +357,18 @@ int seq_printf(struct seq_file *m, const char *f, ...)
}
EXPORT_SYMBOL(seq_printf);

-static char *mangle_path(char *s, char *p, char *esc)
+/**
+ * mangle_path - mangle and copy path to buffer beginning
+ * @s - buffer start
+ * @p - beginning of path in above buffer
+ * @esc - set of characters that need escaping
+ *
+ * Copy the path from @p to @s, replacing each occurrence of character from
+ * @esc with usual octal escape.
+ * Returns pointer past last written character in @s, or NULL in case of
+ * failure.
+ */
+char *mangle_path(char *s, char *p, char *esc)
{
while (s <= p) {
char c = *p++;
@@ -376,6 +387,7 @@ static char *mangle_path(char *s, char *p, char *esc)
}
return NULL;
}
+EXPORT_SYMBOL(mangle_path);

/*
* return the absolute path of 'dentry' residing in mount 'mnt'.
diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h
index dc50bcc..b3dfa72 100644
--- a/include/linux/seq_file.h
+++ b/include/linux/seq_file.h
@@ -34,6 +34,7 @@ struct seq_operations {

#define SEQ_SKIP 1

+char *mangle_path(char *s, char *p, char *esc);
int seq_open(struct file *, const struct seq_operations *);
ssize_t seq_read(struct file *, char __user *, size_t, loff_t *);
loff_t seq_lseek(struct file *, loff_t, int);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ced8b4f..62776b7 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -30,6 +30,7 @@
#include <linux/gfp.h>
#include <linux/fs.h>
#include <linux/kprobes.h>
+#include <linux/seq_file.h>
#include <linux/writeback.h>

#include <linux/stacktrace.h>
@@ -276,6 +277,7 @@ static const char *trace_options[] = {
"branch",
"annotate",
"userstacktrace",
+ "sym-userobj",
NULL
};

@@ -422,6 +424,28 @@ trace_seq_putmem_hex(struct trace_seq *s, void *mem, size_t len)
return trace_seq_putmem(s, hex, j);
}

+static int
+trace_seq_path(struct trace_seq *s, struct path *path)
+{
+ unsigned char *p;
+
+ if (s->len >= (PAGE_SIZE - 1))
+ return 0;
+ p = d_path(path, s->buffer + s->len, PAGE_SIZE - s->len);
+ if (!IS_ERR(p)) {
+ p = mangle_path(s->buffer + s->len, p, "\n");
+ if (p) {
+ s->len = p - s->buffer;
+ return 1;
+ }
+ } else {
+ s->buffer[s->len++] = '?';
+ return 1;
+ }
+
+ return 0;
+}
+
static void
trace_seq_reset(struct trace_seq *s)
{
@@ -802,6 +826,7 @@ tracing_generic_entry_update(struct trace_entry *entry, unsigned long flags,

entry->preempt_count = pc & 0xff;
entry->pid = (tsk) ? tsk->pid : 0;
+ entry->tgid = (tsk) ? tsk->tgid : 0;
entry->flags =
#ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
(irqs_disabled_flags(flags) ? TRACE_FLAG_IRQS_OFF : 0) |
@@ -1429,28 +1454,73 @@ seq_print_ip_sym(struct trace_seq *s, unsigned long ip, unsigned long sym_flags)
return ret;
}

+static inline int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
+ unsigned long ip, unsigned long sym_flags)
+{
+ struct file *file = NULL;
+ unsigned long vmstart = 0;
+ int ret = 1;
+
+ if (mm) {
+ const struct vm_area_struct *vma = find_vma(mm, ip);
+ if (vma) {
+ file = vma->vm_file;
+ vmstart = vma->vm_start;
+ }
+ }
+ if (file) {
+ ret = trace_seq_path(s, &file->f_path);
+ if (ret)
+ ret = trace_seq_printf(s, "[+0x%lx]",
+ ip - vmstart);
+ }
+ if (ret && ((sym_flags & TRACE_ITER_SYM_ADDR) || !file))
+ ret = trace_seq_printf(s, " <" IP_FMT ">", ip);
+ return ret;
+}
+
static int
seq_print_userip_objs(const struct userstack_entry *entry, struct trace_seq *s,
- unsigned long sym_flags)
+ unsigned long sym_flags)
{
+ struct mm_struct *mm = NULL;
int ret = 1;
unsigned i;

+ if (trace_flags & TRACE_ITER_SYM_USEROBJ) {
+ struct task_struct *task;
+ /*
+ * we do the lookup on the thread group leader,
+ * since individual threads might have already quit!
+ */
+ rcu_read_lock();
+ task = find_task_by_vpid(entry->ent.tgid);
+ rcu_read_unlock();
+
+ if (task)
+ mm = get_task_mm(task);
+ }
+
for (i = 0; i < FTRACE_STACK_ENTRIES; i++) {
unsigned long ip = entry->caller[i];

if (ip == ULONG_MAX || !ret)
break;
- if (i)
+ if (i && ret)
ret = trace_seq_puts(s, " <- ");
if (!ip) {
- ret = trace_seq_puts(s, "??");
+ if (ret)
+ ret = trace_seq_puts(s, "??");
continue;
}
- if (ret /*&& (sym_flags & TRACE_ITER_SYM_ADDR)*/)
- ret = trace_seq_printf(s, " <" IP_FMT ">", ip);
+ if (!ret)
+ break;
+ if (ret)
+ ret = seq_print_user_ip(s, mm, ip, sym_flags);
}

+ if (mm)
+ mmput(mm);
return ret;
}

@@ -1775,8 +1845,7 @@ print_lat_fmt(struct trace_iterator *iter, unsigned int trace_idx, int cpu)
trace_assign_type(field, entry);

seq_print_userip_objs(field, s, sym_flags);
- if (entry->flags & TRACE_FLAG_CONT)
- trace_seq_print_cont(s, iter);
+ trace_seq_putc(s, '\n');
break;
}
default:
@@ -3581,6 +3650,9 @@ void ftrace_dump(void)
atomic_inc(&global_trace.data[cpu]->disabled);
}

+ /* don't look at user memory in panic mode */
+ trace_flags &= ~TRACE_ITER_SYM_USEROBJ;
+
printk(KERN_TRACE "Dumping ftrace buffer:\n");

iter.tr = &global_trace;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 17bb4c8..28c15c2 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -508,7 +508,8 @@ enum trace_iterator_flags {
TRACE_ITER_PREEMPTONLY = 0x800,
TRACE_ITER_BRANCH = 0x1000,
TRACE_ITER_ANNOTATE = 0x2000,
- TRACE_ITER_USERSTACKTRACE = 0x4000
+ TRACE_ITER_USERSTACKTRACE = 0x4000,
+ TRACE_ITER_SYM_USEROBJ = 0x8000
};

/*
--
1.5.6.5

2008-11-23 08:26:53

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/2] tracing: userspace stacktraces


* T?r?k Edwin <[email protected]> wrote:

> Hi,
>
> I am resending my patches, differrences from previous send:
> * fixed some style issues
> * introduce seq_print_user_ip helper
> * make mangle_path global and document
> * use mangle_path instead of abusing seq_file in trace_seq_path
> * fix subject line and commit messages
>
> These patches apply on top of tip/master,
> commit 50598e26c1360d4c0edcabc50035d084ee8d3d70
>
> T?r?k Edwin (2):
> tracing: add support for userspace stacktraces in tracing/iter_ctrl
> tracing: identify which executable object the userspace address
> belongs to
>
> Documentation/ftrace.txt | 16 ++++-
> arch/x86/kernel/stacktrace.c | 57 +++++++++++++++
> fs/seq_file.c | 14 ++++-
> include/linux/seq_file.h | 1 +
> include/linux/stacktrace.h | 8 ++
> kernel/trace/trace.c | 165 ++++++++++++++++++++++++++++++++++++++++++
> kernel/trace/trace.h | 10 +++
> 7 files changed, 269 insertions(+), 2 deletions(-)

applied to tip/tracing/stack-tracer - thanks Edwin!

I've got a few comments about certain details - please send delta
patches on top of these patches to fix those.

Ingo

2008-11-23 08:38:08

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 1/2] tracing: add support for userspace stacktraces in tracing/iter_ctrl


* T?r?k Edwin <[email protected]> wrote:

> +struct stack_frame {
> + const void __user *next_fp;
> + unsigned long return_address;
> +};

Small detail: please s/return_address/ret_addr - its expressive power
is the same but as a bonus it will cause less col-80 related
linebreaks in usage sites.

> +void save_stack_trace_user(struct stack_trace *trace)
> +{
> + /*
> + * Trace user stack if we are not a kernel thread
> + */
> + if (current->mm) {
> + const struct pt_regs *regs = task_pt_regs(current);
> + const void __user *fp = (const void __user *)regs->bp;
> +
> + if (trace->nr_entries < trace->max_entries)
> + trace->entries[trace->nr_entries++] = regs->ip;
> +
> + while (trace->nr_entries < trace->max_entries) {
> + struct stack_frame frame;
> + frame.next_fp = NULL;

Style: put a newline after variable definitions please.

> + frame.return_address = 0;
> + if (!copy_stack_frame(fp, &frame))
> + break;
> + if ((unsigned long)fp < regs->sp)
> + break;
> + if (frame.return_address)
> + trace->entries[trace->nr_entries++] =
> + frame.return_address;

Style: please use curly braces around multi-line conditional
statements.

> + if (fp == frame.next_fp)
> + break;
> + fp = frame.next_fp;
> + }
> + }

Detail: please move the whole "if (current->mm)" into a
__save_stack_trace_user() helper function - that way it reads cleaner.

> +++ b/include/linux/stacktrace.h
> @@ -18,9 +18,17 @@ extern void save_stack_trace_tsk(struct task_struct *tsk,
> struct stack_trace *trace);
>
> extern void print_stack_trace(struct stack_trace *trace, int spaces);
> +
> +#ifdef CONFIG_X86
> +extern void save_stack_trace_user(struct stack_trace *trace);
> +#else
> +# define save_stack_trace_user(trace) do { } while (0)
> +#endif

Bug: this should not be CONFIG_X86. Please introduce (in a separate
patch from other cleanups) a CONFIG_USER_STACKTRACE_SUPPORT in
arch/x86/Kconfig and use that instead.

> +
> #else
> # define save_stack_trace(trace) do { } while (0)
> # define save_stack_trace_tsk(tsk, trace) do { } while (0)
> +# define save_stack_trace_user(trace) do { } while (0)

Style: these should be tabs, not spaces ^^^^^^^^^^^^

> +static void ftrace_trace_userstack(struct trace_array *tr,
> + struct trace_array_cpu *data,
> + unsigned long flags, int pc)
> +{
> + struct userstack_entry *entry;
> + struct stack_trace trace;
> + struct ring_buffer_event *event;
> + unsigned long irq_flags;

Detail: please use the customary ftrace variable definitions style:

> + struct ring_buffer_event *event;
> + struct userstack_entry *entry;
> + struct stack_trace trace;
> + unsigned long irq_flags;

note how it's ordered by line length.

> +static int
> +seq_print_userip_objs(const struct userstack_entry *entry, struct trace_seq *s,
> + unsigned long sym_flags)
> +{
> + int ret = 1;
> + unsigned i;

Detail: please use "unsigned int" so that we have less type patterns
to look for.

> +
> + for (i = 0; i < FTRACE_STACK_ENTRIES; i++) {
> + unsigned long ip = entry->caller[i];
> +
> + if (ip == ULONG_MAX || !ret)
> + break;
> + if (i)
> + ret = trace_seq_puts(s, " <- ");
> + if (!ip) {
> + ret = trace_seq_puts(s, "??");
> + continue;
> + }
> + if (ret /*&& (sym_flags & TRACE_ITER_SYM_ADDR)*/)
> + ret = trace_seq_printf(s, " <" IP_FMT ">", ip);

Detail: do we need that commented-out condition?

Ingo

2008-11-23 08:48:12

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH] vfs, seqfile: make mangle_path() global


* T?r?k Edwin <[email protected]> wrote:

> fs/seq_file.c | 14 +++++++-
> include/linux/seq_file.h | 1 +

Note, i've split these bits out into a separate patch - see it
attached below.

Ingo

------------------>
>From 74e2f334f4440cbcb63e9ebbcdcea430d41bdfa3 Mon Sep 17 00:00:00 2001
From: =?utf-8?q?T=C3=B6r=C3=B6k=20Edwin?= <[email protected]>
Date: Sat, 22 Nov 2008 13:28:48 +0200
Subject: [PATCH] vfs, seqfile: make mangle_path() global
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Impact: expose new VFS API

make mangle_path() available, as per the suggestions of Christoph Hellwig
and Al Viro:

http://lkml.org/lkml/2008/11/4/338

Signed-off-by: T?r?k Edwin <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
fs/seq_file.c | 14 +++++++++++++-
include/linux/seq_file.h | 1 +
2 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/fs/seq_file.c b/fs/seq_file.c
index eba2eab..f5b61cc 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -357,7 +357,18 @@ int seq_printf(struct seq_file *m, const char *f, ...)
}
EXPORT_SYMBOL(seq_printf);

-static char *mangle_path(char *s, char *p, char *esc)
+/**
+ * mangle_path - mangle and copy path to buffer beginning
+ * @s - buffer start
+ * @p - beginning of path in above buffer
+ * @esc - set of characters that need escaping
+ *
+ * Copy the path from @p to @s, replacing each occurrence of character from
+ * @esc with usual octal escape.
+ * Returns pointer past last written character in @s, or NULL in case of
+ * failure.
+ */
+char *mangle_path(char *s, char *p, char *esc)
{
while (s <= p) {
char c = *p++;
@@ -376,6 +387,7 @@ static char *mangle_path(char *s, char *p, char *esc)
}
return NULL;
}
+EXPORT_SYMBOL_GPL(mangle_path);

/*
* return the absolute path of 'dentry' residing in mount 'mnt'.
diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h
index dc50bcc..b3dfa72 100644
--- a/include/linux/seq_file.h
+++ b/include/linux/seq_file.h
@@ -34,6 +34,7 @@ struct seq_operations {

#define SEQ_SKIP 1

+char *mangle_path(char *s, char *p, char *esc);
int seq_open(struct file *, const struct seq_operations *);
ssize_t seq_read(struct file *, char __user *, size_t, loff_t *);
loff_t seq_lseek(struct file *, loff_t, int);

2008-11-23 08:53:33

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/2] tracing: identify which executable object the userspace address belongs to


* T?r?k Edwin <[email protected]> wrote:

> Impact: modify+improve the userstacktrace tracing visualization feature

> +static inline int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
> + unsigned long ip, unsigned long sym_flags)
> +{
> + struct file *file = NULL;
> + unsigned long vmstart = 0;
> + int ret = 1;
> +
> + if (mm) {
> + const struct vm_area_struct *vma = find_vma(mm, ip);
> + if (vma) {

Style: please put a newline after variable definition blocks.

> + file = vma->vm_file;
> + vmstart = vma->vm_start;

Bug: it's generally unsafe to look up a vma and use it without having
done a down_read(&mm->mmap_sem). Another thread (of this ->mm) could
go and modify it in parallel.

> + }
> + }
> + if (file) {
> + ret = trace_seq_path(s, &file->f_path);
> + if (ret)
> + ret = trace_seq_printf(s, "[+0x%lx]",
> + ip - vmstart);

Style: that linebreak is unnecessary.

> + if (trace_flags & TRACE_ITER_SYM_USEROBJ) {
> + struct task_struct *task;
> + /*
> + * we do the lookup on the thread group leader,
> + * since individual threads might have already quit!
> + */
> + rcu_read_lock();
> + task = find_task_by_vpid(entry->ent.tgid);
> + rcu_read_unlock();
> +
> + if (task)
> + mm = get_task_mm(task);

Bug: it is unsafe to look up a task and then drop the RCU lock and use
it - the task could go away the moment the RCU read-lock is dropped.

A safer sequence would be to get get_task_mm(task) reference inside
the RCU critical section.

Ingo

2008-11-23 09:24:43

by Török Edwin

[permalink] [raw]
Subject: Re: [PATCH 0/2] tracing: userspace stacktraces

On 2008-11-23 10:26, Ingo Molnar wrote:
> * T?r?k Edwin <[email protected]> wrote:
>
>
>> Hi,
>>
>> I am resending my patches, differrences from previous send:
>> * fixed some style issues
>> * introduce seq_print_user_ip helper
>> * make mangle_path global and document
>> * use mangle_path instead of abusing seq_file in trace_seq_path
>> * fix subject line and commit messages
>>
>> These patches apply on top of tip/master,
>> commit 50598e26c1360d4c0edcabc50035d084ee8d3d70
>>
>> T?r?k Edwin (2):
>> tracing: add support for userspace stacktraces in tracing/iter_ctrl
>> tracing: identify which executable object the userspace address
>> belongs to
>>
>> Documentation/ftrace.txt | 16 ++++-
>> arch/x86/kernel/stacktrace.c | 57 +++++++++++++++
>> fs/seq_file.c | 14 ++++-
>> include/linux/seq_file.h | 1 +
>> include/linux/stacktrace.h | 8 ++
>> kernel/trace/trace.c | 165 ++++++++++++++++++++++++++++++++++++++++++
>> kernel/trace/trace.h | 10 +++
>> 7 files changed, 269 insertions(+), 2 deletions(-)
>>
>
> applied to tip/tracing/stack-tracer - thanks Edwin!
>
> I've got a few comments about certain details - please send delta
> patches on top of these patches to fix those.

Thanks for the review Ingo,

The last commit I see is f8d56f1771e4867acc461146764b4feeb5245669, Sun
Oct 26 16:42:18 2008 -0700, am I looking at the wrong place?

http://git.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=shortlog;h=tracing/stack-tracer

Best regards,
--Edwin

2008-11-23 09:30:59

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/2] tracing: userspace stacktraces


* T?r?k Edwin <[email protected]> wrote:

> On 2008-11-23 10:26, Ingo Molnar wrote:
> > * T?r?k Edwin <[email protected]> wrote:
> >
> >
> >> Hi,
> >>
> >> I am resending my patches, differrences from previous send:
> >> * fixed some style issues
> >> * introduce seq_print_user_ip helper
> >> * make mangle_path global and document
> >> * use mangle_path instead of abusing seq_file in trace_seq_path
> >> * fix subject line and commit messages
> >>
> >> These patches apply on top of tip/master,
> >> commit 50598e26c1360d4c0edcabc50035d084ee8d3d70
> >>
> >> T?r?k Edwin (2):
> >> tracing: add support for userspace stacktraces in tracing/iter_ctrl
> >> tracing: identify which executable object the userspace address
> >> belongs to
> >>
> >> Documentation/ftrace.txt | 16 ++++-
> >> arch/x86/kernel/stacktrace.c | 57 +++++++++++++++
> >> fs/seq_file.c | 14 ++++-
> >> include/linux/seq_file.h | 1 +
> >> include/linux/stacktrace.h | 8 ++
> >> kernel/trace/trace.c | 165 ++++++++++++++++++++++++++++++++++++++++++
> >> kernel/trace/trace.h | 10 +++
> >> 7 files changed, 269 insertions(+), 2 deletions(-)
> >>
> >
> > applied to tip/tracing/stack-tracer - thanks Edwin!
> >
> > I've got a few comments about certain details - please send delta
> > patches on top of these patches to fix those.
>
> Thanks for the review Ingo,
>
> The last commit I see is f8d56f1771e4867acc461146764b4feeb5245669, Sun
> Oct 26 16:42:18 2008 -0700, am I looking at the wrong place?
>
> http://git.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=shortlog;h=tracing/stack-tracer

ok - i've pushed out a new tip/tracing/stack-tracer branch.

(have not pushed out the tip/master integration yet - i'm still
testing things)

Ingo

2008-11-23 10:39:25

by Török Edwin

[permalink] [raw]
Subject: [PATCH 1/3] tracing/stack-tracer: fix style issues


Signed-off-by: Török Edwin <[email protected]>
---
arch/x86/kernel/stacktrace.c | 51 +++++++++++++++++++++++------------------
include/linux/stacktrace.h | 2 +-
kernel/trace/trace.c | 7 ++---
3 files changed, 33 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index b151530..10786af 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -89,7 +89,7 @@ EXPORT_SYMBOL_GPL(save_stack_trace_tsk);

struct stack_frame {
const void __user *next_fp;
- unsigned long return_address;
+ unsigned long ret_addr;
};

static int copy_stack_frame(const void __user *fp, struct stack_frame *frame)
@@ -108,33 +108,40 @@ static int copy_stack_frame(const void __user *fp, struct stack_frame *frame)
return ret;
}

+static inline void __save_stack_trace_user(struct stack_trace *trace)
+{
+ const struct pt_regs *regs = task_pt_regs(current);
+ const void __user *fp = (const void __user *)regs->bp;
+
+ if (trace->nr_entries < trace->max_entries)
+ trace->entries[trace->nr_entries++] = regs->ip;
+
+ while (trace->nr_entries < trace->max_entries) {
+ struct stack_frame frame;
+
+ frame.next_fp = NULL;
+ frame.ret_addr = 0;
+ if (!copy_stack_frame(fp, &frame))
+ break;
+ if ((unsigned long)fp < regs->sp)
+ break;
+ if (frame.ret_addr) {
+ trace->entries[trace->nr_entries++] =
+ frame.ret_addr;
+ }
+ if (fp == frame.next_fp)
+ break;
+ fp = frame.next_fp;
+ }
+}
+
void save_stack_trace_user(struct stack_trace *trace)
{
/*
* Trace user stack if we are not a kernel thread
*/
if (current->mm) {
- const struct pt_regs *regs = task_pt_regs(current);
- const void __user *fp = (const void __user *)regs->bp;
-
- if (trace->nr_entries < trace->max_entries)
- trace->entries[trace->nr_entries++] = regs->ip;
-
- while (trace->nr_entries < trace->max_entries) {
- struct stack_frame frame;
- frame.next_fp = NULL;
- frame.return_address = 0;
- if (!copy_stack_frame(fp, &frame))
- break;
- if ((unsigned long)fp < regs->sp)
- break;
- if (frame.return_address)
- trace->entries[trace->nr_entries++] =
- frame.return_address;
- if (fp == frame.next_fp)
- break;
- fp = frame.next_fp;
- }
+ __save_stack_trace_user(trace);
}
if (trace->nr_entries < trace->max_entries)
trace->entries[trace->nr_entries++] = ULONG_MAX;
diff --git a/include/linux/stacktrace.h b/include/linux/stacktrace.h
index 68de514..fd42d68 100644
--- a/include/linux/stacktrace.h
+++ b/include/linux/stacktrace.h
@@ -25,7 +25,7 @@ extern void save_stack_trace_user(struct stack_trace *trace);
#else
# define save_stack_trace(trace) do { } while (0)
# define save_stack_trace_tsk(tsk, trace) do { } while (0)
-# define save_stack_trace_user(trace) do { } while (0)
+# define save_stack_trace_user(trace) do { } while (0)
# define print_stack_trace(trace, spaces) do { } while (0)
#endif

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 62776b7..dedf35f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -948,9 +948,9 @@ static void ftrace_trace_userstack(struct trace_array *tr,
struct trace_array_cpu *data,
unsigned long flags, int pc)
{
+ struct ring_buffer_event *event;
struct userstack_entry *entry;
struct stack_trace trace;
- struct ring_buffer_event *event;
unsigned long irq_flags;

if (!(trace_flags & TRACE_ITER_USERSTACKTRACE))
@@ -1471,8 +1471,7 @@ static inline int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
if (file) {
ret = trace_seq_path(s, &file->f_path);
if (ret)
- ret = trace_seq_printf(s, "[+0x%lx]",
- ip - vmstart);
+ ret = trace_seq_printf(s, "[+0x%lx]", ip - vmstart);
}
if (ret && ((sym_flags & TRACE_ITER_SYM_ADDR) || !file))
ret = trace_seq_printf(s, " <" IP_FMT ">", ip);
@@ -1485,7 +1484,7 @@ seq_print_userip_objs(const struct userstack_entry *entry, struct trace_seq *s,
{
struct mm_struct *mm = NULL;
int ret = 1;
- unsigned i;
+ unsigned int i;

if (trace_flags & TRACE_ITER_SYM_USEROBJ) {
struct task_struct *task;
--
1.5.6.5

2008-11-23 10:39:41

by Török Edwin

[permalink] [raw]
Subject: [PATCH 3/3] tracing/stack-tracer: introduce CONFIG_USER_STACKTRACE_SUPPORT

User stack tracing is just implemented for x86, but it is not x86 specific.
Introduce a generic config flag, that is currently enabled only for x86.
When other arches implements it, they will have to SELECT USER_STACKTRACE_SUPPORT

Signed-off-by: Török Edwin <[email protected]>
---
arch/x86/Kconfig | 1 +
include/linux/stacktrace.h | 2 +-
kernel/trace/Kconfig | 3 +++
3 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7a146ba..e49a4fd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -36,6 +36,7 @@ config X86
select HAVE_ARCH_TRACEHOOK
select HAVE_GENERIC_DMA_COHERENT if X86_32
select HAVE_EFFICIENT_UNALIGNED_ACCESS
+ select USER_STACKTRACE_SUPPORT

config ARCH_DEFCONFIG
string
diff --git a/include/linux/stacktrace.h b/include/linux/stacktrace.h
index fd42d68..1a8cecc 100644
--- a/include/linux/stacktrace.h
+++ b/include/linux/stacktrace.h
@@ -16,7 +16,7 @@ extern void save_stack_trace_tsk(struct task_struct *tsk,

extern void print_stack_trace(struct stack_trace *trace, int spaces);

-#ifdef CONFIG_X86
+#ifdef CONFIG_USER_STACKTRACE_SUPPORT
extern void save_stack_trace_user(struct stack_trace *trace);
#else
# define save_stack_trace_user(trace) do { } while (0)
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index b8378fa..87fc34a 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -3,6 +3,9 @@
# select HAVE_FUNCTION_TRACER:
#

+config USER_STACKTRACE_SUPPORT
+ bool
+
config NOP_TRACER
bool

--
1.5.6.5

2008-11-23 10:39:56

by Török Edwin

[permalink] [raw]
Subject: [PATCH 2/3] tracing/stack-tracer: fix locking

Hold mmap_sem while looking up/accessing vma.
Hold the RCU lock while using the task we looked up.

Signed-off-by: Török Edwin <[email protected]>
---
kernel/trace/trace.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index dedf35f..4c3bd82 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1462,11 +1462,15 @@ static inline int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
int ret = 1;

if (mm) {
- const struct vm_area_struct *vma = find_vma(mm, ip);
+ const struct vm_area_struct *vma;
+
+ down_read(&mm->mmap_sem);
+ vma = find_vma(mm, ip);
if (vma) {
file = vma->vm_file;
vmstart = vma->vm_start;
}
+ up_read(&mm->mmap_sem);
}
if (file) {
ret = trace_seq_path(s, &file->f_path);
@@ -1494,10 +1498,9 @@ seq_print_userip_objs(const struct userstack_entry *entry, struct trace_seq *s,
*/
rcu_read_lock();
task = find_task_by_vpid(entry->ent.tgid);
- rcu_read_unlock();
-
if (task)
mm = get_task_mm(task);
+ rcu_read_unlock();
}

for (i = 0; i < FTRACE_STACK_ENTRIES; i++) {
--
1.5.6.5

2008-11-23 10:52:34

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/3] tracing/stack-tracer: fix locking


* T?r?k Edwin <[email protected]> wrote:

> + down_read(&mm->mmap_sem);
> + vma = find_vma(mm, ip);
> if (vma) {
> file = vma->vm_file;
> vmstart = vma->vm_start;
> }
> + up_read(&mm->mmap_sem);
> }
> if (file) {
> ret = trace_seq_path(s, &file->f_path);

and now it's "file" that is held without a reference and possibly
racy.

Ingo

2008-11-23 10:59:20

by Török Edwin

[permalink] [raw]
Subject: Re: [PATCH 2/3] tracing/stack-tracer: fix locking

On 2008-11-23 12:52, Ingo Molnar wrote:
> * T?r?k Edwin <[email protected]> wrote:
>
>
>> + down_read(&mm->mmap_sem);
>> + vma = find_vma(mm, ip);
>> if (vma) {
>> file = vma->vm_file;
>> vmstart = vma->vm_start;
>> }
>> + up_read(&mm->mmap_sem);
>> }
>> if (file) {
>> ret = trace_seq_path(s, &file->f_path);
>>
>
> and now it's "file" that is held without a reference and possibly
> racy.
>

I'll move the (file) inside the mmap_sem too, since file is part of vma,
and you need mmap_sem to access the vma
nobody can modify (file) if they don't hold mmap_sem.
Am I right?

--Edwin

2008-11-23 11:01:37

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/3] tracing/stack-tracer: fix locking


* T?r?k Edwin <[email protected]> wrote:

> On 2008-11-23 12:52, Ingo Molnar wrote:
> > * T?r?k Edwin <[email protected]> wrote:
> >
> >
> >> + down_read(&mm->mmap_sem);
> >> + vma = find_vma(mm, ip);
> >> if (vma) {
> >> file = vma->vm_file;
> >> vmstart = vma->vm_start;
> >> }
> >> + up_read(&mm->mmap_sem);
> >> }
> >> if (file) {
> >> ret = trace_seq_path(s, &file->f_path);
> >>
> >
> > and now it's "file" that is held without a reference and possibly
> > racy.
> >
>
> I'll move the (file) inside the mmap_sem too, since file is part of
> vma, and you need mmap_sem to access the vma nobody can modify
> (file) if they don't hold mmap_sem. Am I right?

yeah - you can use the mm lock for that as the vma holds a reference
to the file. This works as long as none of the seq functions called
with this lock held can provoke a user pagefault (they shouldnt, but i
havent checked in detail).

Ingo

2008-11-23 11:04:20

by Török Edwin

[permalink] [raw]
Subject: Re: [PATCH 2/3] tracing/stack-tracer: fix locking

On 2008-11-23 13:01, Ingo Molnar wrote:
> * T?r?k Edwin <[email protected]> wrote:
>
>
>> On 2008-11-23 12:52, Ingo Molnar wrote:
>>
>>> * T?r?k Edwin <[email protected]> wrote:
>>>
>>>
>>>
>>>> + down_read(&mm->mmap_sem);
>>>> + vma = find_vma(mm, ip);
>>>> if (vma) {
>>>> file = vma->vm_file;
>>>> vmstart = vma->vm_start;
>>>> }
>>>> + up_read(&mm->mmap_sem);
>>>> }
>>>> if (file) {
>>>> ret = trace_seq_path(s, &file->f_path);
>>>>
>>>>
>>> and now it's "file" that is held without a reference and possibly
>>> racy.
>>>
>>>
>> I'll move the (file) inside the mmap_sem too, since file is part of
>> vma, and you need mmap_sem to access the vma nobody can modify
>> (file) if they don't hold mmap_sem. Am I right?
>>
>
> yeah - you can use the mm lock for that as the vma holds a reference
> to the file. This works as long as none of the seq functions called
> with this lock held can provoke a user pagefault (they shouldnt, but i
> havent checked in detail).
>

It calls d_path, that access only kernel structures AFAICT, so it can't
provoke a *user* pagefault.
Not sure about kernel pagefaults, but they shouldn't matter.

Best regards,
--Edwin

2008-11-23 11:07:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/3] tracing/stack-tracer: fix locking


* T?r?k Edwin <[email protected]> wrote:

> On 2008-11-23 13:01, Ingo Molnar wrote:
> > * T?r?k Edwin <[email protected]> wrote:
> >
> >
> >> On 2008-11-23 12:52, Ingo Molnar wrote:
> >>
> >>> * T?r?k Edwin <[email protected]> wrote:
> >>>
> >>>
> >>>
> >>>> + down_read(&mm->mmap_sem);
> >>>> + vma = find_vma(mm, ip);
> >>>> if (vma) {
> >>>> file = vma->vm_file;
> >>>> vmstart = vma->vm_start;
> >>>> }
> >>>> + up_read(&mm->mmap_sem);
> >>>> }
> >>>> if (file) {
> >>>> ret = trace_seq_path(s, &file->f_path);
> >>>>
> >>>>
> >>> and now it's "file" that is held without a reference and possibly
> >>> racy.
> >>>
> >>>
> >> I'll move the (file) inside the mmap_sem too, since file is part of
> >> vma, and you need mmap_sem to access the vma nobody can modify
> >> (file) if they don't hold mmap_sem. Am I right?
> >>
> >
> > yeah - you can use the mm lock for that as the vma holds a reference
> > to the file. This works as long as none of the seq functions called
> > with this lock held can provoke a user pagefault (they shouldnt, but i
> > havent checked in detail).
> >
>
> It calls d_path, that access only kernel structures AFAICT, so it
> can't provoke a *user* pagefault. Not sure about kernel pagefaults,
> but they shouldn't matter.

yeah - kernel pagefaults can happen in rare circumstances here but
they are all atomic and transparent to most other parts of the kernel
so they indeed do not matter.

Ingo

2008-11-23 11:08:26

by Török Edwin

[permalink] [raw]
Subject: [PATCH] tracing/stack-tracer: avoid races accessing file

---
kernel/trace/trace.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4c3bd82..48d1536 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1470,13 +1470,13 @@ static inline int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
file = vma->vm_file;
vmstart = vma->vm_start;
}
+ if (file) {
+ ret = trace_seq_path(s, &file->f_path);
+ if (ret)
+ ret = trace_seq_printf(s, "[+0x%lx]", ip - vmstart);
+ }
up_read(&mm->mmap_sem);
}
- if (file) {
- ret = trace_seq_path(s, &file->f_path);
- if (ret)
- ret = trace_seq_printf(s, "[+0x%lx]", ip - vmstart);
- }
if (ret && ((sym_flags & TRACE_ITER_SYM_ADDR) || !file))
ret = trace_seq_printf(s, " <" IP_FMT ">", ip);
return ret;
--
1.5.6.5

2008-11-23 11:21:04

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] tracing/stack-tracer: avoid races accessing file


* T?r?k Edwin <[email protected]> wrote:

> ---
> kernel/trace/trace.c | 10 +++++-----
> 1 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 4c3bd82..48d1536 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -1470,13 +1470,13 @@ static inline int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
> file = vma->vm_file;
> vmstart = vma->vm_start;
> }
> + if (file) {
> + ret = trace_seq_path(s, &file->f_path);
> + if (ret)
> + ret = trace_seq_printf(s, "[+0x%lx]", ip - vmstart);
> + }
> up_read(&mm->mmap_sem);
> }
> - if (file) {
> - ret = trace_seq_path(s, &file->f_path);
> - if (ret)
> - ret = trace_seq_printf(s, "[+0x%lx]", ip - vmstart);
> - }

applied to tip/tracing/stack-tracer, thanks Edwin!

Ingo

2008-11-23 21:07:16

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH] vfs, seqfile: make mangle_path() global

Ingo Molnar wrote:
> * T?r?k Edwin <[email protected]> wrote:
>
>> fs/seq_file.c | 14 +++++++-
>> include/linux/seq_file.h | 1 +
>
> Note, i've split these bits out into a separate patch - see it
> attached below.
>
> Ingo
>
> ------------------>
> From 74e2f334f4440cbcb63e9ebbcdcea430d41bdfa3 Mon Sep 17 00:00:00 2001
> From: =?utf-8?q?T=C3=B6r=C3=B6k=20Edwin?= <[email protected]>
> Date: Sat, 22 Nov 2008 13:28:48 +0200
> Subject: [PATCH] vfs, seqfile: make mangle_path() global
> MIME-Version: 1.0
> Content-Type: text/plain; charset=utf-8
> Content-Transfer-Encoding: 8bit
>
> Impact: expose new VFS API
>
> make mangle_path() available, as per the suggestions of Christoph Hellwig
> and Al Viro:
>
> http://lkml.org/lkml/2008/11/4/338
>
> Signed-off-by: T?r?k Edwin <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> fs/seq_file.c | 14 +++++++++++++-
> include/linux/seq_file.h | 1 +
> 2 files changed, 14 insertions(+), 1 deletions(-)
>
> diff --git a/fs/seq_file.c b/fs/seq_file.c
> index eba2eab..f5b61cc 100644
> --- a/fs/seq_file.c
> +++ b/fs/seq_file.c
> @@ -357,7 +357,18 @@ int seq_printf(struct seq_file *m, const char *f, ...)
> }
> EXPORT_SYMBOL(seq_printf);
>
> -static char *mangle_path(char *s, char *p, char *esc)
> +/**
> + * mangle_path - mangle and copy path to buffer beginning
> + * @s - buffer start
> + * @p - beginning of path in above buffer
> + * @esc - set of characters that need escaping

Format for function parameters is:

* @esc: set of characters that need escaping

please. (i.e., use : instead of -)

> + *
> + * Copy the path from @p to @s, replacing each occurrence of character from
> + * @esc with usual octal escape.
> + * Returns pointer past last written character in @s, or NULL in case of
> + * failure.
> + */
> +char *mangle_path(char *s, char *p, char *esc)
> {
> while (s <= p) {
> char c = *p++;
> @@ -376,6 +387,7 @@ static char *mangle_path(char *s, char *p, char *esc)
> }
> return NULL;
> }
> +EXPORT_SYMBOL_GPL(mangle_path);
>
> /*
> * return the absolute path of 'dentry' residing in mount 'mnt'.
> diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h
> index dc50bcc..b3dfa72 100644
> --- a/include/linux/seq_file.h
> +++ b/include/linux/seq_file.h
> @@ -34,6 +34,7 @@ struct seq_operations {
>
> #define SEQ_SKIP 1
>
> +char *mangle_path(char *s, char *p, char *esc);
> int seq_open(struct file *, const struct seq_operations *);
> ssize_t seq_read(struct file *, char __user *, size_t, loff_t *);
> loff_t seq_lseek(struct file *, loff_t, int);


--
~Randy

2008-11-23 21:25:16

by Török Edwin

[permalink] [raw]
Subject: [PATCH] fix comment style on mangle_path


Signed-off-by: Török Edwin <[email protected]>
---
fs/seq_file.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/seq_file.c b/fs/seq_file.c
index f5b61cc..f03220d 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -358,10 +358,10 @@ int seq_printf(struct seq_file *m, const char *f, ...)
EXPORT_SYMBOL(seq_printf);

/**
- * mangle_path - mangle and copy path to buffer beginning
- * @s - buffer start
- * @p - beginning of path in above buffer
- * @esc - set of characters that need escaping
+ * mangle_path - mangle and copy path to buffer beginning
+ * @s: buffer start
+ * @p: beginning of path in above buffer
+ * @esc: set of characters that need escaping
*
* Copy the path from @p to @s, replacing each occurrence of character from
* @esc with usual octal escape.
--
1.5.6.5

2008-11-23 21:37:26

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] fix comment style on mangle_path


* T?r?k Edwin <[email protected]> wrote:

> Signed-off-by: T?r?k Edwin <[email protected]>
> ---
> fs/seq_file.c | 8 ++++----
> 1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/seq_file.c b/fs/seq_file.c
> index f5b61cc..f03220d 100644
> --- a/fs/seq_file.c
> +++ b/fs/seq_file.c
> @@ -358,10 +358,10 @@ int seq_printf(struct seq_file *m, const char *f, ...)
> EXPORT_SYMBOL(seq_printf);
>
> /**
> - * mangle_path - mangle and copy path to buffer beginning
> - * @s - buffer start
> - * @p - beginning of path in above buffer
> - * @esc - set of characters that need escaping
> + * mangle_path - mangle and copy path to buffer beginning
> + * @s: buffer start
> + * @p: beginning of path in above buffer
> + * @esc: set of characters that need escaping

thanks - applied this too.

Ingo

2008-11-25 14:41:55

by Frank Ch. Eigler

[permalink] [raw]
Subject: Re: [PATCH 2/2] tracing: identify which executable object the userspace address belongs to

=?utf-8?q?T=C3=B6r=C3=B6k=20Edwin?= <[email protected]> writes:

> Impact: modify+improve the userstacktrace tracing visualization feature
> [...]
> You'll see stack entries like:
> /lib/libpthread-2.7.so[+0xd370]
> [...]

Can you suggest an actual distribution & architecture where this
facility may be tested/used? It appears to require frame-pointer
stuff that AFAIK is not generally turned on for user-space.

- FChE

2008-11-26 09:59:25

by Török Edwin

[permalink] [raw]
Subject: Re: [PATCH 2/2] tracing: identify which executable object the userspace address belongs to

On 2008-11-25 16:40, Frank Ch. Eigler wrote:
> =?utf-8?q?T=C3=B6r=C3=B6k=20Edwin?= <[email protected]> writes:
>
>
>> Impact: modify+improve the userstacktrace tracing visualization feature
>> [...]
>> You'll see stack entries like:
>> /lib/libpthread-2.7.so[+0xd370]
>> [...]
>>
>
> Can you suggest an actual distribution & architecture where this
> facility may be tested/used?

Debian unstable, x86-32 (I tested on Intel Core Duo).
Out of the box, no rebuild of glibc needed.

Here is a sample output:

# echo sched_switch >current_tracer
# echo userstacktrace >trace_options
# echo sym-userobj >trace_options
# echo 1>tracing_enabled
# cat trace_pipe >/tmp/test&
$ ~/a.out
# echo 0 >tracing_enabled

(You can also use other tracers besides sched_switch.)

bash-6554 [000] 377.837014: 6554:120:S + [001] 6216:120:S
bash-6554 [000] 377.837015: <b7f40424> <-
/lib/i686/cmov/libc-2.7.so[+0x6af77] <-
/lib/i686/cmov/libc-2.7.so[+0x6a81a] <-
/lib/i686/cmov/libc-2.7.so[+0x5e7e9] <- /bin/bash[+0x5f955] <-
/bin/bash[+0x2822a] <- /bin/bash[+0x2bf21] <- /bin/bash[+0x2a403]
....
a.out-7442 [000] 386.221719: 7442:120:R ==> [000] 1344:115:R
a.out-7442 [000] 386.221720: /lib/ld-2.7.so[+0xbed09631]
<- /lib/ld-2.7.so[+0xbecd4f28] <- /lib/ld-2.7.so[+0xbece9c36] <-
/lib/ld-2.7.so[+0xbeca5603] <- /lib/ld-2.7.so[+0xbeca3403] <-
/lib/ld-2.7.so[+0xbeca1983] <- /lib/ld-2.7.so[+0xbec93fab] <-
/lib/ld-2.7.so[+0xbec93aa5]
a.out-7442 [000] 386.224544: 7442:120:R + [000] 1:120:S
a.out-7442 [000] 386.224545: /home/edwin/a.out[+0x377] <-
/home/edwin/a.out[+0x38f] <- /lib/i686/cmov/libc-2.7.so[+0x16455] <-
/home/edwin/a.out[+0x2e1]
a.out-7442 [000] 386.224562: 7442:120:R ==> [000]
1:120:R
...
a.out-7442 [000] 386.224562: /home/edwin/a.out[+0x377] <-
/home/edwin/a.out[+0x38f] <- /lib/i686/cmov/libc-2.7.so[+0x16455] <-
/home/edwin/a.out[+0x2e1]
Xorg-5895 [001] 388.180034: 5895:120:R + [001] 6214:120:S
Xorg-5895 [001] 388.180035: <b805f424> <-
/usr/bin/Xorg[+0x17addf] <- /usr/bin/Xorg[+0x1768cb] <-
/usr/bin/Xorg[+0x46e10] <- /usr/bin/Xorg[+0x2d795] <-
/lib/i686/cmov/libc-2.7.so[+0x16455] <- /usr/bin/Xorg[+0x2ca81]

If I run the little (and inefficient) perl script below I get:
bash-6554 [000] 377.837015
<b7f40424>/lib/i686/cmov/libc-2.7.so[+0x6af77] <-
/lib/i686/cmov/libc-2.7.so[+0x6a81a] <-
/lib/i686/cmov/libc-2.7.so[+0x5e7e9] <- /bin/bash[+0x5f955] <-
/bin/bash[+0x2822a] <- /bin/bash[+0x2bf21] <- /bin/bash[+0x2a403]
<- a.out-7442 [000] 386.221720
/lib/ld-2.7.so[+0xbed09631] <- /lib/ld-2.7.so[+0xbecd4f28] <-
/lib/ld-2.7.so[+0xbece9c36] <- /lib/ld-2.7.so[+0xbeca5603] <-
/lib/ld-2.7.so[+0xbeca3403] <- /lib/ld-2.7.so[+0xbeca1983] <-
/lib/ld-2.7.so[+0xbec93fab] <- /lib/ld-2.7.so[+0xbec93aa5]
<- a.out-7442 [000] 386.224545/home/edwin/ll.c:2 <-
/home/edwin/ll.c:9 <- /lib/i686/cmov/libc-2.7.so[+0x16455] <-
/home/edwin/a.out[+0x2e1]
<- a.out-7442 [000] 386.224562/home/edwin/ll.c:2 <-
/home/edwin/ll.c:9 <- /lib/i686/cmov/libc-2.7.so[+0x16455] <-
/home/edwin/a.out[+0x2e1]
<- Xorg-5895 [001] 388.180035
<b805f424>/usr/bin/Xorg[+0x17addf] <- /usr/bin/Xorg[+0x1768cb] <-
/usr/bin/Xorg[+0x46e10] <- /usr/bin/Xorg[+0x2d795] <-
/lib/i686/cmov/libc-2.7.so[+0x16455] <- /usr/bin/Xorg[+0x2ca81]

If I use a glibc with debug symbols (by installing the distro package
libc6-dbg, and using LD_LIBRARY_PATH=/usr/lib/debug), I can even get
libc line numbers:

a.out-18077 [001] 4154503532.435531: 18077:120:S ==>
[001] 0:140:R
a.out-18077 [001] 4154503532.435532:
/usr/lib/debug/libc-2.7.so[+0x35424] <- /lib/ld-2.7.so[+0xbeca5186] <-
/lib/ld-2.7.so[+0xbeca3403] <- /lib/ld-2.7.so[+0xbeca1983] <-
/lib/ld-2.7.so[+0xbec93fab] <- /lib/ld-2.7.so[+0xbec93aa5] <-
<49406455> <- /lib/ld-2.7.so[+0xbec91521]
a.out-18077 [001] 4154503532.481759: 18077:120:R + [001]
6214:120:S
a.out-18077 [001] 4154503532.481760:
/usr/lib/debug/libpthread-2.7.so[+0x72f5] <- /home/edwin/a.out[+0x456]
<- /home/edwin/a.out[+0x47a] <- /usr/lib/debug/libc-2.7.so[+0x16455] <-
/home/edwin/a.out[+0x3b1]

a.out-18077 [001]
4154503532.435532/build/buildd/glibc-2.7/build-tree/glibc-2.7/stdlib/../stdlib/strtod_l.c:375
<- /lib/ld-2.7.so[+0xbeca5186] <- /lib/ld-2.7.so[+0xbeca3403] <-
/lib/ld-2.7.so[+0xbeca1983] <- /lib/ld-2.7.so[+0xbec93fab] <-
/lib/ld-2.7.so[+0xbec93aa5] <- <49406455>/lib/ld-2.7.so[+0xbec91521]
<- a.out-18077 [001]
4154503532.481760/build/buildd/glibc-2.7/build-tree/glibc-2.7/nptl/pthread_mutex_lock.c:87
<- /home/edwin/ll.c:7 <- /home/edwin/ll.c:14 <-
/build/buildd/glibc-2.7/build-tree/glibc-2.7/csu/libc-start.c:254 <-
/home/edwin/a.out[+0x3b1]


> It appears to require frame-pointer
> stuff that AFAIK is not generally turned on for user-space.

It is not turned on for glibc on x86_64, but even there you can at least
get the return address to user-space
(which in case of a system call is not very useful since it is inside
libc, but could be useful for pagefault tracing for example).

#!/usr/bin/perl
my %cache;
while (<>) {
next unless / <- /;
my @list = split(/:| <- /);
foreach my $entry (@list) {
if ($entry =~ /(\/[^[]+)\[\+0x([a-f0-9]+)\]/) {
if (defined $cache{$entry}) {
print $cache{$entry};
} else {
my $file = $1;
my $addr = hex $2;
if (not ($file =~ /.so/)) {
$addr = 0x8048000 + $addr;
}
$addr = sprintf "0x%x", $addr;
$res = `addr2line -e $file $addr`;
$res =~ s/\n//;
if ($res =~ /\?\?/) {
$res = $entry;
}
$cache{$entry} = $res;
print $res;
}
print " <- "
} else {
print $entry;
}
}
}

Best regards,
--Edwin

2008-11-27 10:57:39

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/2] tracing: identify which executable object the userspace address belongs to

On Tue, 2008-11-25 at 09:40 -0500, Frank Ch. Eigler wrote:
> =?utf-8?q?T=C3=B6r=C3=B6k=20Edwin?= <[email protected]> writes:
>
> > Impact: modify+improve the userstacktrace tracing visualization feature
> > [...]
> > You'll see stack entries like:
> > /lib/libpthread-2.7.so[+0xd370]
> > [...]
>
> Can you suggest an actual distribution & architecture where this
> facility may be tested/used? It appears to require frame-pointer
> stuff that AFAIK is not generally turned on for user-space.

gentoo, just rebuild world with frame pointers ;-)

2008-11-27 12:49:42

by Frank Ch. Eigler

[permalink] [raw]
Subject: Re: [PATCH 2/2] tracing: identify which executable object the userspace address belongs to

Hi -

On Thu, Nov 27, 2008 at 11:41:45AM +0100, Peter Zijlstra wrote:
> > > Impact: modify+improve the userstacktrace tracing visualization feature
> > > [...]
> > > You'll see stack entries like:
> > > /lib/libpthread-2.7.so[+0xd370]
> > > [...]
> >
> > Can you suggest an actual distribution & architecture where this
> > facility may be tested/used? It appears to require frame-pointer
> > stuff that AFAIK is not generally turned on for user-space.
>
> gentoo, just rebuild world with frame pointers ;-)

Well, that only goes so far. If this feature turns out unable to work
without distributors recompiling all their stuff on, for example, x86-64,
then expectations need to be reset.


- FChE

2008-11-27 13:03:18

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/2] tracing: identify which executable object the userspace address belongs to

On Thu, 2008-11-27 at 07:48 -0500, Frank Ch. Eigler wrote:
> Hi -
>
> On Thu, Nov 27, 2008 at 11:41:45AM +0100, Peter Zijlstra wrote:
> > > > Impact: modify+improve the userstacktrace tracing visualization feature
> > > > [...]
> > > > You'll see stack entries like:
> > > > /lib/libpthread-2.7.so[+0xd370]
> > > > [...]
> > >
> > > Can you suggest an actual distribution & architecture where this
> > > facility may be tested/used? It appears to require frame-pointer
> > > stuff that AFAIK is not generally turned on for user-space.
> >
> > gentoo, just rebuild world with frame pointers ;-)
>
> Well, that only goes so far. If this feature turns out unable to work
> without distributors recompiling all their stuff on, for example, x86-64,
> then expectations need to be reset.

Who says you need a distributor to recompile stuff.. there are plenty
people running gentoo, and it doesn't take all that long to rebuild
world on these multi-socket quad core boxen we have today ;-)

Although with gcc getting ever slower and desktop software ever more
bloated who knows,.. but then its your own fault if you install gnome
and such, gentoo comes lean and mean from the stage3 cage.

I wish more distros would be as easy to rebuild as gentoo is, its a real
feature.

2008-11-27 13:04:01

by Török Edwin

[permalink] [raw]
Subject: Re: [PATCH 2/2] tracing: identify which executable object the userspace address belongs to

On 2008-11-27 14:48, Frank Ch. Eigler wrote:
> Hi -
>
> On Thu, Nov 27, 2008 at 11:41:45AM +0100, Peter Zijlstra wrote:
>
>>>> Impact: modify+improve the userstacktrace tracing visualization feature
>>>> [...]
>>>> You'll see stack entries like:
>>>> /lib/libpthread-2.7.so[+0xd370]
>>>> [...]
>>>>
>>> Can you suggest an actual distribution & architecture where this
>>> facility may be tested/used? It appears to require frame-pointer
>>> stuff that AFAIK is not generally turned on for user-space.
>>>
>> gentoo, just rebuild world with frame pointers ;-)
>>
>
> Well, that only goes so far. If this feature turns out unable to work
> without distributors recompiling all their stuff on, for example, x86-64,
> then expectations need to be reset.

My assumption is that this feature will be used to trace individual
applications, and not the system as a whole.
Then you only need libc to be recompiled with frame pointers on, and
your own application/your own application's libraries.

That is what I want to use it for, and there isn't another solution that
allows me to do this.
Sure I can trace userspace alone using ptrace (which has its own
overhead), and the kernel alone by using ftrace, but I can't combine
those traces in a meaningful manner.
If/when the kernel will support dwarf unwinding, it will only need to
provide an alternate implementation for save_stack_trace_user.

Even without frame pointers you can at least get the return address to
userspace, which may be inside your application for page faults.

If I need to do system-wide tracing, I can use my 32-bit chroot [*], or
boot my laptop which is 32-bit.

I don't think that this feature should get rejected just because it is
not easily usable from x86_64.

[*] I haven't tested yet if tracing 32-bit applications from a 64-bit
kernel works. It probably won't, and I'll need to use a different struct
stack_frame with 32-bit addresses.

Another approach I've though of would be to deliver a signal to
userspace on demand, and have the signal handler do the backtrace, but
that would unnecesary overhead.

Best regards,
--Edwin

2008-11-27 14:11:18

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/2] tracing: identify which executable object the userspace address belongs to


* T?r?k Edwin <[email protected]> wrote:

> On 2008-11-27 14:48, Frank Ch. Eigler wrote:
> > Hi -
> >
> > On Thu, Nov 27, 2008 at 11:41:45AM +0100, Peter Zijlstra wrote:
> >
> >>>> Impact: modify+improve the userstacktrace tracing visualization feature
> >>>> [...]
> >>>> You'll see stack entries like:
> >>>> /lib/libpthread-2.7.so[+0xd370]
> >>>> [...]
> >>>>
> >>> Can you suggest an actual distribution & architecture where this
> >>> facility may be tested/used? It appears to require frame-pointer
> >>> stuff that AFAIK is not generally turned on for user-space.
> >>>
> >> gentoo, just rebuild world with frame pointers ;-)
> >>
> >
> > Well, that only goes so far. If this feature turns out unable to work
> > without distributors recompiling all their stuff on, for example, x86-64,
> > then expectations need to be reset.
>
> My assumption is that this feature will be used to trace individual
> applications, and not the system as a whole. Then you only need libc
> to be recompiled with frame pointers on, and your own
> application/your own application's libraries.
>
> That is what I want to use it for, and there isn't another solution
> that allows me to do this. Sure I can trace userspace alone using
> ptrace (which has its own overhead), and the kernel alone by using
> ftrace, but I can't combine those traces in a meaningful manner.
> If/when the kernel will support dwarf unwinding, it will only need
> to provide an alternate implementation for save_stack_trace_user.

Yes.

> Even without frame pointers you can at least get the return address
> to userspace, which may be inside your application for page faults.
>
> If I need to do system-wide tracing, I can use my 32-bit chroot [*],
> or boot my laptop which is 32-bit.
>
> I don't think that this feature should get rejected just because it
> is not easily usable from x86_64.
>
> [*] I haven't tested yet if tracing 32-bit applications from a
> 64-bit kernel works. It probably won't, and I'll need to use a
> different struct stack_frame with 32-bit addresses.
>
> Another approach I've though of would be to deliver a signal to
> userspace on demand, and have the signal handler do the backtrace,
> but that would unnecesary overhead.

Correct, that would be stupid.

Your patches are nice. Right now they are in tracing/core and
linux-next already.

Ingo

2008-11-27 14:28:14

by Török Edwin

[permalink] [raw]
Subject: Re: [PATCH 2/2] tracing: identify which executable object the userspace address belongs to

On 2008-11-27 16:10, Ingo Molnar wrote:
> Your patches are nice. Right now they are in tracing/core and
> linux-next already.

Thanks. I can move on to the lock latency tracing ;)

I'll send out a draft of tracepoints that I would need to trace lock
latency.
I'll try to put them in same place as lockstat (but not necesarely
depending on lockstat being enabled).

Once we reach a lock-tracepoints patch that all agree upon, I can (try
to) write a ftrace-lock-latency that will have a histogram view
(as you've suggested similar to what the likely/unlikely tracer already
does), but also show separate counts per unique kernel/user stacktrace.

Or I could add the tracepoints inside lockstat (now that it has contend
with points feature), and use the information already gathered by lockstat,
but augment it with finer grained counts per kernel/user stacktrace.
(again there would be an ftrace plugin that would register with the
tracepoints, and show
the per stacktrace statistic in /sys/kernel/debug/tracing/trace).

Which approach should I try first?

Although my goal would be to be able to use this feature by simply
turning on a flag at runtime (whether something in /proc or
/sys/kernel/debug),
rather than rebooting to a different kernel, it may be easier to
implement this at first by using what lockstat already provides, and
later improving it
to work w/o lockstat.

What do you think?

Best regards,
--Edwin

2008-11-27 14:51:25

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/2] tracing: identify which executable object the userspace address belongs to


* T?r?k Edwin <[email protected]> wrote:

> Thanks. I can move on to the lock latency tracing ;)

that's a bit more contentious ...

> I'll send out a draft of tracepoints that I would need to trace lock
> latency. I'll try to put them in same place as lockstat (but not
> necesarely depending on lockstat being enabled).

> Or I could add the tracepoints inside lockstat (now that it has
> contend with points feature), and use the information already
> gathered by lockstat, but augment it with finer grained counts per
> kernel/user stacktrace. (again there would be an ftrace plugin that
> would register with the tracepoints, and show the per stacktrace
> statistic in /sys/kernel/debug/tracing/trace).

yes. The less intrusive your patch is, the more you utilize and
generalize existing facilities, the better. You could split the
Kconfig of LOCKSTAT into two bits: LOCKSTAT (core) and LOCKSTAT_PROC,
where the proc bits are enabled separately.

Your tracing approach could then reuse much of core LOCKSTAT (without
even touching the code) and just plain "select LOCKSTAT" - without
creating /proc/lockdep_stats.

Peter, what do you think?

Ingo

2008-11-28 10:06:00

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] vfs, seqfile: make mangle_path() global

On Sun, Nov 23, 2008 at 09:47:37AM +0100, Ingo Molnar wrote:
>
> * T?r?k Edwin <[email protected]> wrote:
>
> > fs/seq_file.c | 14 +++++++-
> > include/linux/seq_file.h | 1 +
>
> Note, i've split these bits out into a separate patch - see it
> attached below.

ACK, except that I wouldn't use _GPL in the export - it's trivial to
reproduce, so we are not protecting anything here. And all that code
has been moved verbatim from seq_path(), which is mine *and* exported
without _GPL nonsense.

As far as I'm concerned, all these helpers are as general-purpose as atoi()
et.al. - library functions damn close to being non-copyrightable due both
to triviality and to being absolutely straightforward implementations -
tell anybody to implement it and that's what you'll get.

I'm not fond of proprietary modules, to put it mildly, but let's not get
completely ridiculous. In this case it's as dumb as schools trying to ban
aspirin in the name of War On Some Drugs.

2008-11-28 17:08:29

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] vfs, seqfile: make mangle_path() global


* Al Viro <[email protected]> wrote:

> On Sun, Nov 23, 2008 at 09:47:37AM +0100, Ingo Molnar wrote:
> >
> > * T?r?k Edwin <[email protected]> wrote:
> >
> > > fs/seq_file.c | 14 +++++++-
> > > include/linux/seq_file.h | 1 +
> >
> > Note, i've split these bits out into a separate patch - see it
> > attached below.
>
> ACK, [...]

thanks!

> [...] except that I wouldn't use _GPL in the export - it's trivial to
> reproduce, so we are not protecting anything here. And all that code
> has been moved verbatim from seq_path(), which is mine *and* exported
> without _GPL nonsense.
>
> As far as I'm concerned, all these helpers are as general-purpose as
> atoi() et.al. - library functions damn close to being non-copyrightable
> due both to triviality and to being absolutely straightforward
> implementations - tell anybody to implement it and that's what you'll
> get.
>
> I'm not fond of proprietary modules, to put it mildly, but let's not
> get completely ridiculous. In this case it's as dumb as schools trying
> to ban aspirin in the name of War On Some Drugs.

okay, fair enough. Would the commit below be fine with you?

Also, since we depend on this commit, would it be fine with you if we
carried the trivial patch in tip/tracing/*?

Ingo

---------------->
>From 604094f4615180f71da799e7e5b191f5c2a42a28 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Fri, 28 Nov 2008 18:03:22 +0100
Subject: [PATCH] vfs, seqfile: export mangle_path() generally

mangle_path() is trivial enough to make export restrictions on it
pointless - so change the export from EXPORT_SYMBOL_GPL to EXPORT_SYMBOL.

Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Al Viro <[email protected]>
---
fs/seq_file.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/seq_file.c b/fs/seq_file.c
index f03220d..16c2115 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -387,7 +387,7 @@ char *mangle_path(char *s, char *p, char *esc)
}
return NULL;
}
-EXPORT_SYMBOL_GPL(mangle_path);
+EXPORT_SYMBOL(mangle_path);

/*
* return the absolute path of 'dentry' residing in mount 'mnt'.

2008-12-09 20:40:52

by Török Edwin

[permalink] [raw]
Subject: Re: [PATCH 2/2] tracing: identify which executable object the userspace address belongs to

On 2008-11-27 16:51, Ingo Molnar wrote:
> * T?r?k Edwin <[email protected]> wrote:
>
>
>> Thanks. I can move on to the lock latency tracing ;)
>>
>
> that's a bit more contentious ...
>
>
>> I'll send out a draft of tracepoints that I would need to trace lock
>> latency. I'll try to put them in same place as lockstat (but not
>> necesarely depending on lockstat being enabled).
>>
>
>
>> Or I could add the tracepoints inside lockstat (now that it has
>> contend with points feature), and use the information already
>> gathered by lockstat, but augment it with finer grained counts per
>> kernel/user stacktrace. (again there would be an ftrace plugin that
>> would register with the tracepoints, and show the per stacktrace
>> statistic in /sys/kernel/debug/tracing/trace).
>>
>
> yes. The less intrusive your patch is, the more you utilize and
> generalize existing facilities, the better. You could split the
> Kconfig of LOCKSTAT into two bits: LOCKSTAT (core) and LOCKSTAT_PROC,
> where the proc bits are enabled separately.
>
> Your tracing approach could then reuse much of core LOCKSTAT (without
> even touching the code) and just plain "select LOCKSTAT" - without
> creating /proc/lockdep_stats.
>
> Peter, what do you think?

Ping? I may have some time to work on this in the weekend ...

Best regards,
--Edwin