2008-10-16 21:05:43

by Jason Baron

[permalink] [raw]
Subject: tracepoints for kernel/mutex.c

hi,

Below are 3 tracepoints I've been playing with in kernel/mutex.c using
a SystemTap script. The idea is to detect and determine the cause of
lock contention. Currently I get the following output:

<contended mutex nam> <process name and pid of the contention> <time of
contention> <pid that woke me up(caused the contention?)>

for example:

name: &inode->i_mutex, pid name: udevd, pid: 26567, jiffies: 1 cause:
26566


Using SystemTap I can now modify my script to figure out the inode number, the
file and pathname, or the backtrace and get to the bottom of the
contention.

I think this approach has a number of advantages. It has low
overhead in the off case, since its based on tracepoints. It is
minimally invasive in the code path (3 tracepoints). It also allows me
to explore data structures and parts of the kernel by simply modifying
the SystemTap script. I do not need to re-compile the kernel and reboot.

You can run the script via: $stap mutex.stp

thanks,

-Jason


diff --git a/include/trace/mutex.h b/include/trace/mutex.h
new file mode 100644
index 0000000..8d365d6
--- /dev/null
+++ b/include/trace/mutex.h
@@ -0,0 +1,10 @@
+#ifndef _TRACE_MUTEX_H
+#define _TRACE_MUTEX_H
+
+#include <linux/tracepoint.h>
+
+DEFINE_TRACE(mutex_lock_contention, TPPROTO(int pid, unsigned long time), TPARGS(pid, time));
+DEFINE_TRACE(mutex_lock_contention_length, TPPROTO(char *name, int pid, unsigned long time), TPARGS(name, pid, time));
+DEFINE_TRACE(mutex_wakeup, TPPROTO(int target_pid, int sender_pid), TPARGS(target_pid, sender_pid));
+
+#endif
diff --git a/kernel/mutex-debug.c b/kernel/mutex-debug.c
index 1d94160..847a9fb 100644
--- a/kernel/mutex-debug.c
+++ b/kernel/mutex-debug.c
@@ -97,6 +97,7 @@ void debug_mutex_init(struct mutex *lock, const char *name,
#endif
lock->owner = NULL;
lock->magic = lock;
+ lock->name = name;
}

/***
diff --git a/kernel/mutex.c b/kernel/mutex.c
index 12c779d..62c2aab 100644
--- a/kernel/mutex.c
+++ b/kernel/mutex.c
@@ -18,6 +18,7 @@
#include <linux/spinlock.h>
#include <linux/interrupt.h>
#include <linux/debug_locks.h>
+#include <trace/mutex.h>

/*
* In the DEBUG case we are using the "NULL fastpath" for mutexes,
@@ -147,6 +148,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
goto done;

lock_contended(&lock->dep_map, ip);
+ trace_mutex_lock_contention((int)task->pid, jiffies);

for (;;) {
/*
@@ -182,7 +184,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
schedule();
spin_lock_mutex(&lock->wait_lock, flags);
}
-
+ trace_mutex_lock_contention_length(lock->name, (int)task->pid, jiffies);
done:
lock_acquired(&lock->dep_map);
/* got the lock - rejoice! */
@@ -256,7 +258,7 @@ __mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
struct mutex_waiter, list);

debug_mutex_wake_waiter(lock, waiter);
-
+ trace_mutex_wakeup(waiter->task->pid, current->pid);
wake_up_process(waiter->task);
}

diff --git a/ltt/probes/Makefile b/ltt/probes/Makefile
index 2ee8a46..0c54d01 100644
--- a/ltt/probes/Makefile
+++ b/ltt/probes/Makefile
@@ -10,7 +10,8 @@ CFLAGS_REMOVE_rcu-trace.o = -pg
endif

obj-$(CONFIG_LTT_TRACEPROBES) += kernel-trace.o mm-trace.o fs-trace.o \
- ipc-trace.o lockdep-trace.o rcu-trace.o
+ ipc-trace.o lockdep-trace.o rcu-trace.o \
+ mutex-trace.o

ifeq ($(CONFIG_NET),y)
ifdef CONFIG_FTRACE
diff --git a/ltt/probes/mutex-trace.c b/ltt/probes/mutex-trace.c
new file mode 100644
index 0000000..3b1ebeb
--- /dev/null
+++ b/ltt/probes/mutex-trace.c
@@ -0,0 +1,43 @@
+/*
+ * ltt/probes/fs-trace.c
+ *
+ * Mutex tracepoint probes.
+ */
+
+#include <linux/module.h>
+#include <trace/mutex.h>
+
+static unsigned long mutex_times[PID_MAX_DEFAULT+1];
+static int mutex_wakeup[PID_MAX_DEFAULT+1];
+
+void probe_mutex_lock_contention(int pid, unsigned long time)
+{
+ mutex_times[pid] = (unsigned long)time;
+
+ trace_mark_tp(mutex_lock_contention, mutex_lock_contention,
+ probe_mutex_lock_contention, "pid %d time %lu", pid, time);
+}
+
+void probe_mutex_lock_contention_length(char *name, int pid, unsigned long time)
+{
+ unsigned long delay = time - mutex_times[pid];
+ int reason = mutex_wakeup[pid];
+
+ /* FIXME: why is name sometimes null? */
+ if (!name)
+ return;
+
+ trace_mark_tp(mutex_lock_contention_length, mutex_lock_contention_length,
+ probe_mutex_lock_contention_length,
+ "name %s pid %d delay %lu cause %d",
+ name, pid, delay, reason);
+}
+
+void probe_mutex_wakeup(int target_pid, int sender_pid)
+{
+ mutex_wakeup[target_pid] = sender_pid;
+
+ trace_mark_tp(mutex_wakeup, mutex_wakeup,
+ probe_mutex_wakeup, "pid %d pid %d", target_pid, sender_pid);
+}
+


# mutex.stp

probe kernel.mark("mutex_lock_contention")
{
}

probe kernel.mark("mutex_wakeup")
{
}

probe kernel.mark("mutex_lock_contention_length")
{
name = $arg1
pid = $arg2
delay = $arg3
cause = $arg4

printf("name: %s, pid name: %s, pid: %d, jiffies: %d cause: %d\n",
name, execname(), pid, delay, cause);
}


2008-10-16 21:35:14

by Peter Zijlstra

[permalink] [raw]
Subject: Re: tracepoints for kernel/mutex.c

On Thu, 2008-10-16 at 17:04 -0400, Jason Baron wrote:

> Below are 3 tracepoints I've been playing with in kernel/mutex.c using
> a SystemTap script. The idea is to detect and determine the cause of
> lock contention. Currently I get the following output:
>
> <contended mutex nam> <process name and pid of the contention> <time of
> contention> <pid that woke me up(caused the contention?)>

> I think this approach has a number of advantages. It has low
> overhead in the off case, since its based on tracepoints. It is
> minimally invasive in the code path (3 tracepoints). It also allows me
> to explore data structures and parts of the kernel by simply modifying
> the SystemTap script. I do not need to re-compile the kernel and reboot.

*sigh* this is why I hate markers and all related things...

_IFF_ you want to place tracepoints, get them in the same place as the
lock-dep/stat hooks, that way you get all the locks, not only mutexes.

This is the same reason I absolutely _hate_ Edwin's rwsem tracer.

Folks, lets please start by getting the tracing infrastructure in and
those few high-level trace-points google proposed.

Until we get the basics in, I think I'm going to NAK any and all
tracepoint/marker patches.

2008-10-16 22:11:00

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: tracepoints for kernel/mutex.c

* Jason Baron ([email protected]) wrote:
> hi,
>
> Below are 3 tracepoints I've been playing with in kernel/mutex.c using
> a SystemTap script. The idea is to detect and determine the cause of
> lock contention. Currently I get the following output:
>
> <contended mutex nam> <process name and pid of the contention> <time of
> contention> <pid that woke me up(caused the contention?)>
>
> for example:
>
> name: &inode->i_mutex, pid name: udevd, pid: 26567, jiffies: 1 cause:
> 26566
>
>
> Using SystemTap I can now modify my script to figure out the inode number, the
> file and pathname, or the backtrace and get to the bottom of the
> contention.
>
> I think this approach has a number of advantages. It has low
> overhead in the off case, since its based on tracepoints. It is
> minimally invasive in the code path (3 tracepoints). It also allows me
> to explore data structures and parts of the kernel by simply modifying
> the SystemTap script. I do not need to re-compile the kernel and reboot.
>
> You can run the script via: $stap mutex.stp
>
> thanks,
>
> -Jason
>
>
> diff --git a/include/trace/mutex.h b/include/trace/mutex.h
> new file mode 100644
> index 0000000..8d365d6
> --- /dev/null
> +++ b/include/trace/mutex.h
> @@ -0,0 +1,10 @@
> +#ifndef _TRACE_MUTEX_H
> +#define _TRACE_MUTEX_H
> +
> +#include <linux/tracepoint.h>
> +
> +DEFINE_TRACE(mutex_lock_contention, TPPROTO(int pid, unsigned long time), TPARGS(pid, time));
> +DEFINE_TRACE(mutex_lock_contention_length, TPPROTO(char *name, int pid, unsigned long time), TPARGS(name, pid, time));
> +DEFINE_TRACE(mutex_wakeup, TPPROTO(int target_pid, int sender_pid), TPARGS(target_pid, sender_pid));
> +
> +#endif
> diff --git a/kernel/mutex-debug.c b/kernel/mutex-debug.c
> index 1d94160..847a9fb 100644
> --- a/kernel/mutex-debug.c
> +++ b/kernel/mutex-debug.c
> @@ -97,6 +97,7 @@ void debug_mutex_init(struct mutex *lock, const char *name,
> #endif
> lock->owner = NULL;
> lock->magic = lock;
> + lock->name = name;
> }
>
> /***
> diff --git a/kernel/mutex.c b/kernel/mutex.c
> index 12c779d..62c2aab 100644
> --- a/kernel/mutex.c
> +++ b/kernel/mutex.c
> @@ -18,6 +18,7 @@
> #include <linux/spinlock.h>
> #include <linux/interrupt.h>
> #include <linux/debug_locks.h>
> +#include <trace/mutex.h>
>
> /*
> * In the DEBUG case we are using the "NULL fastpath" for mutexes,
> @@ -147,6 +148,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> goto done;
>
> lock_contended(&lock->dep_map, ip);
> + trace_mutex_lock_contention((int)task->pid, jiffies);
>
> for (;;) {
> /*
> @@ -182,7 +184,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> schedule();
> spin_lock_mutex(&lock->wait_lock, flags);
> }
> -
> + trace_mutex_lock_contention_length(lock->name, (int)task->pid, jiffies);
> done:
> lock_acquired(&lock->dep_map);
> /* got the lock - rejoice! */
> @@ -256,7 +258,7 @@ __mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
> struct mutex_waiter, list);
>
> debug_mutex_wake_waiter(lock, waiter);
> -
> + trace_mutex_wakeup(waiter->task->pid, current->pid);
> wake_up_process(waiter->task);
> }
>
> diff --git a/ltt/probes/Makefile b/ltt/probes/Makefile
> index 2ee8a46..0c54d01 100644
> --- a/ltt/probes/Makefile
> +++ b/ltt/probes/Makefile
> @@ -10,7 +10,8 @@ CFLAGS_REMOVE_rcu-trace.o = -pg
> endif
>
> obj-$(CONFIG_LTT_TRACEPROBES) += kernel-trace.o mm-trace.o fs-trace.o \
> - ipc-trace.o lockdep-trace.o rcu-trace.o
> + ipc-trace.o lockdep-trace.o rcu-trace.o \
> + mutex-trace.o
>
> ifeq ($(CONFIG_NET),y)
> ifdef CONFIG_FTRACE
> diff --git a/ltt/probes/mutex-trace.c b/ltt/probes/mutex-trace.c
> new file mode 100644
> index 0000000..3b1ebeb
> --- /dev/null
> +++ b/ltt/probes/mutex-trace.c
> @@ -0,0 +1,43 @@
> +/*
> + * ltt/probes/fs-trace.c
> + *

Perhaps you meant mutex-trace.c ? :)

> + * Mutex tracepoint probes.
> + */
> +
> +#include <linux/module.h>
> +#include <trace/mutex.h>
> +
> +static unsigned long mutex_times[PID_MAX_DEFAULT+1];
> +static int mutex_wakeup[PID_MAX_DEFAULT+1];
> +

What happens if PID_MAX_LIMIT > PID_MAX_DEFAULT ?

What happens when a process ID gets reused (exit/fork reusing a PID) ?

Could you add this information to the task_structs or does it really
have to be separate ?

Mathieu

> +void probe_mutex_lock_contention(int pid, unsigned long time)
> +{
> + mutex_times[pid] = (unsigned long)time;
> +
> + trace_mark_tp(mutex_lock_contention, mutex_lock_contention,
> + probe_mutex_lock_contention, "pid %d time %lu", pid, time);
> +}
> +
> +void probe_mutex_lock_contention_length(char *name, int pid, unsigned long time)
> +{
> + unsigned long delay = time - mutex_times[pid];
> + int reason = mutex_wakeup[pid];
> +
> + /* FIXME: why is name sometimes null? */
> + if (!name)
> + return;
> +
> + trace_mark_tp(mutex_lock_contention_length, mutex_lock_contention_length,
> + probe_mutex_lock_contention_length,
> + "name %s pid %d delay %lu cause %d",
> + name, pid, delay, reason);
> +}
> +
> +void probe_mutex_wakeup(int target_pid, int sender_pid)
> +{
> + mutex_wakeup[target_pid] = sender_pid;
> +
> + trace_mark_tp(mutex_wakeup, mutex_wakeup,
> + probe_mutex_wakeup, "pid %d pid %d", target_pid, sender_pid);
> +}
> +
>
>
> # mutex.stp
>
> probe kernel.mark("mutex_lock_contention")
> {
> }
>
> probe kernel.mark("mutex_wakeup")
> {
> }
>
> probe kernel.mark("mutex_lock_contention_length")
> {
> name = $arg1
> pid = $arg2
> delay = $arg3
> cause = $arg4
>
> printf("name: %s, pid name: %s, pid: %d, jiffies: %d cause: %d\n",
> name, execname(), pid, delay, cause);
> }
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2008-10-16 22:15:17

by Thomas Gleixner

[permalink] [raw]
Subject: Re: tracepoints for kernel/mutex.c

On Thu, 16 Oct 2008, Peter Zijlstra wrote:
> On Thu, 2008-10-16 at 17:04 -0400, Jason Baron wrote:
> Folks, lets please start by getting the tracing infrastructure in and
> those few high-level trace-points google proposed.
>
> Until we get the basics in, I think I'm going to NAK any and all
> tracepoint/marker patches.

Ditto.

tglx

2008-10-17 05:10:08

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: tracepoints for kernel/mutex.c

Hi

> Folks, lets please start by getting the tracing infrastructure in and
> those few high-level trace-points google proposed.
>
> Until we get the basics in, I think I'm going to NAK any and all
> tracepoint/marker patches.

Could you please tell me how to get these patches?
I'd like to review it.


2008-10-17 08:15:19

by Peter Zijlstra

[permalink] [raw]
Subject: Re: tracepoints for kernel/mutex.c

On Fri, 2008-10-17 at 14:09 +0900, KOSAKI Motohiro wrote:
> Hi
>
> > Folks, lets please start by getting the tracing infrastructure in and
> > those few high-level trace-points google proposed.
> >
> > Until we get the basics in, I think I'm going to NAK any and all
> > tracepoint/marker patches.
>
> Could you please tell me how to get these patches?
> I'd like to review it.

I think writing is more the issue than reviewing here ;-)

2008-10-17 08:23:26

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: tracepoints for kernel/mutex.c

> On Fri, 2008-10-17 at 14:09 +0900, KOSAKI Motohiro wrote:
> > Hi
> >
> > > Folks, lets please start by getting the tracing infrastructure in and
> > > those few high-level trace-points google proposed.
> > >
> > > Until we get the basics in, I think I'm going to NAK any and all
> > > tracepoint/marker patches.
> >
> > Could you please tell me how to get these patches?
> > I'd like to review it.
>
> I think writing is more the issue than reviewing here ;-)

yup. I understand Jason's patch issue.
therefore, I interest to alternative googler approach as you said :)


2008-10-17 14:49:34

by Jason Baron

[permalink] [raw]
Subject: Re: tracepoints for kernel/mutex.c

On Thu, Oct 16, 2008 at 11:34:38PM +0200, Peter Zijlstra wrote:
> On Thu, 2008-10-16 at 17:04 -0400, Jason Baron wrote:
>
> > Below are 3 tracepoints I've been playing with in kernel/mutex.c using
> > a SystemTap script. The idea is to detect and determine the cause of
> > lock contention. Currently I get the following output:
> >
> > <contended mutex nam> <process name and pid of the contention> <time of
> > contention> <pid that woke me up(caused the contention?)>
>
> > I think this approach has a number of advantages. It has low
> > overhead in the off case, since its based on tracepoints. It is
> > minimally invasive in the code path (3 tracepoints). It also allows me
> > to explore data structures and parts of the kernel by simply modifying
> > the SystemTap script. I do not need to re-compile the kernel and reboot.
>
> *sigh* this is why I hate markers and all related things...
>
> _IFF_ you want to place tracepoints, get them in the same place as the
> lock-dep/stat hooks, that way you get all the locks, not only mutexes.

makes sense. So we could layer lock-dep/stat on top of tracepoints? That
would potentially also make lock-dep/stat more dynamic.

>
> This is the same reason I absolutely _hate_ Edwin's rwsem tracer.
>

i'm trying to get some consensus on these types of patches. Do we
want to create a new tracer for each thing we want to trace, or add
tracepoints?

> Folks, lets please start by getting the tracing infrastructure in and
> those few high-level trace-points google proposed.
>
> Until we get the basics in, I think I'm going to NAK any and all
> tracepoint/marker patches.
>

I think that core locking functions are pretty basic...

2008-10-17 16:36:41

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: tracepoints for kernel/mutex.c

* KOSAKI Motohiro ([email protected]) wrote:
> Hi
>
> > Folks, lets please start by getting the tracing infrastructure in and
> > those few high-level trace-points google proposed.
> >
> > Until we get the basics in, I think I'm going to NAK any and all
> > tracepoint/marker patches.
>
> Could you please tell me how to get these patches?
> I'd like to review it.
>

Hi Kosaki,

I am focusing on the timestamping/data relay/tracing control
infrastructure for now, so I think it is too soon to post the
instrumentation patches selelected from LTTng which correspond to the
Google instrumentation before we get something that makes that data
useful within the kernel, but if you still want to have a look, comments
are welcome.

See
git://git.kernel.org/pub/scm/linux/kernel/git/compudj/linux-2.6-lttng.git
or
http://www.kernel.org/pub/linux/kernel/people/compudj/patches/2.6/2.6.27/patch-2.6.27-lttng-0.43.tar.bz2

Commits :

irq-handle-prepare-for-instrumentation.patch
lttng-instrumentation-irq.patch
lttng-instrumentation-scheduler.patch
lttng-instrumentation-timer.patch
lttng-instrumentation-kernel.patch
lttng-instrumentation-filemap.patch
lttng-instrumentation-swap.patch
lttng-instrumentation-socket.patch

Mathieu

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2008-10-17 16:44:05

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: tracepoints for kernel/mutex.c

* Jason Baron ([email protected]) wrote:
> On Thu, Oct 16, 2008 at 11:34:38PM +0200, Peter Zijlstra wrote:
> > On Thu, 2008-10-16 at 17:04 -0400, Jason Baron wrote:
> >
> > > Below are 3 tracepoints I've been playing with in kernel/mutex.c using
> > > a SystemTap script. The idea is to detect and determine the cause of
> > > lock contention. Currently I get the following output:
> > >
> > > <contended mutex nam> <process name and pid of the contention> <time of
> > > contention> <pid that woke me up(caused the contention?)>
> >
> > > I think this approach has a number of advantages. It has low
> > > overhead in the off case, since its based on tracepoints. It is
> > > minimally invasive in the code path (3 tracepoints). It also allows me
> > > to explore data structures and parts of the kernel by simply modifying
> > > the SystemTap script. I do not need to re-compile the kernel and reboot.
> >
> > *sigh* this is why I hate markers and all related things...
> >
> > _IFF_ you want to place tracepoints, get them in the same place as the
> > lock-dep/stat hooks, that way you get all the locks, not only mutexes.
>
> makes sense. So we could layer lock-dep/stat on top of tracepoints? That
> would potentially also make lock-dep/stat more dynamic.
>
> >
> > This is the same reason I absolutely _hate_ Edwin's rwsem tracer.
> >
>
> i'm trying to get some consensus on these types of patches. Do we
> want to create a new tracer for each thing we want to trace, or add
> tracepoints?
>
> > Folks, lets please start by getting the tracing infrastructure in and
> > those few high-level trace-points google proposed.
> >
> > Until we get the basics in, I think I'm going to NAK any and all
> > tracepoint/marker patches.
> >
>
> I think that core locking functions are pretty basic...
>
>

Guys, please, let's focus on the infrastructure to manage trace data
(timestamping, buffering, event ID, event type management) before going
any further in the instrumentation direction. Otherwise we will end up
adding instrumentation in the Linux kernel without any in-kernel user
(or various tiny users without overall management insfrastructure), and
this sounds like a no-go. This is what Peter and Thomas are saying by
talking of NACK. Their position is actually nothing new : Linus has been
the first to say that at the Kernel Summit 2008, and I think it makes
sense.

We will have plenty of time to optimize
markers/tracepoints/lockdep/function trace _once_ we get the data
collection right.

Mathieu

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2008-10-17 16:49:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: tracepoints for kernel/mutex.c

On Fri, 2008-10-17 at 10:48 -0400, Jason Baron wrote:
> On Thu, Oct 16, 2008 at 11:34:38PM +0200, Peter Zijlstra wrote:
> > On Thu, 2008-10-16 at 17:04 -0400, Jason Baron wrote:
> >
> > > Below are 3 tracepoints I've been playing with in kernel/mutex.c using
> > > a SystemTap script. The idea is to detect and determine the cause of
> > > lock contention. Currently I get the following output:
> > >
> > > <contended mutex nam> <process name and pid of the contention> <time of
> > > contention> <pid that woke me up(caused the contention?)>
> >
> > > I think this approach has a number of advantages. It has low
> > > overhead in the off case, since its based on tracepoints. It is
> > > minimally invasive in the code path (3 tracepoints). It also allows me
> > > to explore data structures and parts of the kernel by simply modifying
> > > the SystemTap script. I do not need to re-compile the kernel and reboot.
> >
> > *sigh* this is why I hate markers and all related things...
> >
> > _IFF_ you want to place tracepoints, get them in the same place as the
> > lock-dep/stat hooks, that way you get all the locks, not only mutexes.
>
> makes sense. So we could layer lock-dep/stat on top of tracepoints? That
> would potentially also make lock-dep/stat more dynamic.

I'm afraid that won't work. Both lockdep and lockstat rely on added data
to the lock structure. But what you can do is expose the hooks as
tracepoints when lockdep/lockstat is configured.

> >
> > This is the same reason I absolutely _hate_ Edwin's rwsem tracer.
> >
>
> i'm trying to get some consensus on these types of patches. Do we
> want to create a new tracer for each thing we want to trace, or add
> tracepoints?

The only thing I'd consider is one lock-tracer that exposes all
lockdep/lockstat hooks. Any half-assed partial solution won't fly.

> > Folks, lets please start by getting the tracing infrastructure in and
> > those few high-level trace-points google proposed.
> >
> > Until we get the basics in, I think I'm going to NAK any and all
> > tracepoint/marker patches.
> >
>
> I think that core locking functions are pretty basic...

For kernel developers, yes. For userspace stuff like latencytop should
be good enough to notice something is up.

And kernel developers can recompile their kernel - that's the only way
you're going to do anything about lock contention anyway.

2008-10-17 16:54:19

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: tracepoints for kernel/mutex.c

* Peter Zijlstra ([email protected]) wrote:
> On Fri, 2008-10-17 at 10:48 -0400, Jason Baron wrote:
> > On Thu, Oct 16, 2008 at 11:34:38PM +0200, Peter Zijlstra wrote:
> > > On Thu, 2008-10-16 at 17:04 -0400, Jason Baron wrote:
> > >
> > > > Below are 3 tracepoints I've been playing with in kernel/mutex.c using
> > > > a SystemTap script. The idea is to detect and determine the cause of
> > > > lock contention. Currently I get the following output:
> > > >
> > > > <contended mutex nam> <process name and pid of the contention> <time of
> > > > contention> <pid that woke me up(caused the contention?)>
> > >
> > > > I think this approach has a number of advantages. It has low
> > > > overhead in the off case, since its based on tracepoints. It is
> > > > minimally invasive in the code path (3 tracepoints). It also allows me
> > > > to explore data structures and parts of the kernel by simply modifying
> > > > the SystemTap script. I do not need to re-compile the kernel and reboot.
> > >
> > > *sigh* this is why I hate markers and all related things...
> > >
> > > _IFF_ you want to place tracepoints, get them in the same place as the
> > > lock-dep/stat hooks, that way you get all the locks, not only mutexes.
> >
> > makes sense. So we could layer lock-dep/stat on top of tracepoints? That
> > would potentially also make lock-dep/stat more dynamic.
>
> I'm afraid that won't work. Both lockdep and lockstat rely on added data
> to the lock structure. But what you can do is expose the hooks as
> tracepoints when lockdep/lockstat is configured.
>
> > >
> > > This is the same reason I absolutely _hate_ Edwin's rwsem tracer.
> > >
> >
> > i'm trying to get some consensus on these types of patches. Do we
> > want to create a new tracer for each thing we want to trace, or add
> > tracepoints?
>
> The only thing I'd consider is one lock-tracer that exposes all
> lockdep/lockstat hooks. Any half-assed partial solution won't fly.
>
> > > Folks, lets please start by getting the tracing infrastructure in and
> > > those few high-level trace-points google proposed.
> > >
> > > Until we get the basics in, I think I'm going to NAK any and all
> > > tracepoint/marker patches.
> > >
> >
> > I think that core locking functions are pretty basic...
>
> For kernel developers, yes. For userspace stuff like latencytop should
> be good enough to notice something is up.
>
> And kernel developers can recompile their kernel - that's the only way
> you're going to do anything about lock contention anyway.
>

We also have to consider if a less specific instrumentation can extract
this kind of data. Scheduler instrumentation can do a big part of that
job, namely identifying the wakeup cause. We would not have the
information about it being mutex-related unless we activate lockdep
tracing, potentially with tracepoints in the lockdep handlers. I think
this would fly. But let's keep this for later.

Mathieu


--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2008-10-17 16:59:24

by Frank Ch. Eigler

[permalink] [raw]
Subject: Re: tracepoints for kernel/mutex.c

Hi -

On Fri, Oct 17, 2008 at 12:43:52PM -0400, Mathieu Desnoyers wrote:
> [...]
> > > _IFF_ you want to place tracepoints, get them in the same place as the
> > > lock-dep/stat hooks, that way you get all the locks, not only mutexes.
> >
> > makes sense. So we could layer lock-dep/stat on top of tracepoints? That
> > would potentially also make lock-dep/stat more dynamic.

> Guys, please, let's focus on the infrastructure to manage trace data
> (timestamping, buffering, event ID, event type management) before
> going any further in the instrumentation direction.

Any trace data management widget design that precludes connection to
an event source as simple as tracepoints or markers is going to be a
disappointment.


> Otherwise we will end up adding instrumentation in the Linux kernel
> without any in-kernel user [...]

Connecting markers to /proc style text files has been demonstrated in
less than a hundred lines of code.


Plus, Jason's note clearly referred to another in-kernel use of this
instrumentation: the possibility of connecting lockdep via generic
tracepoints in the lock-related code rather than special-purpose
hooks. One benefit could be being able to compile in lockdep and/or
lockstat by default (activating it via a boot option). The other
would be of course the concurrent/alternative of the instrumentation
for performance-related purposes.

- FChE