2011-02-22 15:39:12

by Hitoshi Mitake

[permalink] [raw]
Subject: [PATCH] perf lock: clean the options for perf record

From: Hitoshi Mitake <[email protected]>

Hi Frederic, really sorry for my slow work...

This patch cleans the options passed for perf record(or cmd_record()).
1. remove ":r" at the tail of the name of events, because it is not supported now
2. remove "-f" deprecated option, because it is already default

Signed-off-by: Hitoshi Mitake <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Steven Rostedt <[email protected]>
---
tools/perf/builtin-lock.c | 9 ++++-----
1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index e00d938..2359f52 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -926,13 +926,12 @@ static const struct option lock_options[] = {
static const char *record_args[] = {
"record",
"-R",
- "-f",
"-m", "1024",
"-c", "1",
- "-e", "lock:lock_acquire:r",
- "-e", "lock:lock_acquired:r",
- "-e", "lock:lock_contended:r",
- "-e", "lock:lock_release:r",
+ "-e", "lock:lock_acquire",
+ "-e", "lock:lock_acquired",
+ "-e", "lock:lock_contended",
+ "-e", "lock:lock_release",
};

static int __cmd_record(int argc, const char **argv)
--
1.7.3.3


2011-02-22 15:39:13

by Hitoshi Mitake

[permalink] [raw]
Subject: Re: [PATCH] perf lock: clean the options for perf record

On 2011年02月23日 00:28, Hitoshi Mitake wrote:
> From: Hitoshi Mitake<[email protected]>
>
> Hi Frederic, really sorry for my slow work...
>
> This patch cleans the options passed for perf record(or cmd_record()).
> 1. remove ":r" at the tail of the name of events, because it is not supported now
> 2. remove "-f" deprecated option, because it is already default
>
> Signed-off-by: Hitoshi Mitake<[email protected]>
> Cc: Peter Zijlstra<[email protected]>
> Cc: Paul Mackerras<[email protected]>
> Cc: Ingo Molnar<[email protected]>
> Cc: Arnaldo Carvalho de Melo<[email protected]>
> Cc: Steven Rostedt<[email protected]>
> ---
> tools/perf/builtin-lock.c | 9 ++++-----
> 1 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
> index e00d938..2359f52 100644
> --- a/tools/perf/builtin-lock.c
> +++ b/tools/perf/builtin-lock.c
> @@ -926,13 +926,12 @@ static const struct option lock_options[] = {
> static const char *record_args[] = {
> "record",
> "-R",
> - "-f",
> "-m", "1024",
> "-c", "1",
> - "-e", "lock:lock_acquire:r",
> - "-e", "lock:lock_acquired:r",
> - "-e", "lock:lock_contended:r",
> - "-e", "lock:lock_release:r",
> + "-e", "lock:lock_acquire",
> + "-e", "lock:lock_acquired",
> + "-e", "lock:lock_contended",
> + "-e", "lock:lock_release",
> };
>
> static int __cmd_record(int argc, const char **argv)


BTW, I have a proposal of changing the way of tracing lock event.

Currently, lockdep functions generate lock events (e.g. lock_acquire
and trace_lock_acquire). I don't think that lockdep is an optimal
place for lock event tracing, because:

1. Some subsystems (e.g. workqueue) uses lockdep for their own
validation. It is confusing for modeling locking sequence. I don't
think that the events from lockdep is needless, this might be useful
for trouble shooting or other purposes. But it is not suitable for
performance analyzing oriented tracing.

2. Currently, every type of lock (spinlock, rwlock, mutex, rwsem)
employs common event tracing. This is not suitable from the perspective
of the overhead of tracing. Even if a user only looks at the one type
of locks, tracing produces the overhead of the entire. This is harmful
because lock event is high frequency. <type>_acquire (e.g.
spin_acquire) might be suitable place for putting tracing.

So making new classes of lock event per types is suitable. Tracing
the one of them or combination of them will be better from the
perspectives of both of modeling and performance.

How do you think about it?

# Shamefully, I still cannot show the concrete example of performance
# improvement by perf lock. It is another big problem... :(

Thanks,
Hitoshi

2011-02-22 15:43:51

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] perf lock: clean the options for perf record

On Wed, 2011-02-23 at 00:30 +0900, Hitoshi Mitake wrote:
> How do you think about it?

Most of the lock code (esp the spinlock stuff) is already way over the
threshold of sanity, adding to that for some dubious reasons doesn't
seem like a good idea.

I'm still not at all sure why people want all this lock tracing.

2011-02-22 18:09:51

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH] perf lock: clean the options for perf record

On Wed, Feb 23, 2011 at 12:28:27AM +0900, Hitoshi Mitake wrote:
> From: Hitoshi Mitake <[email protected]>
>
> Hi Frederic, really sorry for my slow work...
>
> This patch cleans the options passed for perf record(or cmd_record()).
> 1. remove ":r" at the tail of the name of events, because it is not supported now
> 2. remove "-f" deprecated option, because it is already default
>
> Signed-off-by: Hitoshi Mitake <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> ---

Thanks.

Acked-by: Frederic Weisbecker <[email protected]>

2011-02-22 18:23:18

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH] perf lock: clean the options for perf record

On Tue, Feb 22, 2011 at 04:43:35PM +0100, Peter Zijlstra wrote:
> On Wed, 2011-02-23 at 00:30 +0900, Hitoshi Mitake wrote:
> > How do you think about it?
>
> Most of the lock code (esp the spinlock stuff) is already way over the
> threshold of sanity, adding to that for some dubious reasons doesn't
> seem like a good idea.
>
> I'm still not at all sure why people want all this lock tracing.

Right, well I can imagine many usecases that could make lock
tracing bring more value than what lockstat already provides,
through a tool like perf lock if we enhance it.

We should probably first focus on developing the tooling side
and make it useful enough that optimizations in the kernel
side become desirable.

2011-02-23 04:18:03

by Hitoshi Mitake

[permalink] [raw]
Subject: Re: [PATCH] perf lock: clean the options for perf record

On 2011年02月23日 03:22, Frederic Weisbecker wrote:
> On Tue, Feb 22, 2011 at 04:43:35PM +0100, Peter Zijlstra wrote:
>> On Wed, 2011-02-23 at 00:30 +0900, Hitoshi Mitake wrote:
>>> How do you think about it?
>>
>> Most of the lock code (esp the spinlock stuff) is already way over the
>> threshold of sanity, adding to that for some dubious reasons doesn't
>> seem like a good idea.
>>
>> I'm still not at all sure why people want all this lock tracing.
>
> Right, well I can imagine many usecases that could make lock
> tracing bring more value than what lockstat already provides,
> through a tool like perf lock if we enhance it.
>
> We should probably first focus on developing the tooling side
> and make it useful enough that optimizations in the kernel
> side become desirable.
>

Yes, lockstat only provides the lock usage statistics of
entire of the system. perf lock will be able to provide the partial
information of specified term, or the degree of dependency
between locks.

2011-02-24 15:47:08

by Hitoshi Mitake

[permalink] [raw]
Subject: Re: [PATCH] perf lock: clean the options for perf record

On 2011年02月23日 13:17, Hitoshi Mitake wrote:
> On 2011年02月23日 03:22, Frederic Weisbecker wrote:
>> On Tue, Feb 22, 2011 at 04:43:35PM +0100, Peter Zijlstra wrote:
>>> On Wed, 2011-02-23 at 00:30 +0900, Hitoshi Mitake wrote:
>>>> How do you think about it?
>>>
>>> Most of the lock code (esp the spinlock stuff) is already way over the
>>> threshold of sanity, adding to that for some dubious reasons doesn't
>>> seem like a good idea.
>>>
>>> I'm still not at all sure why people want all this lock tracing.
>>
>> Right, well I can imagine many usecases that could make lock
>> tracing bring more value than what lockstat already provides,
>> through a tool like perf lock if we enhance it.
>>
>> We should probably first focus on developing the tooling side
>> and make it useful enough that optimizations in the kernel
>> side become desirable.
>>
>
> Yes, lockstat only provides the lock usage statistics of
> entire of the system. perf lock will be able to provide the partial
> information of specified term, or the degree of dependency
> between locks.
>

For trial, I created new tracepoint for rwsem and tested.
Names of events are rwsem_{acquire, contended, acquired, release},
their meanings are similar to lock_{...}.

I traced perf bench sched messaging and result was,

mitake@x201i:~/linux/.../tools/perf% ./perf bench sched messaging
# Running sched/messaging benchmark...
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 1.252 [sec]
mitake@x201i:~/linux/.../tools/perf% sudo ./perf record -R -m 1024 -c 1
-e rwsem:rwsem_acquire -e
rwsem:rwsem_release,rwsem:rwsem_contended,rwsem:rwsem_acquired ./perf
bench sched messaging
# Running sched/messaging benchmark...
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

Total time: 1.332 [sec]
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 13.495 MB perf.data (~589597 samples) ]

raw execution of sched messaging was 1.252 sec, and traced version
was 1.332 sec. This overhead is far smaller than the overhead of
current lock tracepoints.

I think that it is possible to write some meaningful tools
like reader/writer ratio measuring. If something can be written,
I'll post it.

2011-02-24 16:52:16

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH] perf lock: clean the options for perf record

On Fri, Feb 25, 2011 at 12:46:40AM +0900, Hitoshi Mitake wrote:
> On 2011年02月23日 13:17, Hitoshi Mitake wrote:
> >On 2011年02月23日 03:22, Frederic Weisbecker wrote:
> >>On Tue, Feb 22, 2011 at 04:43:35PM +0100, Peter Zijlstra wrote:
> >>>On Wed, 2011-02-23 at 00:30 +0900, Hitoshi Mitake wrote:
> >>>>How do you think about it?
> >>>
> >>>Most of the lock code (esp the spinlock stuff) is already way over the
> >>>threshold of sanity, adding to that for some dubious reasons doesn't
> >>>seem like a good idea.
> >>>
> >>>I'm still not at all sure why people want all this lock tracing.
> >>
> >>Right, well I can imagine many usecases that could make lock
> >>tracing bring more value than what lockstat already provides,
> >>through a tool like perf lock if we enhance it.
> >>
> >>We should probably first focus on developing the tooling side
> >>and make it useful enough that optimizations in the kernel
> >>side become desirable.
> >>
> >
> >Yes, lockstat only provides the lock usage statistics of
> >entire of the system. perf lock will be able to provide the partial
> >information of specified term, or the degree of dependency
> >between locks.
> >
>
> For trial, I created new tracepoint for rwsem and tested.
> Names of events are rwsem_{acquire, contended, acquired, release},
> their meanings are similar to lock_{...}.
>
> I traced perf bench sched messaging and result was,
>
> mitake@x201i:~/linux/.../tools/perf% ./perf bench sched messaging
> # Running sched/messaging benchmark...
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 1.252 [sec]
> mitake@x201i:~/linux/.../tools/perf% sudo ./perf record -R -m 1024
> -c 1 -e rwsem:rwsem_acquire -e
> rwsem:rwsem_release,rwsem:rwsem_contended,rwsem:rwsem_acquired
> ./perf bench sched messaging
> # Running sched/messaging benchmark...
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 1.332 [sec]
> [ perf record: Woken up 4 times to write data ]
> [ perf record: Captured and wrote 13.495 MB perf.data (~589597 samples) ]
>
> raw execution of sched messaging was 1.252 sec, and traced version
> was 1.332 sec. This overhead is far smaller than the overhead of
> current lock tracepoints.

Probably because rwsem are only a small bunch of locks among all others.
If you were to trace only spinlocks, I bet you'd find a significant
overhead pretty close to a wide lock tracing.

> I think that it is possible to write some meaningful tools
> like reader/writer ratio measuring. If something can be written,
> I'll post it.

Consider the situation from another angle: do you think that a lock
profiling on top of lock types is a kind of workflow that will be
used?

The primary kind of workflow I have in mind for lock tracing is:

1) Let's look at the big picture, trace all locks and find those
that seem to be an issue (too much waiting time, too much
acquire time, etc...).

2) Pick one we are interested in and dig into details

But I can't figure out any common worklow that would be based
on mutex only tracing, or rwsem only tracing.
Or actually I can imagine such worklow. Every kind of lock
type have their own scale of latencies so it's interesting
to group the analysis per family. But I rather see
that as a secondary worklow. Once we'll have more finegrained
analysis on the tools for example, like comparison between
read and write latencies on some rwsems and so.

So once we have some such finegrained and useful features in the
tooling side, then justifying such change in the kernel is going
to be much more uncontroversial.

2011-02-25 17:10:56

by Hitoshi Mitake

[permalink] [raw]
Subject: Re: [PATCH] perf lock: clean the options for perf record

On 2011年02月25日 01:50, Frederic Weisbecker wrote:
> On Fri, Feb 25, 2011 at 12:46:40AM +0900, Hitoshi Mitake wrote:
>> On 2011年02月23日 13:17, Hitoshi Mitake wrote:
>>> On 2011年02月23日 03:22, Frederic Weisbecker wrote:
>>>> On Tue, Feb 22, 2011 at 04:43:35PM +0100, Peter Zijlstra wrote:
>>>>> On Wed, 2011-02-23 at 00:30 +0900, Hitoshi Mitake wrote:
>>>>>> How do you think about it?
>>>>>
>>>>> Most of the lock code (esp the spinlock stuff) is already way over the
>>>>> threshold of sanity, adding to that for some dubious reasons doesn't
>>>>> seem like a good idea.
>>>>>
>>>>> I'm still not at all sure why people want all this lock tracing.
>>>>
>>>> Right, well I can imagine many usecases that could make lock
>>>> tracing bring more value than what lockstat already provides,
>>>> through a tool like perf lock if we enhance it.
>>>>
>>>> We should probably first focus on developing the tooling side
>>>> and make it useful enough that optimizations in the kernel
>>>> side become desirable.
>>>>
>>>
>>> Yes, lockstat only provides the lock usage statistics of
>>> entire of the system. perf lock will be able to provide the partial
>>> information of specified term, or the degree of dependency
>>> between locks.
>>>
>>
>> For trial, I created new tracepoint for rwsem and tested.
>> Names of events are rwsem_{acquire, contended, acquired, release},
>> their meanings are similar to lock_{...}.
>>
>> I traced perf bench sched messaging and result was,
>>
>> mitake@x201i:~/linux/.../tools/perf% ./perf bench sched messaging
>> # Running sched/messaging benchmark...
>> # 20 sender and receiver processes per group
>> # 10 groups == 400 processes run
>>
>> Total time: 1.252 [sec]
>> mitake@x201i:~/linux/.../tools/perf% sudo ./perf record -R -m 1024
>> -c 1 -e rwsem:rwsem_acquire -e
>> rwsem:rwsem_release,rwsem:rwsem_contended,rwsem:rwsem_acquired
>> ./perf bench sched messaging
>> # Running sched/messaging benchmark...
>> # 20 sender and receiver processes per group
>> # 10 groups == 400 processes run
>>
>> Total time: 1.332 [sec]
>> [ perf record: Woken up 4 times to write data ]
>> [ perf record: Captured and wrote 13.495 MB perf.data (~589597 samples) ]
>>
>> raw execution of sched messaging was 1.252 sec, and traced version
>> was 1.332 sec. This overhead is far smaller than the overhead of
>> current lock tracepoints.
>
> Probably because rwsem are only a small bunch of locks among all others.
> If you were to trace only spinlocks, I bet you'd find a significant
> overhead pretty close to a wide lock tracing.
>

Yes, spinlocks and rwlocks must dominate the big part of the overhead
of lock tracing.

>> I think that it is possible to write some meaningful tools
>> like reader/writer ratio measuring. If something can be written,
>> I'll post it.
>
> Consider the situation from another angle: do you think that a lock
> profiling on top of lock types is a kind of workflow that will be
> used?
>
> The primary kind of workflow I have in mind for lock tracing is:
>
> 1) Let's look at the big picture, trace all locks and find those
> that seem to be an issue (too much waiting time, too much
> acquire time, etc...).
>
> 2) Pick one we are interested in and dig into details
>
> But I can't figure out any common worklow that would be based
> on mutex only tracing, or rwsem only tracing.
> Or actually I can imagine such worklow. Every kind of lock
> type have their own scale of latencies so it's interesting
> to group the analysis per family. But I rather see
> that as a secondary worklow. Once we'll have more finegrained
> analysis on the tools for example, like comparison between
> read and write latencies on some rwsems and so.
>
> So once we have some such finegrained and useful features in the
> tooling side, then justifying such change in the kernel is going
> to be much more uncontroversial.
>

It seems that I was too preprocessed with the method and
forgot the purpose...

Maybe the things like simple lockstat visualizer or
special diff between two lockstat snapshots are
useful for the first looking at big picture.
I feel that they have worth to write and test.

2011-02-28 15:01:06

by Hitoshi Mitake

[permalink] [raw]
Subject: [PATCH] lockstat: export data in python expression

Hi,

I'm writing the tool for parsing and arranging the data of /proc/lock_stat
in python. But I soon found that lock_stat is hard to parse, so I wrote
this patch to let lock_stat output its information in python expression.

This patch implements the ioctl() for lockstat. via this ioctl(), readers can
change the format of printing. If it is not called lockstat prints information
in traditional style. If it is called and specified cmd as 0 and arg as 1,
readers can get the information in python expression.

The big benefit of printing in pytohn expression is that it can be parsed by
eval() of python.

Sample of output:
{ 'version': 0.3,} # header

{ 'name': 'pm_qos_lock', 'writer': [ 3572153 , 3573984, 0.11, 372.87, 3333033.22, 23292427 , 31255420, 0.10, 291.25, 18881384.46, ], 'reader': None, 'contention': [[3573984, 0xffffffff8106e4ea, 'pm_qos_request+0x1a/0x70'], ], 'contending': [[3573984, 0xffffffff8106e4ea, 'pm_qos_request+0x1a/0x70'], ] }

{ 'name': 'xtime_lock', 'writer': [ 1798546 , 1798795, 0.27, 10.92, 4252980.74, 7116071 , 14729524, 0.00, 1000.00, 30979660.87, ], 'reader': None, 'contention': [[1798795, 0xffffffff81077dec, 'tick_do_update_jiffies64+0x3c/0xd0'], ], 'contending': [[1798795, 0xffffffff81077dec, 'tick_do_update_jiffies64+0x3c/0xd0'], ] }

...

And,

f = open('/proc/lock_stat')
fcntl.ioctl(f.fileno(), 0, 1) # specify format

header = eval(f.readline()) # the first line is header

for line in f.readlines():
lock = eval(line) # eval each line

this is the everything of parsing, really easy. I think exporting data in
eval()able style is efficient way in procfs and sysfs. How do you think?

BTW, this patch is too python centric. Exporting in more neutral style like
S-expression might be good.

Signed-off-by: Hitoshi Mitake <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Steven Rostedt <[email protected]>
---
kernel/lockdep_proc.c | 213 ++++++++++++++++++++++++++++++++++++++++++-------
1 files changed, 185 insertions(+), 28 deletions(-)

diff --git a/kernel/lockdep_proc.c b/kernel/lockdep_proc.c
index 1969d2f..4474014 100644
--- a/kernel/lockdep_proc.c
+++ b/kernel/lockdep_proc.c
@@ -374,7 +374,14 @@ struct lock_stat_data {
struct lock_class_stats stats;
};

+enum lock_stat_format {
+ lock_stat_format_normal = 0,
+ lock_stat_format_python,
+ lock_stat_format_max = lock_stat_format_python,
+};
+
struct lock_stat_seq {
+ enum lock_stat_format format;
struct lock_stat_data *iter_end;
struct lock_stat_data stats[MAX_LOCKDEP_KEYS];
};
@@ -417,14 +424,24 @@ static void snprint_time(char *buf, size_t bufsiz, s64 nr)
static void seq_time(struct seq_file *m, s64 time)
{
char num[15];
+ enum lock_stat_format f =
+ ((struct lock_stat_seq *)m->private)->format;

snprint_time(num, sizeof(num), time);
seq_printf(m, " %14s", num);
+ if (f == lock_stat_format_python)
+ seq_printf(m, ", ");
}

static void seq_lock_time(struct seq_file *m, struct lock_time *lt)
{
+ enum lock_stat_format f =
+ ((struct lock_stat_seq *)m->private)->format;
+
seq_printf(m, "%14lu", lt->nr);
+ if (f == lock_stat_format_python)
+ seq_printf(m, ", ");
+
seq_time(m, lt->min);
seq_time(m, lt->max);
seq_time(m, lt->total);
@@ -436,6 +453,8 @@ static void seq_stats(struct seq_file *m, struct lock_stat_data *data)
struct lock_class *class;
struct lock_class_stats *stats;
int i, namelen;
+ enum lock_stat_format f =
+ ((struct lock_stat_seq *)m->private)->format;

class = data->class;
stats = &data->stats;
@@ -465,80 +484,177 @@ static void seq_stats(struct seq_file *m, struct lock_stat_data *data)
namelen += 2;
}

+ if (f == lock_stat_format_python) {
+ seq_printf(m, "{ 'name': '%s', ", name);
+ seq_printf(m, "'writer': ");
+ }
+
if (stats->write_holdtime.nr) {
- if (stats->read_holdtime.nr)
- seq_printf(m, "%38s-W:", name);
- else
- seq_printf(m, "%40s:", name);
+ if (f == lock_stat_format_python)
+ seq_printf(m, "[ ");
+
+ if (f == lock_stat_format_normal) {
+ if (stats->read_holdtime.nr)
+ seq_printf(m, "%38s-W:", name);
+ else
+ seq_printf(m, "%40s:", name);
+ }

seq_printf(m, "%14lu ", stats->bounces[bounce_contended_write]);
+ if (f == lock_stat_format_python)
+ seq_printf(m, ",");
seq_lock_time(m, &stats->write_waittime);
+
seq_printf(m, " %14lu ", stats->bounces[bounce_acquired_write]);
+ if (f == lock_stat_format_python)
+ seq_printf(m, ",");
seq_lock_time(m, &stats->write_holdtime);
- seq_puts(m, "\n");
+ if (f == lock_stat_format_python)
+ seq_printf(m, "]");
+
+ if (f == lock_stat_format_normal)
+ seq_puts(m, "\n");
+ } else {
+ if (f == lock_stat_format_python)
+ seq_printf(m, "None");
}

+ if (f == lock_stat_format_python)
+ seq_printf(m, ", 'reader': ");
+
if (stats->read_holdtime.nr) {
- seq_printf(m, "%38s-R:", name);
+ if (f == lock_stat_format_python)
+ seq_printf(m, "[ ");
+
+ if (f == lock_stat_format_normal)
+ seq_printf(m, "%38s-R:", name);
+
seq_printf(m, "%14lu ", stats->bounces[bounce_contended_read]);
+ if (f == lock_stat_format_python)
+ seq_printf(m, ",");
seq_lock_time(m, &stats->read_waittime);
+
seq_printf(m, " %14lu ", stats->bounces[bounce_acquired_read]);
+ if (f == lock_stat_format_python)
+ seq_printf(m, ",");
seq_lock_time(m, &stats->read_holdtime);
- seq_puts(m, "\n");
+ if (f == lock_stat_format_python)
+ seq_printf(m, "]");
+
+ if (f == lock_stat_format_normal)
+ seq_puts(m, "\n");
+ } else {
+ if (f == lock_stat_format_python)
+ seq_printf(m, "None");
}

- if (stats->read_waittime.nr + stats->write_waittime.nr == 0)
+ if (stats->read_waittime.nr + stats->write_waittime.nr == 0) {
+ if (f == lock_stat_format_python)
+ seq_printf(m, "}\n");
return;
+ }

if (stats->read_holdtime.nr)
namelen += 2;

+ if (f == lock_stat_format_python)
+ seq_printf(m, ", 'contention': [");
for (i = 0; i < LOCKSTAT_POINTS; i++) {
char ip[32];

if (class->contention_point[i] == 0)
break;

- if (!i)
+ if (!i && f == lock_stat_format_normal)
seq_line(m, '-', 40-namelen, namelen);

- snprintf(ip, sizeof(ip), "[<%p>]",
+ switch (f) {
+ case lock_stat_format_python:
+ seq_printf(m, "[%lu, 0x%p, '%pS'], ",
+ stats->contention_point[i],
+ (void *)class->contention_point[i],
(void *)class->contention_point[i]);
- seq_printf(m, "%40s %14lu %29s %pS\n",
- name, stats->contention_point[i],
- ip, (void *)class->contention_point[i]);
+ break;
+ default:
+ snprintf(ip, sizeof(ip), "[<%p>]",
+ (void *)class->contention_point[i]);
+ seq_printf(m, "%40s %14lu %29s %pS\n",
+ name, stats->contention_point[i],
+ ip, (void *)class->contention_point[i]);
+ break;
+ }
}
+
+ if (f == lock_stat_format_python)
+ seq_printf(m, "], 'contending': [");
+
for (i = 0; i < LOCKSTAT_POINTS; i++) {
char ip[32];

if (class->contending_point[i] == 0)
break;

- if (!i)
+ if (!i && f == lock_stat_format_normal)
seq_line(m, '-', 40-namelen, namelen);

- snprintf(ip, sizeof(ip), "[<%p>]",
+ switch (f) {
+ case lock_stat_format_python:
+ seq_printf(m, "[%lu, 0x%p, '%pS'], ",
+ stats->contention_point[i],
+ (void *)class->contending_point[i],
+ (void *)class->contending_point[i]);
+ break;
+ default:
+ snprintf(ip, sizeof(ip), "[<%p>]",
(void *)class->contending_point[i]);
- seq_printf(m, "%40s %14lu %29s %pS\n",
- name, stats->contending_point[i],
- ip, (void *)class->contending_point[i]);
+ seq_printf(m, "%40s %14lu %29s %pS\n",
+ name, stats->contending_point[i],
+ ip, (void *)class->contending_point[i]);
+ break;
+ }
}
if (i) {
- seq_puts(m, "\n");
- seq_line(m, '.', 0, 40 + 1 + 10 * (14 + 1));
- seq_puts(m, "\n");
+ if (f == lock_stat_format_normal) {
+ seq_puts(m, "\n");
+ seq_line(m, '.', 0, 40 + 1 + 10 * (14 + 1));
+ seq_puts(m, "\n");
+ }
}
+ if (f == lock_stat_format_python)
+ seq_printf(m, "] }\n");
}

static void seq_header(struct seq_file *m)
{
- seq_printf(m, "lock_stat version 0.3\n");
+ enum lock_stat_format format =
+ ((struct lock_stat_seq *)m->private)->format;
+
+ switch (format) {
+ case lock_stat_format_python:
+ seq_printf(m, "{ 'version': 0.3,");
+ break;
+ default: /* normal */
+ seq_printf(m, "lock_stat version 0.3\n");
+ break;
+ }
+
+ if (unlikely(!debug_locks)) {
+ const char *warning = "*WARNING* lock debugging disabled!!"
+ " - possibly due to a lockdep warning\n";

- if (unlikely(!debug_locks))
- seq_printf(m, "*WARNING* lock debugging disabled!! - possibly due to a lockdep warning\n");
+ switch (format) {
+ case lock_stat_format_python:
+ seq_printf(m, "'warning': '%s',", warning);
+ break;
+ default: /* normal */
+ seq_printf(m, "%s\n", warning);
+ break;
+ }
+ }

- seq_line(m, '-', 0, 40 + 1 + 10 * (14 + 1));
- seq_printf(m, "%40s %14s %14s %14s %14s %14s %14s %14s %14s "
+ if (format == lock_stat_format_normal) {
+ seq_line(m, '-', 0, 40 + 1 + 10 * (14 + 1));
+ seq_printf(m, "%40s %14s %14s %14s %14s %14s %14s %14s %14s "
"%14s %14s\n",
"class name",
"con-bounces",
@@ -551,8 +667,17 @@ static void seq_header(struct seq_file *m)
"holdtime-min",
"holdtime-max",
"holdtime-total");
- seq_line(m, '-', 0, 40 + 1 + 10 * (14 + 1));
- seq_printf(m, "\n");
+ seq_line(m, '-', 0, 40 + 1 + 10 * (14 + 1));
+ }
+
+ switch (format) {
+ case lock_stat_format_python:
+ seq_printf(m, "}\n"); /* end of header dictionary */
+ break;
+ default: /* normal */
+ seq_printf(m, "\n");
+ break;
+ }
}

static void *ls_start(struct seq_file *m, loff_t *pos)
@@ -606,6 +731,8 @@ static int lock_stat_open(struct inode *inode, struct file *file)
if (!data)
return -ENOMEM;

+ data->format = lock_stat_format_normal;
+
res = seq_open(file, &lockstat_ops);
if (!res) {
struct lock_stat_data *iter = data->stats;
@@ -656,12 +783,42 @@ static int lock_stat_release(struct inode *inode, struct file *file)
return seq_release(inode, file);
}

+static long lock_stat_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ struct lock_stat_seq *data =
+ ((struct seq_file *)file->private_data)->private;
+ enum lock_stat_format new_format;
+
+ /*
+ * currently, this ioctl doesn't provide any definition for cmd,
+ * because main user of this ioctl is python script.
+ */
+
+ if (cmd == 0) {
+ new_format = arg;
+ if (new_format < 0 || lock_stat_format_max < new_format) {
+ printk(KERN_INFO "lock_stat_ioctl():"
+ " invalid format: %d\n", new_format);
+ return -EINVAL;
+ }
+
+ data->format = arg;
+ } else {
+ printk(KERN_ERR "lock_stat_ioctl(): invalid cmd, %u\n", cmd);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static const struct file_operations proc_lock_stat_operations = {
.open = lock_stat_open,
.write = lock_stat_write,
.read = seq_read,
.llseek = seq_lseek,
.release = lock_stat_release,
+ .unlocked_ioctl = lock_stat_ioctl,
};
#endif /* CONFIG_LOCK_STAT */

--
1.7.3.3

2011-02-28 18:07:05

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] lockstat: export data in python expression

On Tue, 2011-03-01 at 00:00 +0900, Hitoshi Mitake wrote:
> I'm writing the tool for parsing and arranging the data of /proc/lock_stat
> in python. But I soon found that lock_stat is hard to parse, so I wrote
> this patch to let lock_stat output its information in python expression.

_IF_ we're going to do something like this, I'd much prefer a sane
output format over something that's easily parseble by just one thing.

2011-02-28 23:49:21

by Hitoshi Mitake

[permalink] [raw]
Subject: Re: [PATCH] lockstat: export data in python expression

On 2011年03月01日 03:07, Peter Zijlstra wrote:
> On Tue, 2011-03-01 at 00:00 +0900, Hitoshi Mitake wrote:
>> I'm writing the tool for parsing and arranging the data of /proc/lock_stat
>> in python. But I soon found that lock_stat is hard to parse, so I wrote
>> this patch to let lock_stat output its information in python expression.
>
> _IF_ we're going to do something like this, I'd much prefer a sane
> output format over something that's easily parseble by just one thing.
>
>

Does "by just one thing" mean that the ioctl() for format setting
should not be allowed? I have to admit that this patch adds lots of
ugly if and switch statements. If I have to choose just one style
in order to forbid increasing expressions, I'd like to choose
S-expression because it is easy to parse by any language.

2011-03-01 14:55:46

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH] perf lock: clean the options for perf record

On Sat, Feb 26, 2011 at 02:10:30AM +0900, Hitoshi Mitake wrote:
> It seems that I was too preprocessed with the method and
> forgot the purpose...
>
> Maybe the things like simple lockstat visualizer or
> special diff between two lockstat snapshots are
> useful for the first looking at big picture.
> I feel that they have worth to write and test.

Indeed they sound like good ideas. Being able to do a diff
on locks profiles would be useful to compare two changes on
the kernel.

2011-03-04 09:41:43

by Hitoshi Mitake

[permalink] [raw]
Subject: Re: [PATCH] perf lock: clean the options for perf record

On 03/01/11 23:55, Frederic Weisbecker wrote:
> On Sat, Feb 26, 2011 at 02:10:30AM +0900, Hitoshi Mitake wrote:
>> It seems that I was too preprocessed with the method and
>> forgot the purpose...
>>
>> Maybe the things like simple lockstat visualizer or
>> special diff between two lockstat snapshots are
>> useful for the first looking at big picture.
>> I feel that they have worth to write and test.
>
> Indeed they sound like good ideas. Being able to do a diff
> on locks profiles would be useful to compare two changes on
> the kernel.
>

BTW, how do you think about the idea of exporting data in
python (or other neutral) expression from procfs? I feel it is a
good idea. Communicating with unified format between user space and
kernel space will reduce lots of parsing overhead. Is this too
aggressive or insane?

2011-03-04 13:56:46

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH] perf lock: clean the options for perf record

On Fri, Mar 04, 2011 at 06:41:53PM +0900, Hitoshi Mitake wrote:
> On 03/01/11 23:55, Frederic Weisbecker wrote:
> >On Sat, Feb 26, 2011 at 02:10:30AM +0900, Hitoshi Mitake wrote:
> >>It seems that I was too preprocessed with the method and
> >>forgot the purpose...
> >>
> >>Maybe the things like simple lockstat visualizer or
> >>special diff between two lockstat snapshots are
> >>useful for the first looking at big picture.
> >>I feel that they have worth to write and test.
> >
> >Indeed they sound like good ideas. Being able to do a diff
> >on locks profiles would be useful to compare two changes on
> >the kernel.
> >
>
> BTW, how do you think about the idea of exporting data in
> python (or other neutral) expression from procfs? I feel it is a
> good idea. Communicating with unified format between user space and
> kernel space will reduce lots of parsing overhead. Is this too
> aggressive or insane?

Well, I'm not sure about the goal of parsing that lockstat file.

lockstat is a global measurement since the boot. One of the point with
perf is that you can measure the same things than lockstat (and more)
on a delimited context and time slice: a process or a cpu for a given time.

So the right source is more on perf.data resulting in a precise measurement
than in a global /proc/, right?

2011-03-04 13:58:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] perf lock: clean the options for perf record

On Fri, 2011-03-04 at 14:56 +0100, Frederic Weisbecker wrote:
> lockstat is a global measurement since the boot.

You can reset lockstat at any given time.

2011-03-04 14:09:04

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] lockstat: export data in python expression

On Mon, 2011-02-28 at 19:07 +0100, Peter Zijlstra wrote:
> On Tue, 2011-03-01 at 00:00 +0900, Hitoshi Mitake wrote:
> > I'm writing the tool for parsing and arranging the data of /proc/lock_stat
> > in python. But I soon found that lock_stat is hard to parse, so I wrote
> > this patch to let lock_stat output its information in python expression.
>
> _IF_ we're going to do something like this, I'd much prefer a sane
> output format over something that's easily parseble by just one thing.

Right, I do not think the kernel should be exporting information in a
python (or any other) language.

If it is very difficult to parse, then I doubt anything is currently
parsing it, so it should be fine to change the format. Lets just change
it to something that is much easier to to parse and stick with that.

-- Steve

2011-03-04 14:21:25

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH] perf lock: clean the options for perf record

On Fri, Mar 04, 2011 at 02:58:19PM +0100, Peter Zijlstra wrote:
> On Fri, 2011-03-04 at 14:56 +0100, Frederic Weisbecker wrote:
> > lockstat is a global measurement since the boot.
>
> You can reset lockstat at any given time.

Event though, that's still a global profiling. You can't have a per
process or thread granularity.

Or even more precise context for some future feature in perf that
I have in mind and would like to implement soon, like counting/sampling
an event only between two others. More exactly having two lists for
each event:

* activate
* deactivate

Those on the first list activate the target event when they overflow. (->start() )
Those on the second list deactivate .... (->stop() )

With events in activate and deactivate in the same context than the target,
locking and permissions should be kept simple. Locking especially shouldn't be
needed in the fast path. Then if that's needed we could think about a cross
context things later.

Plus an attr->start_state that decides if the first ->add() made is made
with PERF_EF_STAT or not. (We then need to keep track of some activated/deactivated
state across schedules).

That's in fact the exclude_irq and exclude_softirq idea extended to any kind
of existing event.

2011-03-04 14:37:21

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] perf lock: clean the options for perf record

On Fri, 2011-03-04 at 18:41 +0900, Hitoshi Mitake wrote:

> BTW, how do you think about the idea of exporting data in
> python (or other neutral) expression from procfs? I feel it is a
> good idea. Communicating with unified format between user space and
> kernel space will reduce lots of parsing overhead. Is this too
> aggressive or insane?

As I mentioned in another email, I have no problem with an easy to parse
file. But I will aggressively NAK any "python" or other scripting
language. I'm sure I would get the same response if I were to have the
kernel outputting perl language ;)

I would be OK if we have two files similar to stat and status, where one
format is human readable, the other is for parsing.

Thus, the only acceptable language that should come out of the kernel is
English.

-- Steve

2011-03-04 14:41:35

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH] perf lock: clean the options for perf record

On Fri, Mar 04, 2011 at 09:37:18AM -0500, Steven Rostedt wrote:
> On Fri, 2011-03-04 at 18:41 +0900, Hitoshi Mitake wrote:
>
> > BTW, how do you think about the idea of exporting data in
> > python (or other neutral) expression from procfs? I feel it is a
> > good idea. Communicating with unified format between user space and
> > kernel space will reduce lots of parsing overhead. Is this too
> > aggressive or insane?
>
> As I mentioned in another email, I have no problem with an easy to parse
> file. But I will aggressively NAK any "python" or other scripting
> language. I'm sure I would get the same response if I were to have the
> kernel outputting perl language ;)

Same for me.

But even before talking about that, I wonder if doing this is actually needed.

2011-03-05 17:20:16

by Hitoshi Mitake

[permalink] [raw]
Subject: Re: [PATCH] perf lock: clean the options for perf record

On 2011年03月04日 23:41, Frederic Weisbecker wrote:
> On Fri, Mar 04, 2011 at 09:37:18AM -0500, Steven Rostedt wrote:
>> On Fri, 2011-03-04 at 18:41 +0900, Hitoshi Mitake wrote:
>>
>>> BTW, how do you think about the idea of exporting data in
>>> python (or other neutral) expression from procfs? I feel it is a
>>> good idea. Communicating with unified format between user space and
>>> kernel space will reduce lots of parsing overhead. Is this too
>>> aggressive or insane?
>>
>> As I mentioned in another email, I have no problem with an easy to parse
>> file. But I will aggressively NAK any "python" or other scripting
>> language. I'm sure I would get the same response if I were to have the
>> kernel outputting perl language ;)
>
> Same for me.
>
> But even before talking about that, I wonder if doing this is actually needed.
>

With analyzing lock_stat, grasping the rough trend of lock usage might
be possible. I'm imagining the tool like top, e.g. read the data from
lock_stat periodically, analyze the difference between unit time, and
print the result.

2011-03-05 17:14:32

by Hitoshi Mitake

[permalink] [raw]
Subject: Re: [PATCH] perf lock: clean the options for perf record

On 2011年03月04日 23:37, Steven Rostedt wrote:
> On Fri, 2011-03-04 at 18:41 +0900, Hitoshi Mitake wrote:
>
>> BTW, how do you think about the idea of exporting data in
>> python (or other neutral) expression from procfs? I feel it is a
>> good idea. Communicating with unified format between user space and
>> kernel space will reduce lots of parsing overhead. Is this too
>> aggressive or insane?
>
> As I mentioned in another email, I have no problem with an easy to parse
> file. But I will aggressively NAK any "python" or other scripting
> language. I'm sure I would get the same response if I were to have the
> kernel outputting perl language ;)
>
> I would be OK if we have two files similar to stat and status, where one
> format is human readable, the other is for parsing.
>
> Thus, the only acceptable language that should come out of the kernel is
> English.
>
> -- Steve
>
>
>

OK. As you say, the big confusion might be occur if there are lots of
format like python or perl (and they have several versions).

The reason why I posted the patch is that I like the idea of
/proc/config.gz. Providing the data from kernel in specific syntax
is so smart, so I followed it :)

2011-03-09 16:41:08

by Hitoshi Mitake

[permalink] [raw]
Subject: Re: [PATCH] perf lock: clean the options for perf record

On 2011年03月04日 23:21, Frederic Weisbecker wrote:
> On Fri, Mar 04, 2011 at 02:58:19PM +0100, Peter Zijlstra wrote:
>> On Fri, 2011-03-04 at 14:56 +0100, Frederic Weisbecker wrote:
>>> lockstat is a global measurement since the boot.
>>
>> You can reset lockstat at any given time.
>
> Event though, that's still a global profiling. You can't have a per
> process or thread granularity.
>
> Or even more precise context for some future feature in perf that
> I have in mind and would like to implement soon, like counting/sampling
> an event only between two others. More exactly having two lists for
> each event:
>
> * activate
> * deactivate
>
> Those on the first list activate the target event when they overflow. (->start() )
> Those on the second list deactivate .... (->stop() )
>
> With events in activate and deactivate in the same context than the target,
> locking and permissions should be kept simple. Locking especially shouldn't be
> needed in the fast path. Then if that's needed we could think about a cross
> context things later.
>
> Plus an attr->start_state that decides if the first ->add() made is made
> with PERF_EF_STAT or not. (We then need to keep track of some activated/deactivated
> state across schedules).
>
> That's in fact the exclude_irq and exclude_softirq idea extended to any kind
> of existing event.
>

Do you mean that the event dropping in perf_tp_event_match() should be
extended for filtering hardirq and softirq? If so, it seems good.
I'd like to measure the effectiveness of it.