The following commit:
b52956c perf tools: Allow multiple threads or processes in record, stat, top
introduced a bug in the thread_map code which caused
perf record -a to not setup system-wide monitoring properly.
$ taskset -c 1 noploop 1000 &
$ perf record -a -C 1 sleep 10
$ perf report -D | tail -20
cycles stats:
TOTAL events: 4413
MMAP events: 4025
COMM events: 340
SAMPLE events: 48
Here I was expecting about 10,000 samples and not 48.
In system-wide mode, the PID passed to perf_event_open()
must be -1 and it was 0. That caused the kernel to setup
a per-process event on PID:0. Consequently, the number
of samples captured does not correspond to the requested
measurement.
The following one-liner fixes the problem for me with or
without -C.
I would also suggest to change the malloc() to something
that matches the struct definition. thread_map->map[] is
declared as int map[] and not pid_t map[]. If map[] can
only contain pids, then change the struct definition.
Signed-off-by: Stephane Eranian <[email protected]>
---
diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
index e15983c..84d9bd78 100644
--- a/tools/perf/util/thread_map.c
+++ b/tools/perf/util/thread_map.c
@@ -229,7 +229,7 @@ static struct thread_map *thread_map__new_by_tid_str(const char *tid_str)
if (!tid_str) {
threads = malloc(sizeof(*threads) + sizeof(pid_t));
if (threads != NULL) {
- threads->map[1] = -1;
+ threads->map[0] = -1;
threads->nr = 1;
}
return threads;
On 2/21/12 7:54 AM, Stephane Eranian wrote:
>
> The following commit:
> b52956c perf tools: Allow multiple threads or processes in record, stat, top
>
> introduced a bug in the thread_map code which caused
> perf record -a to not setup system-wide monitoring properly.
>
> $ taskset -c 1 noploop 1000&
> $ perf record -a -C 1 sleep 10
> $ perf report -D | tail -20
> cycles stats:
> TOTAL events: 4413
> MMAP events: 4025
> COMM events: 340
> SAMPLE events: 48
>
> Here I was expecting about 10,000 samples and not 48.
>
> In system-wide mode, the PID passed to perf_event_open()
> must be -1 and it was 0. That caused the kernel to setup
> a per-process event on PID:0. Consequently, the number
> of samples captured does not correspond to the requested
> measurement.
>
> The following one-liner fixes the problem for me with or
> without -C.
>
> I would also suggest to change the malloc() to something
> that matches the struct definition. thread_map->map[] is
> declared as int map[] and not pid_t map[]. If map[] can
> only contain pids, then change the struct definition.
>
> Signed-off-by: Stephane Eranian<[email protected]>
> ---
>
> diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
> index e15983c..84d9bd78 100644
> --- a/tools/perf/util/thread_map.c
> +++ b/tools/perf/util/thread_map.c
> @@ -229,7 +229,7 @@ static struct thread_map *thread_map__new_by_tid_str(const char *tid_str)
> if (!tid_str) {
> threads = malloc(sizeof(*threads) + sizeof(pid_t));
> if (threads != NULL) {
> - threads->map[1] = -1;
> + threads->map[0] = -1;
> threads->nr = 1;
> }
> return threads;
Damn. Hope you did not spend much time chasing it down.
Acked-by: David Ahern <[email protected]>
Em Tue, Feb 21, 2012 at 08:31:56AM -0700, David Ahern escreveu:
> On 2/21/12 7:54 AM, Stephane Eranian wrote:
>>
>> The following commit:
>> b52956c perf tools: Allow multiple threads or processes in record, stat, top
>>
>> introduced a bug in the thread_map code which caused
>> perf record -a to not setup system-wide monitoring properly.
>>
>> $ taskset -c 1 noploop 1000&
>> $ perf record -a -C 1 sleep 10
>> $ perf report -D | tail -20
>> cycles stats:
>> TOTAL events: 4413
>> MMAP events: 4025
>> COMM events: 340
>> SAMPLE events: 48
>>
>> Here I was expecting about 10,000 samples and not 48.
>>
>> In system-wide mode, the PID passed to perf_event_open()
>> must be -1 and it was 0. That caused the kernel to setup
>> a per-process event on PID:0. Consequently, the number
>> of samples captured does not correspond to the requested
>> measurement.
>>
>> The following one-liner fixes the problem for me with or
>> without -C.
>> I would also suggest to change the malloc() to something
>> that matches the struct definition. thread_map->map[] is
>> declared as int map[] and not pid_t map[]. If map[] can
>> only contain pids, then change the struct definition.
Stephane,
Feel free to submit a patch :-)
>> Signed-off-by: Stephane Eranian<[email protected]>
>>
>> diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
>> index e15983c..84d9bd78 100644
>> --- a/tools/perf/util/thread_map.c
>> +++ b/tools/perf/util/thread_map.c
>> @@ -229,7 +229,7 @@ static struct thread_map *thread_map__new_by_tid_str(const char *tid_str)
>> if (!tid_str) {
>> threads = malloc(sizeof(*threads) + sizeof(pid_t));
>> if (threads != NULL) {
>> - threads->map[1] = -1;
>> + threads->map[0] = -1;
>> threads->nr = 1;
>> }
>> return threads;
>
> Damn. Hope you did not spend much time chasing it down.
> Acked-by: David Ahern <[email protected]>
Yeah, this one slip thru my visual inspection :-\
Now I'll pay for this sin by adding an entry in 'perf test' to check
that :-)
- Arnaldo
Commit-ID: 6b1bee9035d430c4b4f586df6df4b3f840e89b5b
Gitweb: http://git.kernel.org/tip/6b1bee9035d430c4b4f586df6df4b3f840e89b5b
Author: Stephane Eranian <[email protected]>
AuthorDate: Tue, 21 Feb 2012 15:54:25 +0100
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Tue, 21 Feb 2012 15:05:43 -0200
perf tools: fix broken perf record -a mode
The following commit:
b52956c perf tools: Allow multiple threads or processes in record, stat, top
introduced a bug in the thread_map code which caused perf record -a to
not setup system-wide monitoring properly.
$ taskset -c 1 noploop 1000 &
$ perf record -a -C 1 sleep 10
$ perf report -D | tail -20
cycles stats:
TOTAL events: 4413
MMAP events: 4025
COMM events: 340
SAMPLE events: 48
Here I was expecting about 10,000 samples and not 48.
In system-wide mode, the PID passed to perf_event_open() must be -1 and
it was 0. That caused the kernel to setup a per-process event on PID:0.
Consequently, the number of samples captured does not correspond to the
requested measurement.
The following one-liner fixes the problem for me with or without -C.
I would also suggest to change the malloc() to something that matches
the struct definition. thread_map->map[] is declared as int map[] and
not pid_t map[]. If map[] can only contain pids, then change the struct
definition.
Acked-by: David Ahern <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/20120221145424.GA6757@quad
Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/thread_map.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
index e15983c..84d9bd78 100644
--- a/tools/perf/util/thread_map.c
+++ b/tools/perf/util/thread_map.c
@@ -229,7 +229,7 @@ static struct thread_map *thread_map__new_by_tid_str(const char *tid_str)
if (!tid_str) {
threads = malloc(sizeof(*threads) + sizeof(pid_t));
if (threads != NULL) {
- threads->map[1] = -1;
+ threads->map[0] = -1;
threads->nr = 1;
}
return threads;