764e16a changed perf-record to create events disabled by default and
enable them once perf initializations are done. This setting was dropped
by 0f82ebc. Now perf events are once again generated during perf's
initialization phase (e.g., generating maps).
As an example, perf opens a lot of files at startup. Unpatched:
perf record -e syscalls:sys_enter_open -ga -fo /tmp/perf.data -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.087 MB /tmp/perf.data (~3798 samples) ]
Using perf-script to look at the samples shows the perf command generating
563 of the 566 total events.
Patched:
perf record -e syscalls:sys_enter_open -ga -fo /tmp/perf.data -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.028 MB /tmp/perf.data (~1206 samples) ]
Using perf-script to look at the samples does not show the perf command.
Signed-off-by: David Ahern <[email protected]>
---
tools/perf/util/evsel.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 21eaab2..6710cfe 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -70,6 +70,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
struct perf_event_attr *attr = &evsel->attr;
int track = !evsel->idx; /* only the first counter needs these */
+ attr->disabled = 1;
attr->sample_id_all = opts->sample_id_all_missing ? 0 : 1;
attr->inherit = !opts->no_inherit;
attr->read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
@@ -138,7 +139,6 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
if (perf_target__none(&opts->target) &&
(!opts->group || evsel == first)) {
- attr->disabled = 1;
attr->enable_on_exec = 1;
}
}
--
1.7.10.1
Hi,
On Sun, 13 May 2012 22:01:28 -0600, David Ahern wrote:
> 764e16a changed perf-record to create events disabled by default and
> enable them once perf initializations are done. This setting was dropped
> by 0f82ebc. Now perf events are once again generated during perf's
> initialization phase (e.g., generating maps).
>
> As an example, perf opens a lot of files at startup. Unpatched:
>
> perf record -e syscalls:sys_enter_open -ga -fo /tmp/perf.data -- sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.087 MB /tmp/perf.data (~3798 samples) ]
>
> Using perf-script to look at the samples shows the perf command generating
> 563 of the 566 total events.
>
> Patched:
>
> perf record -e syscalls:sys_enter_open -ga -fo /tmp/perf.data -- sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.028 MB /tmp/perf.data (~1206 samples) ]
>
> Using perf-script to look at the samples does not show perf command.
>
> Signed-off-by: David Ahern <[email protected]>
> ---
> tools/perf/util/evsel.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 21eaab2..6710cfe 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -70,6 +70,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
> struct perf_event_attr *attr = &evsel->attr;
> int track = !evsel->idx; /* only the first counter needs these */
>
> + attr->disabled = 1;
> attr->sample_id_all = opts->sample_id_all_missing ? 0 : 1;
> attr->inherit = !opts->no_inherit;
> attr->read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
> @@ -138,7 +139,6 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
>
> if (perf_target__none(&opts->target) &&
> (!opts->group || evsel == first)) {
> - attr->disabled = 1;
> attr->enable_on_exec = 1;
> }
> }
A problem I see is that it'll break group handling again:
$ ./perf stat -g sleep 1
Performance counter stats for 'sleep 1':
<not counted> task-clock
<not counted> context-switches
<not counted> CPU-migrations
<not counted> page-faults
<not counted> cycles
<not counted> stalled-cycles-frontend
<not counted> stalled-cycles-backend
<not counted> instructions
<not counted> branches
<not counted> branch-misses
1.000868932 seconds time elapsed
So I suggest changing perf_target__none() check to a proper one
(perf_target__no_cpu? - the name might be changed soon) for your
purpose.
Thanks,
Namhyung
On 5/14/12 1:40 AM, Namhyung Kim wrote:
> Hi,
>
> On Sun, 13 May 2012 22:01:28 -0600, David Ahern wrote:
>> 764e16a changed perf-record to create events disabled by default and
>> enable them once perf initializations are done. This setting was dropped
>> by 0f82ebc. Now perf events are once again generated during perf's
>> initialization phase (e.g., generating maps).
>>
>> As an example, perf opens a lot of files at startup. Unpatched:
>>
>> perf record -e syscalls:sys_enter_open -ga -fo /tmp/perf.data -- sleep 1
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.087 MB /tmp/perf.data (~3798 samples) ]
>>
>> Using perf-script to look at the samples shows the perf command generating
>> 563 of the 566 total events.
>>
>> Patched:
>>
>> perf record -e syscalls:sys_enter_open -ga -fo /tmp/perf.data -- sleep 1
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.028 MB /tmp/perf.data (~1206 samples) ]
>>
>> Using perf-script to look at the samples does not show perf command.
>>
>> Signed-off-by: David Ahern<[email protected]>
>> ---
>> tools/perf/util/evsel.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
>> index 21eaab2..6710cfe 100644
>> --- a/tools/perf/util/evsel.c
>> +++ b/tools/perf/util/evsel.c
>> @@ -70,6 +70,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
>> struct perf_event_attr *attr =&evsel->attr;
>> int track = !evsel->idx; /* only the first counter needs these */
>>
>> + attr->disabled = 1;
>> attr->sample_id_all = opts->sample_id_all_missing ? 0 : 1;
>> attr->inherit = !opts->no_inherit;
>> attr->read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
>> @@ -138,7 +139,6 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
>>
>> if (perf_target__none(&opts->target)&&
>> (!opts->group || evsel == first)) {
>> - attr->disabled = 1;
>> attr->enable_on_exec = 1;
>> }
>> }
>
> A problem I see is that it'll break group handling again:
>
> $ ./perf stat -g sleep 1
>
> Performance counter stats for 'sleep 1':
>
> <not counted> task-clock
> <not counted> context-switches
> <not counted> CPU-migrations
> <not counted> page-faults
> <not counted> cycles
> <not counted> stalled-cycles-frontend
> <not counted> stalled-cycles-backend
> <not counted> instructions
> <not counted> branches
> <not counted> branch-misses
>
> 1.000868932 seconds time elapsed
>
> So I suggest changing perf_target__none() check to a proper one
> (perf_target__no_cpu? - the name might be changed soon) for your
> purpose.
>
> Thanks,
> Namhyung
Something else is wrong then. I tested that command (saw your patch in
the history) and it worked for me. Also, this code path does not affect
perf-stat -- it touches perf-record and perf-test only.
David
Em Sun, May 13, 2012 at 10:01:28PM -0600, David Ahern escreveu:
> 764e16a changed perf-record to create events disabled by default and
> enable them once perf initializations are done. This setting was dropped
> by 0f82ebc. Now perf events are once again generated during perf's
> initialization phase (e.g., generating maps).
Gack, I really need to finally read about that autotest stuff and we
should start writing more and more regression tests to avoid this kind
of stuff.
- Arnaldo
On 5/14/12 7:09 AM, David Ahern wrote:
>> A problem I see is that it'll break group handling again:
>>
>> $ ./perf stat -g sleep 1
>>
>> Performance counter stats for 'sleep 1':
>>
>> <not counted> task-clock
>> <not counted> context-switches
>> <not counted> CPU-migrations
>> <not counted> page-faults
>> <not counted> cycles
>> <not counted> stalled-cycles-frontend
>> <not counted> stalled-cycles-backend
>> <not counted> instructions
>> <not counted> branches
>> <not counted> branch-misses
>>
>> 1.000868932 seconds time elapsed
>>
>> So I suggest changing perf_target__none() check to a proper one
>> (perf_target__no_cpu? - the name might be changed soon) for your
>> purpose.
>>
>> Thanks,
>> Namhyung
>
> Something else is wrong then. I tested that command (saw your patch in
> the history) and it worked for me. Also, this code path does not affect
> perf-stat -- it touches perf-record and perf-test only.
I think it is something else. I am running latest git tree (3.4.0-rc7).
perf from Linus' tree and acme/core both show:
perf stat -g -- find /usr >/dev/null
Performance counter stats for 'find /usr':
<not counted> task-clock
<not counted> context-switches
<not counted> CPU-migrations
<not counted> page-faults
<not counted> cycles
<not counted> stalled-cycles-frontend
<not counted> stalled-cycles-backend
<not counted> instructions
<not counted> branches
<not counted> branch-misses
0.111976940 seconds time elapsed
(Using find to make sure some work is done as opposed to sleep; openssl
speed also shows the above.)
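For reading output like the above, here is a minimal sketch of how a single counter reading is usually scaled. It assumes read_format has both PERF_FORMAT_TOTAL_TIME_ENABLED and PERF_FORMAT_TOTAL_TIME_RUNNING set (the diff context in this thread only shows the first flag) and that a counter which never got time on the PMU is what perf stat reports as <not counted>. The names counter_reading and scale_reading are hypothetical and are not taken from the perf sources.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Layout read(2) returns for one non-group event when read_format has
 * PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING set
 * (see perf_event_open(2)). The struct name is made up for this sketch.
 */
struct counter_reading {
    uint64_t value;
    uint64_t time_enabled;
    uint64_t time_running;
};

/*
 * Scale the raw value by enabled/running time. Returns false when the
 * event never ran, which is the case assumed here to correspond to
 * "<not counted>"; *scaled is left untouched in that case.
 */
static bool scale_reading(const struct counter_reading *r, double *scaled)
{
    if (r->time_running == 0)
        return false;
    *scaled = (double)r->value * (double)r->time_enabled /
              (double)r->time_running;
    return true;
}
```

If every event in a group reports a running time of zero, the group was never scheduled, which would match a run where all ten counters show <not counted>.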
David
Em Mon, May 14, 2012 at 08:21:02AM -0600, David Ahern escreveu:
> On 5/14/12 7:09 AM, David Ahern wrote:
> >>A problem I see is that it'll break group handling again:
> >>
> >>$ ./perf stat -g sleep 1
> >>
> >>Performance counter stats for 'sleep 1':
> >>
> >><not counted> task-clock
> >><not counted> context-switches
> >><not counted> CPU-migrations
> >><not counted> page-faults
> >><not counted> cycles
> >><not counted> stalled-cycles-frontend
> >><not counted> stalled-cycles-backend
> >><not counted> instructions
> >><not counted> branches
> >><not counted> branch-misses
> >>
> >>1.000868932 seconds time elapsed
> >>
> >>So I suggest changing perf_target__none() check to a proper one
> >>(perf_target__no_cpu? - the name might be changed soon) for your
> >>purpose.
> >
> >Something else is wrong then. I tested that command (saw your patch in
> >the history) and it worked for me. Also, this code path does not affect
> >perf-stat -- it touches perf-record and perf-test only.
>
> I think it is something else. I am running latest git tree
> (3.4.0-rc7). perf from Linus' tree and acme/core both show:
>
> perf stat -g -- find /usr >/dev/null
>
> Performance counter stats for 'find /usr':
>
> <not counted> task-clock
> <not counted> context-switches
> <not counted> CPU-migrations
> <not counted> page-faults
> <not counted> cycles
> <not counted> stalled-cycles-frontend
> <not counted> stalled-cycles-backend
> <not counted> instructions
> <not counted> branches
> <not counted> branch-misses
>
> 0.111976940 seconds time elapsed
>
> (Using find to make sure some work is done as opposed to sleep;
> openssl speed also shows the above.)
[acme@sandy ~]$ perf stat -g -- find /usr >/dev/null
find: `/usr/lib64/audit': Permission denied
^Cfind: Interrupt
Performance counter stats for 'find /usr':
<not counted> task-clock
<not counted> context-switches
<not counted> CPU-migrations
<not counted> page-faults
<not counted> cycles
<not counted> stalled-cycles-frontend
<not counted> stalled-cycles-backend
<not counted> instructions
<not counted> branches
<not counted> branch-misses
1.282060271 seconds time elapsed
[acme@sandy ~]$ uname -r
3.4.0-rc4-uprobes+
But:
[acme@felicio linux]$ uname -r
3.4.0-rc3+
[acme@felicio linux]$ perf stat -g -- find /usr >/dev/null
^Cfind: Interrupt
Performance counter stats for 'find /usr':
126.499751 task-clock # 0.122 CPUs utilized
<not counted> context-switches
<not counted> CPU-migrations
<not counted> page-faults
366,694,182 cycles # 2.899 GHz
151,332,137 stalled-cycles-frontend # 41.27% frontend cycles idle
103,373,418 stalled-cycles-backend # 28.19% backend cycles idle
408,309,250 instructions # 1.11 insns per cycle
# 0.37 stalled cycles
# per insn
77,453,802 branches # 612.284 M/sec
1,703,728 branch-misses # 2.20% of all branches
1.032917277 seconds time elapsed
[acme@felicio linux]$
Bisecting...
- Arnaldo
Hi,
On Mon, 14 May 2012 07:09:48 -0600, David Ahern wrote:
> On 5/14/12 1:40 AM, Namhyung Kim wrote:
>> A problem I see is that it'll break group handling again:
>>
>> $ ./perf stat -g sleep 1
>>
>> Performance counter stats for 'sleep 1':
>>
>> <not counted> task-clock
>> <not counted> context-switches
>> <not counted> CPU-migrations
>> <not counted> page-faults
>> <not counted> cycles
>> <not counted> stalled-cycles-frontend
>> <not counted> stalled-cycles-backend
>> <not counted> instructions
>> <not counted> branches
>> <not counted> branch-misses
>>
>> 1.000868932 seconds time elapsed
>>
>> So I suggest changing perf_target__none() check to a proper one
>> (perf_target__no_cpu? - the name might be changed soon) for your
>> purpose.
>>
> Something else is wrong then. I tested that command (saw your patch in
> the history) and it worked for me. Also, this code path does not
> affect perf-stat -- it touches perf-record and perf-test only.
>
Ah, right. But wouldn't it still be better to change the conditional
rather than disable it unconditionally?
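To make the two options concrete, here is a simplified model of the logic in the posted diff. The types model_attr and model_ctx and both function names are stand-ins invented for this sketch, not the real perf structures. It also assumes that perf_target__none() is false for a system-wide (-a) run, which is the case from the commit message.

```c
#include <stdbool.h>

/* Stand-in for the two perf_event_attr bits discussed in this thread. */
struct model_attr {
    bool disabled;
    bool enable_on_exec;
};

/* Stand-in for the inputs the conditional in perf_evsel__config() checks. */
struct model_ctx {
    bool target_none;   /* models perf_target__none(&opts->target) */
    bool group;         /* models opts->group */
    bool is_first;      /* models evsel == first */
};

/* Without the patch: disabled and enable_on_exec are set together. */
static struct model_attr config_conditional(struct model_ctx c)
{
    struct model_attr a = { false, false };

    if (c.target_none && (!c.group || c.is_first)) {
        a.disabled = true;
        a.enable_on_exec = true;
    }
    return a;
}

/* With the patch: always disabled; enable_on_exec stays conditional. */
static struct model_attr config_always_disabled(struct model_ctx c)
{
    struct model_attr a = { true, false };

    if (c.target_none && (!c.group || c.is_first))
        a.enable_on_exec = true;
    return a;
}
```

In this model the two variants differ only when the conditional is false, for example with a target given on the command line: the unpatched variant leaves the event enabled from the moment it is opened, while the patched one leaves it disabled until the tool enables it explicitly.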
Thanks,
Namhyung
On 5/14/12 7:07 PM, Namhyung Kim wrote:
>> Something else is wrong then. I tested that command (saw your patch in
>> the history) and it worked for me. Also, this code path does not
>> affect perf-stat -- it touches perf-record and perf-test only.
>>
>
> Ah, right. But still wouldn't it be better changing the conditional
> rather than disabling it unconditionally?
>
> Thanks,
> Namhyung
I think it would be best to disable all events initially and then enable
them when ready. It works for perf-record and perf-test just fine and
limits the samples to what you care about.
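As a minimal sketch of that disable-then-enable pattern outside of perf itself, the following uses the raw perf_event_open syscall with a software task-clock counter. The helper names sys_perf_event_open, init_disabled_attr and open_then_enable are made up for illustration, and the setup callback only stands in for whatever initialization the tool does; this is not how perf-record is structured internally.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open, so go through syscall(2). */
static long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Describe a software task-clock counter that starts out disabled. */
static void init_disabled_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_TASK_CLOCK;
    attr->disabled = 1;  /* nothing is counted until PERF_EVENT_IOC_ENABLE */
}

/*
 * Open the counter on the calling thread, run the (hypothetical) setup
 * work while the counter is still disabled, then enable it.
 * Returns the event fd, or -1 if the open or the enable fails.
 */
static int open_then_enable(void (*setup)(void))
{
    struct perf_event_attr attr;
    int fd;

    init_disabled_attr(&attr);
    fd = (int)sys_perf_event_open(&attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    if (setup)
        setup();         /* work done here is not counted */

    if (ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The open can fail for reasons unrelated to the pattern (for example a restrictive perf_event_paranoid setting), so callers should treat -1 as "counter unavailable" rather than as a bug.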
David
Em Mon, May 14, 2012 at 07:42:30PM -0600, David Ahern escreveu:
> On 5/14/12 7:07 PM, Namhyung Kim wrote:
> >>Something else is wrong then. I tested that command (saw your patch in
> >>the history) and it worked for me. Also, this code path does not
> >>affect perf-stat -- it touches perf-record and perf-test only.
> >Ah, right. But still wouldn't it be better changing the conditional
> >rather than disabling it unconditionally?
> I think it would be best to disable all events initially and then
> enable them when ready. It works for perf-record and perf-test just
> fine and limits the samples to what you care about.
And we need to have all this logic in a central place, the "open" method
of perf_evlist :-)
The perf_target abstraction is the way to get there, but in the process
I think we really need to have each new method with a 'perf test' entry
and in addition to that an 'autotest'* entry to test the perf builtins.
- Arnaldo
* http://autotest.github.com/
On 5/14/12 8:54 AM, Arnaldo Carvalho de Melo wrote:
> Em Mon, May 14, 2012 at 08:21:02AM -0600, David Ahern escreveu:
>> On 5/14/12 7:09 AM, David Ahern wrote:
>>>> A problem I see is that it'll break group handling again:
>>>>
>>>> $ ./perf stat -g sleep 1
>>>>
>>>> Performance counter stats for 'sleep 1':
>>>>
>>>> <not counted> task-clock
>>>> <not counted> context-switches
>>>> <not counted> CPU-migrations
>>>> <not counted> page-faults
>>>> <not counted> cycles
>>>> <not counted> stalled-cycles-frontend
>>>> <not counted> stalled-cycles-backend
>>>> <not counted> instructions
>>>> <not counted> branches
>>>> <not counted> branch-misses
>>>>
>>>> 1.000868932 seconds time elapsed
>>>>
>>>> So I suggest changing perf_target__none() check to a proper one
>>>> (perf_target__no_cpu? - the name might be changed soon) for your
>>>> purpose.
>>>
>>> Something else is wrong then. I tested that command (saw your patch in
>>> the history) and it worked for me. Also, this code path does not affect
>>> perf-stat -- it touches perf-record and perf-test only.
>>
>> I think it is something else. I am running latest git tree
>> (3.4.0-rc7). perf from Linus' tree and acme/core both show:
>>
>> perf stat -g -- find /usr>/dev/null
>>
>> Performance counter stats for 'find /usr':
>>
>> <not counted> task-clock
>> <not counted> context-switches
>> <not counted> CPU-migrations
>> <not counted> page-faults
>> <not counted> cycles
>> <not counted> stalled-cycles-frontend
>> <not counted> stalled-cycles-backend
>> <not counted> instructions
>> <not counted> branches
>> <not counted> branch-misses
>>
>> 0.111976940 seconds time elapsed
>>
>> (Using find to make sure some work is done as opposed to sleep;
>> openssl speed also shows the above.)
>
> [acme@sandy ~]$ perf stat -g -- find /usr>/dev/null
> find: `/usr/lib64/audit': Permission denied
> ^Cfind: Interrupt
>
> Performance counter stats for 'find /usr':
>
> <not counted> task-clock
> <not counted> context-switches
> <not counted> CPU-migrations
> <not counted> page-faults
> <not counted> cycles
> <not counted> stalled-cycles-frontend
> <not counted> stalled-cycles-backend
> <not counted> instructions
> <not counted> branches
> <not counted> branch-misses
>
> 1.282060271 seconds time elapsed
>
>
> [acme@sandy ~]$ uname -r
> 3.4.0-rc4-uprobes+
>
> But:
>
> [acme@felicio linux]$ uname -r
> 3.4.0-rc3+
> [acme@felicio linux]$ perf stat -g -- find /usr>/dev/null
> ^Cfind: Interrupt
>
> Performance counter stats for 'find /usr':
>
> 126.499751 task-clock # 0.122 CPUs utilized
> <not counted> context-switches
> <not counted> CPU-migrations
> <not counted> page-faults
> 366,694,182 cycles # 2.899 GHz
> 151,332,137 stalled-cycles-frontend # 41.27% frontend cycles idle
> 103,373,418 stalled-cycles-backend # 28.19% backend cycles idle
> 408,309,250 instructions # 1.11 insns per cycle
> # 0.37 stalled cycles
> # per insn
> 77,453,802 branches # 612.284 M/sec
> 1,703,728 branch-misses # 2.20% of all branches
>
> 1.032917277 seconds time elapsed
>
>
> [acme@felicio linux]$
>
> Bisecting...
>
> - Arnaldo
Seems like a few issues going on here. In all cases the command run is:
perf stat -g -- find / >/dev/null
v3.1 kernel, v3.1 perf command -- command runs fine and generates output.
v3.2 kernel, v3.2 perf command -- fails - no output and nothing in dmesg.
Using latest acme/core version of perf generates a WARNING on all
versions tried - v3.1, v3.2, v3.4-rc3, and v3.4-rc7:
[ 143.964857] ------------[ cut here ]------------
[ 143.964870] WARNING: at
/mnt/sw/kernels/kernel-2.6.git/arch/x86/kernel/cpu/perf_event.c:860
x86_pmu_start+0xdc/0x110()
[ 143.964873] Hardware name: Bochs
[ 143.964875] Modules linked in: lockd ip6t_REJECT nf_conntrack_ipv6
nf_conntrack_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 xt_state nf_conntrack
ip6table_filter ip6_tables ppdev parport_pc parport virtio_net i2c_piix4
i2c_core sunrpc virtio_blk [last unloaded: scsi_wait_scan]
[ 143.964897] Pid: 968, comm: find Not tainted 3.2.0 #1
[ 143.964900] Call Trace:
[ 143.964908] [<ffffffff8106dd2f>] warn_slowpath_common+0x7f/0xc0
[ 143.964913] [<ffffffff8106dd8a>] warn_slowpath_null+0x1a/0x20
[ 143.964917] [<ffffffff810248bc>] x86_pmu_start+0xdc/0x110
[ 143.964921] [<ffffffff81024f32>] x86_pmu_enable+0x212/0x270
[ 143.964928] [<ffffffff8110f946>] perf_event_context_sched_in+0xe6/0x100
[ 143.964932] [<ffffffff8111131e>] perf_event_comm+0x11e/0x330
[ 143.964940] [<ffffffff81378284>] ? get_random_int+0x74/0x90
[ 143.964947] [<ffffffff8117be40>] set_task_comm+0x60/0x70
[ 143.964951] [<ffffffff8117c541>] setup_new_exec+0xe1/0x2d0
[ 143.964958] [<ffffffff811c47ec>] load_elf_binary+0x3ec/0x1a10
[ 143.964972] [<ffffffff8113c672>] ? get_user_pages+0x52/0x60
[ 143.964977] [<ffffffff8117a168>] ? get_user_arg_ptr+0x38/0x80
[ 143.964982] [<ffffffff8117af2c>] search_binary_handler+0xec/0x300
[ 143.964986] [<ffffffff811c4400>] ? do_mmap+0x40/0x40
[ 143.964990] [<ffffffff8117c2eb>] do_execve_common+0x2ab/0x330
[ 143.964994] [<ffffffff8117c3aa>] do_execve+0x3a/0x40
[ 143.964998] [<ffffffff8101d347>] sys_execve+0x47/0x70
[ 143.965010] [<ffffffff815d9bdc>] stub_execve+0x6c/0xc0
[ 143.965013] ---[ end trace 8de43f6b89ac81e1 ]---
David
On 5/14/12 7:46 PM, Arnaldo Carvalho de Melo wrote:
> And we need to have all this logic in a central place, the "open" method
> of perf_evlist :-)
>
> The perf_target abstraction is the way to get there, but in the process
> I think we really need to have each new method with a 'perf test' entry
> and in addition to that an 'autotest'* entry to test the perf builtins.
>
> - Arnaldo
>
> * http://autotest.github.com/
Yes, I've seen the project and know that kvm leverages the framework. My
knowledge ends there - never had a reason to try it out. With the vPMU
in v3.3 we could use KVM to automate a fair bit of the testing.
David
Hi,
On Mon, 14 May 2012 22:46:33 -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, May 14, 2012 at 07:42:30PM -0600, David Ahern escreveu:
>> On 5/14/12 7:07 PM, Namhyung Kim wrote:
>> >>Something else is wrong then. I tested that command (saw your patch in
>> >>the history) and it worked for me. Also, this code path does not
>> >>affect perf-stat -- it touches perf-record and perf-test only.
>
>> >Ah, right. But still wouldn't it be better changing the conditional
>> >rather than disabling it unconditionally?
>
>> I think it would be best to disable all events initially and then
>> enable them when ready. It works for perf-record and perf-test just
>> fine and limits the samples to what you care about.
>
> And we need to have all this logic in a central place, the "open" method
> of perf_evlist :-)
>
Agreed. So we need to make it generic enough to suit perf-stat and
perf-top (and others?) as well.
> The perf_target abstraction is the way to get there, but in the process
> I think we really need to have each new method with a 'perf test' entry
> and in addition to that an 'autotest'* entry to test the perf builtins.
>
> - Arnaldo
>
> * http://autotest.github.com/
I'll have a look at it.
Thanks,
Namhyung
On Mon, 2012-05-14 at 22:46 -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, May 14, 2012 at 07:42:30PM -0600, David Ahern escreveu:
> > On 5/14/12 7:07 PM, Namhyung Kim wrote:
> > >>Something else is wrong then. I tested that command (saw your patch in
> > >>the history) and it worked for me. Also, this code path does not
> > >>affect perf-stat -- it touches perf-record and perf-test only.
>
> > >Ah, right. But still wouldn't it be better changing the conditional
> > >rather than disabling it unconditionally?
>
> > I think it would be best to disable all events initially and then
> > enable them when ready. It works for perf-record and perf-test just
> > fine and limits the samples to what you care about.
>
> And we need to have all this logic in a central place, the "open" method
> of perf_evlist :-)
>
> The perf_target abstraction is the way to get there, but in the process
> I think we really need to have each new method with a 'perf test' entry
> and in addition to that an 'autotest'* entry to test the perf builtins.
>
> - Arnaldo
Right now we have perf as a profiler inside autotest, which means one
can run autotest tests and have perf profile the system while the
test is being run, so thank you guys for it :)
That said, I'm happy to help develop a test module for perf, be it
just a wrapper around an existing testsuite, or a new test module
started from scratch. Let us know what you have in mind.
>
> * http://autotest.github.com/
On 5/14/12 7:52 PM, David Ahern wrote:
> On 5/14/12 8:54 AM, Arnaldo Carvalho de Melo wrote:
>> Em Mon, May 14, 2012 at 08:21:02AM -0600, David Ahern escreveu:
>>> On 5/14/12 7:09 AM, David Ahern wrote:
>>>>> A problem I see is that it'll break group handling again:
>>>>>
>>>>> $ ./perf stat -g sleep 1
>>>>>
>>>>> Performance counter stats for 'sleep 1':
>>>>>
>>>>> <not counted> task-clock
>>>>> <not counted> context-switches
>>>>> <not counted> CPU-migrations
>>>>> <not counted> page-faults
>>>>> <not counted> cycles
>>>>> <not counted> stalled-cycles-frontend
>>>>> <not counted> stalled-cycles-backend
>>>>> <not counted> instructions
>>>>> <not counted> branches
>>>>> <not counted> branch-misses
>>>>>
>>>>> 1.000868932 seconds time elapsed
>>>>>
>>>>> So I suggest changing perf_target__none() check to a proper one
>>>>> (perf_target__no_cpu? - the name might be changed soon) for your
>>>>> purpose.
>>>>
>>>> Something else is wrong then. I tested that command (saw your patch in
>>>> the history) and it worked for me. Also, this code path does not affect
>>>> perf-stat -- it touches perf-record and perf-test only.
>>>
>>> I think it is something else. I am running latest git tree
>>> (3.4.0-rc7). perf from Linus' tree and acme/core both show:
>>>
>>> perf stat -g -- find /usr>/dev/null
>>>
>>> Performance counter stats for 'find /usr':
>>>
>>> <not counted> task-clock
>>> <not counted> context-switches
>>> <not counted> CPU-migrations
>>> <not counted> page-faults
>>> <not counted> cycles
>>> <not counted> stalled-cycles-frontend
>>> <not counted> stalled-cycles-backend
>>> <not counted> instructions
>>> <not counted> branches
>>> <not counted> branch-misses
>>>
>>> 0.111976940 seconds time elapsed
>>>
>>> (Using find to make sure some work is done as opposed to sleep;
>>> openssl speed also shows the above.)
>>
>> [acme@sandy ~]$ perf stat -g -- find /usr>/dev/null
>> find: `/usr/lib64/audit': Permission denied
>> ^Cfind: Interrupt
>>
>> Performance counter stats for 'find /usr':
>>
>> <not counted> task-clock
>> <not counted> context-switches
>> <not counted> CPU-migrations
>> <not counted> page-faults
>> <not counted> cycles
>> <not counted> stalled-cycles-frontend
>> <not counted> stalled-cycles-backend
>> <not counted> instructions
>> <not counted> branches
>> <not counted> branch-misses
>>
>> 1.282060271 seconds time elapsed
>>
>>
>> [acme@sandy ~]$ uname -r
>> 3.4.0-rc4-uprobes+
>>
>> But:
>>
>> [acme@felicio linux]$ uname -r
>> 3.4.0-rc3+
>> [acme@felicio linux]$ perf stat -g -- find /usr>/dev/null
>> ^Cfind: Interrupt
>>
>> Performance counter stats for 'find /usr':
>>
>> 126.499751 task-clock # 0.122 CPUs utilized
>> <not counted> context-switches
>> <not counted> CPU-migrations
>> <not counted> page-faults
>> 366,694,182 cycles # 2.899 GHz
>> 151,332,137 stalled-cycles-frontend # 41.27% frontend cycles idle
>> 103,373,418 stalled-cycles-backend # 28.19% backend cycles idle
>> 408,309,250 instructions # 1.11 insns per cycle
>> # 0.37 stalled cycles
>> # per insn
>> 77,453,802 branches # 612.284 M/sec
>> 1,703,728 branch-misses # 2.20% of all branches
>>
>> 1.032917277 seconds time elapsed
>>
>>
>> [acme@felicio linux]$
>>
>> Bisecting...
>>
>> - Arnaldo
>
> Seems like a few issues going on here. In all case the command run is:
> perf stat -g -- find / >/dev/null
>
> v3.1 kernel, v3.1 perf command -- command runs fine and generates output.
>
> v3.2 kernel, v3.2 perf command -- fails - no output and nothing in dmesg.
>
> Using latest acme/core version of perf generates a WARNING on all
> versions tried - v3.1, v3.2, v3.4-rc3, and v3.4-rc7:
>
> [ 143.964857] ------------[ cut here ]------------
> [ 143.964870] WARNING: at
> /mnt/sw/kernels/kernel-2.6.git/arch/x86/kernel/cpu/perf_event.c:860
> x86_pmu_start+0xdc/0x110()
> [ 143.964873] Hardware name: Bochs
> [ 143.964875] Modules linked in: lockd ip6t_REJECT nf_conntrack_ipv6
> nf_conntrack_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 xt_state nf_conntrack
> ip6table_filter ip6_tables ppdev parport_pc parport virtio_net i2c_piix4
> i2c_core sunrpc virtio_blk [last unloaded: scsi_wait_scan]
> [ 143.964897] Pid: 968, comm: find Not tainted 3.2.0 #1
> [ 143.964900] Call Trace:
> [ 143.964908] [<ffffffff8106dd2f>] warn_slowpath_common+0x7f/0xc0
> [ 143.964913] [<ffffffff8106dd8a>] warn_slowpath_null+0x1a/0x20
> [ 143.964917] [<ffffffff810248bc>] x86_pmu_start+0xdc/0x110
> [ 143.964921] [<ffffffff81024f32>] x86_pmu_enable+0x212/0x270
> [ 143.964928] [<ffffffff8110f946>] perf_event_context_sched_in+0xe6/0x100
> [ 143.964932] [<ffffffff8111131e>] perf_event_comm+0x11e/0x330
> [ 143.964940] [<ffffffff81378284>] ? get_random_int+0x74/0x90
> [ 143.964947] [<ffffffff8117be40>] set_task_comm+0x60/0x70
> [ 143.964951] [<ffffffff8117c541>] setup_new_exec+0xe1/0x2d0
> [ 143.964958] [<ffffffff811c47ec>] load_elf_binary+0x3ec/0x1a10
> [ 143.964972] [<ffffffff8113c672>] ? get_user_pages+0x52/0x60
> [ 143.964977] [<ffffffff8117a168>] ? get_user_arg_ptr+0x38/0x80
> [ 143.964982] [<ffffffff8117af2c>] search_binary_handler+0xec/0x300
> [ 143.964986] [<ffffffff811c4400>] ? do_mmap+0x40/0x40
> [ 143.964990] [<ffffffff8117c2eb>] do_execve_common+0x2ab/0x330
> [ 143.964994] [<ffffffff8117c3aa>] do_execve+0x3a/0x40
> [ 143.964998] [<ffffffff8101d347>] sys_execve+0x47/0x70
> [ 143.965010] [<ffffffff815d9bdc>] stub_execve+0x6c/0xc0
> [ 143.965013] ---[ end trace 8de43f6b89ac81e1 ]---
I think I'm getting dizzy, and I knew I tested the whole 'perf-stat -g'
thing. It works *fine* on a Westmere box (E5620); the above comments
apply to a VM on a Nehalem server (E5540). (The host is running a
3.4.0-rc7 kernel, and qemu-kvm is started with the -cpu host argument.)
Namhyung/Arnaldo: What processor are you using?
David
Em Mon, May 14, 2012 at 09:28:07PM -0600, David Ahern escreveu:
> On 5/14/12 7:52 PM, David Ahern wrote:
> >On 5/14/12 8:54 AM, Arnaldo Carvalho de Melo wrote:
> >>Em Mon, May 14, 2012 at 08:21:02AM -0600, David Ahern escreveu:
> >>>On 5/14/12 7:09 AM, David Ahern wrote:
> >>>>>A problem I see is that it'll break group handling again:
> >>>>>
> >>>>>$ ./perf stat -g sleep 1
> >>>>>
> >>>>>Performance counter stats for 'sleep 1':
> >>>>>
> >>>>><not counted> task-clock
> >>>>><not counted> context-switches
> >>>>><not counted> CPU-migrations
> >>>>><not counted> page-faults
> >>>>><not counted> cycles
> >>>>><not counted> stalled-cycles-frontend
> >>>>><not counted> stalled-cycles-backend
> >>>>><not counted> instructions
> >>>>><not counted> branches
> >>>>><not counted> branch-misses
> >>>>>
> >>>>>1.000868932 seconds time elapsed
> >>>>>
> >>>>>So I suggest changing perf_target__none() check to a proper one
> >>>>>(perf_target__no_cpu? - the name might be changed soon) for your
> >>>>>purpose.
> >>>>
> >>>>Something else is wrong then. I tested that command (saw your patch in
> >>>>the history) and it worked for me. Also, this code path does not affect
> >>>>perf-stat -- it touches perf-record and perf-test only.
> >>>
> >>>I think it is something else. I am running latest git tree
> >>>(3.4.0-rc7). perf from Linus' tree and acme/core both show:
> >>>
> >>>perf stat -g -- find /usr>/dev/null
> >>>
> >>>Performance counter stats for 'find /usr':
> >>>
> >>><not counted> task-clock
> >>><not counted> context-switches
> >>><not counted> CPU-migrations
> >>><not counted> page-faults
> >>><not counted> cycles
> >>><not counted> stalled-cycles-frontend
> >>><not counted> stalled-cycles-backend
> >>><not counted> instructions
> >>><not counted> branches
> >>><not counted> branch-misses
> >>>
> >>>0.111976940 seconds time elapsed
> >>>
> >>>(Using find to make sure some work is done as opposed to sleep;
> >>>openssl speed also shows the above.)
> >>
> >>[acme@sandy ~]$ perf stat -g -- find /usr>/dev/null
> >>find: `/usr/lib64/audit': Permission denied
> >>^Cfind: Interrupt
> >>
> >>Performance counter stats for 'find /usr':
> >>
> >><not counted> task-clock
> >><not counted> context-switches
> >><not counted> CPU-migrations
> >><not counted> page-faults
> >><not counted> cycles
> >><not counted> stalled-cycles-frontend
> >><not counted> stalled-cycles-backend
> >><not counted> instructions
> >><not counted> branches
> >><not counted> branch-misses
> >>
> >>1.282060271 seconds time elapsed
> >>
> >>
> >>[acme@sandy ~]$ uname -r
> >>3.4.0-rc4-uprobes+
> >>
> >>But:
> >>
> >>[acme@felicio linux]$ uname -r
> >>3.4.0-rc3+
> >>[acme@felicio linux]$ perf stat -g -- find /usr>/dev/null
> >>^Cfind: Interrupt
> >>
> >>Performance counter stats for 'find /usr':
> >>
> >>126.499751 task-clock # 0.122 CPUs utilized
> >><not counted> context-switches
> >><not counted> CPU-migrations
> >><not counted> page-faults
> >>366,694,182 cycles # 2.899 GHz
> >>151,332,137 stalled-cycles-frontend # 41.27% frontend cycles idle
> >>103,373,418 stalled-cycles-backend # 28.19% backend cycles idle
> >>408,309,250 instructions # 1.11 insns per cycle
> >># 0.37 stalled cycles
> >># per insn
> >>77,453,802 branches # 612.284 M/sec
> >>1,703,728 branch-misses # 2.20% of all branches
> >>
> >>1.032917277 seconds time elapsed
> >>
> >>
> >>[acme@felicio linux]$
> >>
> >>Bisecting...
> >>
> >>- Arnaldo
> >
> >There seem to be a few issues here. In all cases the command run is:
> >perf stat -g -- find / >/dev/null
> >
> >v3.1 kernel, v3.1 perf command -- command runs fine and generates output.
> >
> >v3.2 kernel, v3.2 perf command -- fails - no output and nothing in dmesg.
> >
> >Using latest acme/core version of perf generates a WARNING on all
> >versions tried - v3.1, v3.2, v3.4-rc3, and v3.4-rc7:
> >
> >[ 143.964857] ------------[ cut here ]------------
> >[ 143.964870] WARNING: at
> >/mnt/sw/kernels/kernel-2.6.git/arch/x86/kernel/cpu/perf_event.c:860
> >x86_pmu_start+0xdc/0x110()
> >[ 143.964873] Hardware name: Bochs
> >[ 143.964875] Modules linked in: lockd ip6t_REJECT nf_conntrack_ipv6
> >nf_conntrack_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 xt_state nf_conntrack
> >ip6table_filter ip6_tables ppdev parport_pc parport virtio_net i2c_piix4
> >i2c_core sunrpc virtio_blk [last unloaded: scsi_wait_scan]
> >[ 143.964897] Pid: 968, comm: find Not tainted 3.2.0 #1
> >[ 143.964900] Call Trace:
> >[ 143.964908] [<ffffffff8106dd2f>] warn_slowpath_common+0x7f/0xc0
> >[ 143.964913] [<ffffffff8106dd8a>] warn_slowpath_null+0x1a/0x20
> >[ 143.964917] [<ffffffff810248bc>] x86_pmu_start+0xdc/0x110
> >[ 143.964921] [<ffffffff81024f32>] x86_pmu_enable+0x212/0x270
> >[ 143.964928] [<ffffffff8110f946>] perf_event_context_sched_in+0xe6/0x100
> >[ 143.964932] [<ffffffff8111131e>] perf_event_comm+0x11e/0x330
> >[ 143.964940] [<ffffffff81378284>] ? get_random_int+0x74/0x90
> >[ 143.964947] [<ffffffff8117be40>] set_task_comm+0x60/0x70
> >[ 143.964951] [<ffffffff8117c541>] setup_new_exec+0xe1/0x2d0
> >[ 143.964958] [<ffffffff811c47ec>] load_elf_binary+0x3ec/0x1a10
> >[ 143.964972] [<ffffffff8113c672>] ? get_user_pages+0x52/0x60
> >[ 143.964977] [<ffffffff8117a168>] ? get_user_arg_ptr+0x38/0x80
> >[ 143.964982] [<ffffffff8117af2c>] search_binary_handler+0xec/0x300
> >[ 143.964986] [<ffffffff811c4400>] ? do_mmap+0x40/0x40
> >[ 143.964990] [<ffffffff8117c2eb>] do_execve_common+0x2ab/0x330
> >[ 143.964994] [<ffffffff8117c3aa>] do_execve+0x3a/0x40
> >[ 143.964998] [<ffffffff8101d347>] sys_execve+0x47/0x70
> >[ 143.965010] [<ffffffff815d9bdc>] stub_execve+0x6c/0xc0
> >[ 143.965013] ---[ end trace 8de43f6b89ac81e1 ]---
>
> I think I'm getting dizzy, and I knew I tested the whole 'perf-stat
> -g' thing. It works *fine* on a Westmere box (E5620); the above
> comments apply to a VM on a Nehalem server (E5540). (host is running
> 3.4.0-rc7 kernel and -cpu host argument for qemu-kvm.)
>
> Namhyung/Arnaldo: What processor are you using?
Here it is mostly:
[acme@sandy autotest]$ grep "model name" /proc/cpuinfo | sort -u
model name : Intel(R) Core(TM) i7-2920XM CPU @ 2.50GHz
Sandy Bridge.
- Arnaldo
Hi,
On Mon, 14 May 2012 21:28:07 -0600, David Ahern wrote:
> I think I'm getting dizzy, and I knew I tested the whole 'perf-stat
> -g' thing. It works *fine* on a Westmere box (E5620); the above
> comments apply to a VM on a Nehalem server (E5540). (host is running
> 3.4.0-rc7 kernel and -cpu host argument for qemu-kvm.)
>
> Namhyung/Arnaldo: What processor are you using?
>
$ grep "model name" /proc/cpuinfo | sort -u
model name : Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
Thanks,
Namhyung
Commit-ID: 5e1c81d98a5621007824b49dde556fead5ff9c6c
Gitweb: http://git.kernel.org/tip/5e1c81d98a5621007824b49dde556fead5ff9c6c
Author: David Ahern <[email protected]>
AuthorDate: Sun, 13 May 2012 22:01:28 -0600
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Fri, 18 May 2012 16:02:42 -0300
perf evsel: Create events initially disabled -- again
764e16a changed perf-record to create events disabled by default and
enable them once perf initializations are done. This setting was dropped
by 0f82ebc. Now perf events are once again generated during perf's
initialization phase (e.g., generating maps).
As an example, perf opens a lot of files at startup. Unpatched:
perf record -e syscalls:sys_enter_open -ga -fo /tmp/perf.data -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.087 MB /tmp/perf.data (~3798 samples) ]
Using perf-script to look at the samples shows the perf command generating
563 of the 566 total events.
Patched:
perf record -e syscalls:sys_enter_open -ga -fo /tmp/perf.data -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.028 MB /tmp/perf.data (~1206 samples) ]
Using perf-script to look at the samples does not show perf command.
Signed-off-by: David Ahern <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/evsel.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index d7a2b4b..f4f427c 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -70,6 +70,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
 	struct perf_event_attr *attr = &evsel->attr;
 	int track = !evsel->idx; /* only the first counter needs these */
+	attr->disabled = 1;
 	attr->sample_id_all = opts->sample_id_all_missing ? 0 : 1;
 	attr->inherit = !opts->no_inherit;
 	attr->read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
@@ -138,7 +139,6 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
 	if (perf_target__none(&opts->target) &&
 	    (!opts->group || evsel == first)) {
-		attr->disabled = 1;
 		attr->enable_on_exec = 1;
 	}
 }
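
For readers following the thread, the effect of the two hunks on the
attr flags can be sketched in isolation. This is only an illustration,
not the perf code: sketch_config_attr() and its three boolean parameters
are hypothetical stand-ins for perf_target__none(&opts->target),
opts->group and (evsel == first). Only the disabled and enable_on_exec
fields of struct perf_event_attr come from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper, NOT perf's perf_evsel__config(): it mirrors only
 * the disabled/enable_on_exec logic as it looks after the patch.
 */
void sketch_config_attr(struct perf_event_attr *attr,
			bool target_none, bool group, bool is_first)
{
	assert(attr != NULL);

	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);

	/* Patched behavior: every event is created disabled. */
	attr->disabled = 1;

	/*
	 * Only when perf forks the workload itself (no pid/cpu/system-wide
	 * target) is the kernel asked to enable the event at exec time.
	 */
	if (target_none && (!group || is_first))
		attr->enable_on_exec = 1;
}
```

With a target such as -a, the sketch leaves enable_on_exec clear, so the
event stays disabled until the tool enables it explicitly after its own
initialization, which is the behavior the commit message describes.
Before the patch, disabled was only set inside the if block, so events
for such targets started counting as soon as they were opened.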