This patch allows perf record to request that event
timing be recorded in each sample. The timing consists
of time_enabled and time_running. These two values are
used to compute the multiplexing correction, i.e.,
how long an event was actually measured by the hardware.
To activate, the user must use:
$ perf record -a -R ....
The patch works by forcing PERF_SAMPLE_READ in raw mode (-R),
i.e., reading of the event group in each sample.
The side effect is that both time_running and time_enable are
captured + an empty four byte RAW section. This way we leverage
an existing perf record mode and do not add yet another option.
With this patch, it is possible to evaluate the total number
of occurrences of each sampling event even when multiplexing is
active.
Signed-off-by: Stephane Eranian <[email protected]>
---
tools/perf/Documentation/perf-record.txt | 2 ++
tools/perf/util/evsel.c | 5 +++++
2 files changed, 7 insertions(+)
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 9bdea047c5db..6a0bfd29bac9 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -270,6 +270,8 @@ OPTIONS
-R::
--raw-samples::
Collect raw sample records from all opened counters (default for tracepoint counters).
+With this option, each sample includes at least: CPU, timestamp, time running/enabled
+(multiplexing scaling factor).
-C::
--cpu::
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 3735c9e0080d..3305c0b10161 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -852,6 +852,9 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
attr->sample_id_all = perf_missing_features.sample_id_all ? 0 : 1;
attr->inherit = !opts->no_inherit;
attr->write_backward = opts->overwrite ? 1 : 0;
+ attr->read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
+ PERF_FORMAT_TOTAL_TIME_RUNNING |
+ PERF_FORMAT_ID;
perf_evsel__set_sample_bit(evsel, IP);
perf_evsel__set_sample_bit(evsel, TID);
@@ -945,6 +948,8 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
perf_evsel__set_sample_bit(evsel, TIME);
perf_evsel__set_sample_bit(evsel, RAW);
perf_evsel__set_sample_bit(evsel, CPU);
+ perf_evsel__set_sample_bit(evsel, ID);
+ perf_evsel__set_sample_bit(evsel, READ);
}
if (opts->sample_address)
--
2.7.4
Stephane Eranian <[email protected]> writes:
>
> To activate, the user must use:
> $ perf record -a -R ....
I don't know why you're overloading the existing raw mode?
It has nothing to do with that.
-Andi
On Mon, Aug 21, 2017 at 4:02 PM, Andi Kleen <[email protected]> wrote:
>
> Stephane Eranian <[email protected]> writes:
> >
> > To activate, the user must use:
> > $ perf record -a -R ....
>
> I don't know why you're overloading the existing raw mode?
>
> It has nothing to do with that.
>
I explained this in the changelog. So that is does not change any of
the processing in perf report, i.e., no faced with data it does not
know how to handle.
Also trying to avoid adding yet another option.
>
> -Andi
On Mon, Aug 21, 2017 at 05:13:29PM -0700, Stephane Eranian wrote:
> On Mon, Aug 21, 2017 at 4:02 PM, Andi Kleen <[email protected]> wrote:
> >
> > Stephane Eranian <[email protected]> writes:
> > >
> > > To activate, the user must use:
> > > $ perf record -a -R ....
> >
> > I don't know why you're overloading the existing raw mode?
> >
> > It has nothing to do with that.
> >
> I explained this in the changelog. So that is does not change any of
> the processing in perf report, i.e., no faced with data it does not
> know how to handle.
> Also trying to avoid adding yet another option.
But raw is needed for some of the non Intel PMUs. I believe it's
the only way to use AMD IBS. You may as well break their usage.
You'll need a new option.
-Andi
On Mon, Aug 21, 2017 at 06:25:45PM -0700, Andi Kleen wrote:
> On Mon, Aug 21, 2017 at 05:13:29PM -0700, Stephane Eranian wrote:
> > On Mon, Aug 21, 2017 at 4:02 PM, Andi Kleen <[email protected]> wrote:
> > >
> > > Stephane Eranian <[email protected]> writes:
> > > >
> > > > To activate, the user must use:
> > > > $ perf record -a -R ....
> > >
> > > I don't know why you're overloading the existing raw mode?
> > >
> > > It has nothing to do with that.
> > >
> > I explained this in the changelog. So that is does not change any of
> > the processing in perf report, i.e., no faced with data it does not
> > know how to handle.
> > Also trying to avoid adding yet another option.
>
> But raw is needed for some of the non Intel PMUs. I believe it's
> the only way to use AMD IBS. You may as well break their usage.
>
> You'll need a new option.
I agree with Andi, I don't think we should mix those,
we should have a way to switch it on/off
jirka
On Tue, Aug 22, 2017 at 12:03 AM, Jiri Olsa <[email protected]> wrote:
>
> On Mon, Aug 21, 2017 at 06:25:45PM -0700, Andi Kleen wrote:
> > On Mon, Aug 21, 2017 at 05:13:29PM -0700, Stephane Eranian wrote:
> > > On Mon, Aug 21, 2017 at 4:02 PM, Andi Kleen <[email protected]> wrote:
> > > >
> > > > Stephane Eranian <[email protected]> writes:
> > > > >
> > > > > To activate, the user must use:
> > > > > $ perf record -a -R ....
> > > >
> > > > I don't know why you're overloading the existing raw mode?
> > > >
> > > > It has nothing to do with that.
> > > >
> > > I explained this in the changelog. So that is does not change any of
> > > the processing in perf report, i.e., no faced with data it does not
> > > know how to handle.
> > > Also trying to avoid adding yet another option.
> >
> > But raw is needed for some of the non Intel PMUs. I believe it's
> > the only way to use AMD IBS. You may as well break their usage.
> >
> > You'll need a new option.
>
> I agree with Andi, I don't think we should mix those,
> we should have a way to switch it on/off
>
Ok, then. I will add an option to turn this on. This is a useful mode
for many advanced users.
>
> jirka
Hi,
On Tue, Aug 22, 2017 at 12:24 AM, Stephane Eranian <[email protected]> wrote:
> On Tue, Aug 22, 2017 at 12:03 AM, Jiri Olsa <[email protected]> wrote:
>>
>> On Mon, Aug 21, 2017 at 06:25:45PM -0700, Andi Kleen wrote:
>> > On Mon, Aug 21, 2017 at 05:13:29PM -0700, Stephane Eranian wrote:
>> > > On Mon, Aug 21, 2017 at 4:02 PM, Andi Kleen <[email protected]> wrote:
>> > > >
>> > > > Stephane Eranian <[email protected]> writes:
>> > > > >
>> > > > > To activate, the user must use:
>> > > > > $ perf record -a -R ....
>> > > >
>> > > > I don't know why you're overloading the existing raw mode?
>> > > >
>> > > > It has nothing to do with that.
>> > > >
>> > > I explained this in the changelog. So that is does not change any of
>> > > the processing in perf report, i.e., no faced with data it does not
>> > > know how to handle.
>> > > Also trying to avoid adding yet another option.
>> >
>> > But raw is needed for some of the non Intel PMUs. I believe it's
>> > the only way to use AMD IBS. You may as well break their usage.
>> >
>> > You'll need a new option.
>>
>> I agree with Andi, I don't think we should mix those,
>> we should have a way to switch it on/off
>>
> Ok, then. I will add an option to turn this on. This is a useful mode
> for many advanced users.
>
I looked at the perf source code again over the weekend. It seems
there is no need for a new patch.
Whatever is needed to capture time_running/time_enabled is already
there. The kernel can only
record the timings when PERF_SAMPLE_READ is set. The way to enable
this with current perf
is to use the S modified on events:
$ perf record -e cycles:S .....
That captures the timings, though they are in a read struct inside the
record. But this is exactly
what my earlier patch was doing.
The modifier also works with the more developed syntax:
$ perf record -e cpu/event=0xc0,umask=1/S ....
So I think we are good to go. to capture multiplexing scaling factor
when sampling simply use the S
modifier.
But to my surprise, newer kernels are not happy with the cmdline:
$ perf record -e cycles:S noploop 1
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (cycles:Su).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?
That looks like a bug to me. Why would that not work?
I have not yet had the time to look into this. Jiri?
Thanks.
>>
>> jirka
> So I think we are good to go. to capture multiplexing scaling factor
> when sampling simply use the S
> modifier.
> But to my surprise, newer kernels are not happy with the cmdline:
> $ perf record -e cycles:S noploop 1
> Error:
> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> for event (cycles:Su).
> /bin/dmesg may provide additional information.
> No CONFIG_PERF_EVENTS=y kernel support configured?
Likely due to
ba5213ae6b88 perf/core: Correct event creation with PERF_FORMAT_GROUP
It's not supported with inherited events.
-Andi
Hi,
On Mon, Aug 28, 2017 at 1:41 PM, Andi Kleen <[email protected]> wrote:
>> So I think we are good to go. to capture multiplexing scaling factor
>> when sampling simply use the S
>> modifier.
>> But to my surprise, newer kernels are not happy with the cmdline:
>> $ perf record -e cycles:S noploop 1
>> Error:
>> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
>> for event (cycles:Su).
>> /bin/dmesg may provide additional information.
>> No CONFIG_PERF_EVENTS=y kernel support configured?
>
> Likely due to
>
> ba5213ae6b88 perf/core: Correct event creation with PERF_FORMAT_GROUP
>
> It's not supported with inherited events.
>
Yes, and other things have changed as well. I did a bit of research to
figure out how
to make this work out-of the-box with the latest perf (v4.13). It
turns out you need to
combine multiple options and an event modifier. This is quite cumbersome
but here it is:
$ perf record --no-inherit --running-time -e cycles:S ........
You need:
- no-inherit: the kernel does not know how to deal with multiplexing
when events are inherited
- running-time: this used to be automatic for PERF_SAMPLE_READ with
perf record, now it is not
This includes TIME_ENABLED/TIME_RUNNING in the sample_read format.
- :S : to add a PERF_SAMPLE_READ to each sample, it encapsulates the
event value + timings.
We do not care about the value but are only interested in the timings.
The kernel cannot record the timings without a PERF_SAMPLE_READ.
I am also surprised to see that perf record keep inherit=1 in
system-wide mode. I don't think this
is relavant in this mode. But the kernel this fails in this case,
which I think is a bug. In system-wide
mode, the attr-.no_inherit should be ignored. We can fix perf record
to avoid this in system-wide.
The cmdline above works for both per-thread and system-wide modes.
So I think we do not need my patch or variations thereof, everything
is there, though a bit difficult
to combine.
On Wed, Aug 30, 2017 at 11:21:23PM -0700, Stephane Eranian wrote:
SNIP
> I am also surprised to see that perf record keep inherit=1 in
> system-wide mode. I don't think this
> is relavant in this mode. But the kernel this fails in this case,
> which I think is a bug. In system-wide
> mode, the attr-.no_inherit should be ignored. We can fix perf record
> to avoid this in system-wide.
agreed, how about attached patch
jirka
---
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 9bdea047c5db..e3cf1bd463f9 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -158,6 +158,7 @@ OPTIONS
-a::
--all-cpus::
System-wide collection from all CPUs (default if no target is specified).
+ Disables inheritance of counters (forces --no-inherit option).
-p::
--pid=::
@@ -277,6 +278,7 @@ Collect samples only on the list of CPUs provided. Multiple CPUs can be provided
comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
In per-thread mode with inheritance mode on (default), samples are captured only when
the thread executes on the designated CPUs. Default is to monitor all CPUs.
+Disables inheritance of counters (forces --no-inherit option).
-B::
--no-buildid::
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index d9bd632ed7db..22308c42321d 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -850,7 +850,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
bool per_cpu = opts->target.default_per_cpu && !opts->target.per_thread;
attr->sample_id_all = perf_missing_features.sample_id_all ? 0 : 1;
- attr->inherit = !opts->no_inherit;
+ attr->inherit = !opts->no_inherit && !target__has_cpu(&opts->target);
attr->write_backward = opts->overwrite ? 1 : 0;
perf_evsel__set_sample_bit(evsel, IP);
On Wed, Aug 30, 2017 at 11:21:23PM -0700, Stephane Eranian wrote:
> Hi,
>
>
> On Mon, Aug 28, 2017 at 1:41 PM, Andi Kleen <[email protected]> wrote:
> >> So I think we are good to go. to capture multiplexing scaling factor
> >> when sampling simply use the S
> >> modifier.
> >> But to my surprise, newer kernels are not happy with the cmdline:
> >> $ perf record -e cycles:S noploop 1
> >> Error:
> >> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> >> for event (cycles:Su).
> >> /bin/dmesg may provide additional information.
> >> No CONFIG_PERF_EVENTS=y kernel support configured?
> >
> > Likely due to
> >
> > ba5213ae6b88 perf/core: Correct event creation with PERF_FORMAT_GROUP
> >
> > It's not supported with inherited events.
> >
> Yes, and other things have changed as well. I did a bit of research to
> figure out how
> to make this work out-of the-box with the latest perf (v4.13). It
> turns out you need to
> combine multiple options and an event modifier. This is quite cumbersome
> but here it is:
>
> $ perf record --no-inherit --running-time -e cycles:S ........
>
> You need:
> - no-inherit: the kernel does not know how to deal with multiplexing
> when events are inherited
> - running-time: this used to be automatic for PERF_SAMPLE_READ with
> perf record, now it is not
> This includes TIME_ENABLED/TIME_RUNNING in the sample_read format.
> - :S : to add a PERF_SAMPLE_READ to each sample, it encapsulates the
> event value + timings.
> We do not care about the value but are only interested in the timings.
>
> The kernel cannot record the timings without a PERF_SAMPLE_READ.
>
> I am also surprised to see that perf record keep inherit=1 in
> system-wide mode. I don't think this
> is relavant in this mode. But the kernel this fails in this case,
> which I think is a bug. In system-wide
> mode, the attr-.no_inherit should be ignored. We can fix perf record
> to avoid this in system-wide.
>
> The cmdline above works for both per-thread and system-wide modes.
>
> So I think we do not need my patch or variations thereof, everything
> is there, though a bit difficult
> to combine.
hum, how about we introduce new modifier to attach timing info, like:
$ perf record -e cycles:T ....
modifiers might be scares resource, but we don't add them every day,
and this requirement looks generic
jirka
On Fri, Sep 1, 2017 at 12:59 AM, Jiri Olsa <[email protected]> wrote:
> On Wed, Aug 30, 2017 at 11:21:23PM -0700, Stephane Eranian wrote:
>> Hi,
>>
>>
>> On Mon, Aug 28, 2017 at 1:41 PM, Andi Kleen <[email protected]> wrote:
>> >> So I think we are good to go. to capture multiplexing scaling factor
>> >> when sampling simply use the S
>> >> modifier.
>> >> But to my surprise, newer kernels are not happy with the cmdline:
>> >> $ perf record -e cycles:S noploop 1
>> >> Error:
>> >> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
>> >> for event (cycles:Su).
>> >> /bin/dmesg may provide additional information.
>> >> No CONFIG_PERF_EVENTS=y kernel support configured?
>> >
>> > Likely due to
>> >
>> > ba5213ae6b88 perf/core: Correct event creation with PERF_FORMAT_GROUP
>> >
>> > It's not supported with inherited events.
>> >
>> Yes, and other things have changed as well. I did a bit of research to
>> figure out how
>> to make this work out-of the-box with the latest perf (v4.13). It
>> turns out you need to
>> combine multiple options and an event modifier. This is quite cumbersome
>> but here it is:
>>
>> $ perf record --no-inherit --running-time -e cycles:S ........
>>
>> You need:
>> - no-inherit: the kernel does not know how to deal with multiplexing
>> when events are inherited
>> - running-time: this used to be automatic for PERF_SAMPLE_READ with
>> perf record, now it is not
>> This includes TIME_ENABLED/TIME_RUNNING in the sample_read format.
>> - :S : to add a PERF_SAMPLE_READ to each sample, it encapsulates the
>> event value + timings.
>> We do not care about the value but are only interested in the timings.
>>
>> The kernel cannot record the timings without a PERF_SAMPLE_READ.
>>
>> I am also surprised to see that perf record keep inherit=1 in
>> system-wide mode. I don't think this
>> is relavant in this mode. But the kernel this fails in this case,
>> which I think is a bug. In system-wide
>> mode, the attr-.no_inherit should be ignored. We can fix perf record
>> to avoid this in system-wide.
>>
>> The cmdline above works for both per-thread and system-wide modes.
>>
>> So I think we do not need my patch or variations thereof, everything
>> is there, though a bit difficult
>> to combine.
>
> hum, how about we introduce new modifier to attach timing info, like:
> $ perf record -e cycles:T ....
>
> modifiers might be scares resource, but we don't add them every day,
> and this requirement looks generic
>
It is not just a matter of modifier, you need to have the kernel
record just what you want.
AFAIK, the only way for the the kernel to record timings on sampling events
is to force PERF_SAMPLE_READ. So the T modifier would have to also set
that format,
at which point, I wonder how useful it is compared to S.
Alternatively, we could improve the kernel to support recording timing with
PERF_SAMPLE_TIMINGS as a pair of u64 to represent time_enabled, time_running.
That would avoid the whole PERF_SAMPLE_READ and the extra u64 it records.
Recording the value of the sampling event is not very useful because
it keeps being
reset for each period.
On Fri, Sep 01, 2017 at 01:21:03AM -0700, Stephane Eranian wrote:
SNIP
> >> I am also surprised to see that perf record keep inherit=1 in
> >> system-wide mode. I don't think this
> >> is relavant in this mode. But the kernel this fails in this case,
> >> which I think is a bug. In system-wide
> >> mode, the attr-.no_inherit should be ignored. We can fix perf record
> >> to avoid this in system-wide.
> >>
> >> The cmdline above works for both per-thread and system-wide modes.
> >>
> >> So I think we do not need my patch or variations thereof, everything
> >> is there, though a bit difficult
> >> to combine.
> >
> > hum, how about we introduce new modifier to attach timing info, like:
> > $ perf record -e cycles:T ....
> >
> > modifiers might be scares resource, but we don't add them every day,
> > and this requirement looks generic
> >
> It is not just a matter of modifier, you need to have the kernel
> record just what you want.
> AFAIK, the only way for the the kernel to record timings on sampling events
> is to force PERF_SAMPLE_READ. So the T modifier would have to also set
> that format,
> at which point, I wonder how useful it is compared to S.
well :S is meant to be used for leader sampling,
so in addition to what you described it also
disables sampling on all other group members,
which I don't think you want
>
> Alternatively, we could improve the kernel to support recording timing with
> PERF_SAMPLE_TIMINGS as a pair of u64 to represent time_enabled, time_running.
> That would avoid the whole PERF_SAMPLE_READ and the extra u64 it records.
> Recording the value of the sampling event is not very useful because
> it keeps being
> reset for each period.
sounds good.. or if we mind taking another bit out of the sample_type
for this we could have new read_format bit PERF_FORMAT_NO_COUNT, which
would omit the count values
jirka