2010-11-24 11:35:54

by Franck Bui-Huu

Subject: perf: some questions about perf software events

Hello Peter,

I've still a couple of questions after looking at the software events
code, hope you don't mind.

For pure software events (ie excluding {task,cpu}-clock), does it make
sense to set a sample frequency ? I would have done something like this:


diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 31515200..df27fd8 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -4671,6 +4671,8 @@ static int perf_swevent_init(struct perf_event *event)

 	if (event->attr.type != PERF_TYPE_SOFTWARE)
 		return -ENOENT;
+	if (event->attr.freq)
+		return -EINVAL;

 	switch (event_id) {
 	case PERF_COUNT_SW_CPU_CLOCK:



That is, for non-'contiguous' events, setting a sampling frequency
doesn't really make sense: for example, you could set a frequency of
1000 Hz for the software ALIGNMENT_FAULTS event and never get any
samples, or get samples but at a totally different rate. And the
current code doesn't seem to handle sample_freq anyway.

Also I'm currently not seeing any real differences between cpu-clock and
task-clock events. They both seem to count the time elapsed when the
task is running on a CPU. Am I wrong ?

Thanks
--
Franck


2010-11-24 11:41:31

by Peter Zijlstra

Subject: Re: perf: some questions about perf software events

On Wed, 2010-11-24 at 12:35 +0100, Franck Bui-Huu wrote:
> Hello Peter,
>
> I've still a couple of questions after looking at the software events
> code, hope you don't mind.
>
> For pure software events (ie excluding {task,cpu}-clock), does it make
> sense to set a sample frequency ? I would have done something like this:
>
>
> diff --git a/kernel/perf_event.c b/kernel/perf_event.c
> index 31515200..df27fd8 100644
> --- a/kernel/perf_event.c
> +++ b/kernel/perf_event.c
> @@ -4671,6 +4671,8 @@ static int perf_swevent_init(struct perf_event *event)
>
>  	if (event->attr.type != PERF_TYPE_SOFTWARE)
>  		return -ENOENT;
> +	if (event->attr.freq)
> +		return -EINVAL;
>
>  	switch (event_id) {
>  	case PERF_COUNT_SW_CPU_CLOCK:
>
>
>
> That is, for non-'contiguous' events, setting a sampling frequency
> doesn't really make sense: for example, you could set a frequency of
> 1000 Hz for the software ALIGNMENT_FAULTS event and never get any
> samples, or get samples but at a totally different rate. And the
> current code doesn't seem to handle sample_freq anyway.

All the freq bits are in the generic code, it re-computes the rate on
the timer-tick as well as on each event occurrence.

Freq driven sampling should work just fine with swevents.
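
The recomputation described above can be sketched roughly as follows. This is a simplified model, not the kernel's code: from the number of events seen over an interval, estimate the period that would yield the requested frequency, then move the current period a fraction of the way toward it. The function name `adjust_period`, its parameters, and the damping factor of 8 are assumptions of this sketch.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper, not kernel code.
 *   period: current sample period (events per sample)
 *   count:  events observed during the last interval
 *   nsec:   length of that interval in nanoseconds
 *   freq:   requested samples per second
 * Returns the adjusted period.
 */
static uint64_t adjust_period(uint64_t period, uint64_t count,
			      uint64_t nsec, uint64_t freq)
{
	uint64_t rate, ideal;
	int64_t delta;

	/* No events (or no target): nothing to base an estimate on. */
	if (count == 0 || nsec == 0 || freq == 0)
		return period;

	/* Observed events per second (overflow ignored in this sketch). */
	rate = count * NSEC_PER_SEC / nsec;

	/* Period that would produce 'freq' samples per second at this rate. */
	ideal = rate / freq;

	/* Low-pass filter: move one eighth of the way toward the ideal. */
	delta = ((int64_t)ideal - (int64_t)period + 7) / 8;
	period = (uint64_t)((int64_t)period + delta);

	return period ? period : 1;
}
```

For example, with one million events per second and a 1000 Hz target, a starting period of 100 moves to 213 after one adjustment and keeps climbing toward 1000 on later ones. When no events occur, the period is left alone and no samples are produced.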

> Also I'm currently not seeing any real differences between cpu-clock and
> task-clock events. They both seem to count the time elapsed when the
> task is running on a CPU. Am I wrong ?

No, Francis already noticed that, I probably wrecked it when I added the
multi-pmu stuff, it's on my todo list to look at (Francis also handed me
a little patchlet), but I keep getting distracted with other stuff :/

2010-11-27 13:28:48

by Franck Bui-Huu

Subject: Re: perf: some questions about perf software events

Peter Zijlstra <[email protected]> writes:

> On Wed, 2010-11-24 at 12:35 +0100, Franck Bui-Huu wrote:

[...]

>> That is, for non-'contiguous' events, setting a sampling frequency
>> doesn't really make sense: for example, you could set a frequency of
>> 1000 Hz for the software ALIGNMENT_FAULTS event and never get any
>> samples, or get samples but at a totally different rate. And the
>> current code doesn't seem to handle sample_freq anyway.
>
> All the freq bits are in the generic code, it re-computes the rate on
> the timer-tick as well as on each event occurrence.
>
> Freq driven sampling should work just fine with swevents.
>

Yes, but how does it behave with ALIGNMENT_FAULTS for example ?

Such an event may occur at a very irregular rate, or it may never
occur at all.

>
>> Also I'm currently not seeing any real differences between cpu-clock and
>> task-clock events. They both seem to count the time elapsed when the
>> task is running on a CPU. Am I wrong ?
>
> No, Francis already noticed that, I probably wrecked it when I added the
> multi-pmu stuff, it's on my todo list to look at (Francis also handed me
> a little patchlet), but I keep getting distracted with other stuff :/

OK.

Does it make sense to adjust the period for both of them ?

Also, when creating a task clock event, passing 'pid=-1' to
sys_perf_event_open() doesn't really make sense, does it ?

Same with cpu clock and 'pid=n': whatever the value of <n>, the event
measures the cpu wall-time clock.

Perhaps exposing only one clock in the API, and internally binding it
to the cpu or task clock depending on the pid or cpu parameters, would
have been better ?

Something like the following:


diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index bb1884c..ad50551 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -105,6 +105,7 @@ enum perf_sw_ids {
 	PERF_COUNT_SW_PAGE_FAULTS_MAJ = 6,
 	PERF_COUNT_SW_ALIGNMENT_FAULTS = 7,
 	PERF_COUNT_SW_EMULATION_FAULTS = 8,
+	PERF_COUNT_SW_CLOCK = 9,

 	PERF_COUNT_SW_MAX, /* non-ABI */
 };
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 1dabb54..f3ff342 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -4981,7 +4981,15 @@ static int cpu_clock_event_init(struct perf_event *event)
 	if (event->attr.type != PERF_TYPE_SOFTWARE)
 		return -ENOENT;

-	if (event->attr.config != PERF_COUNT_SW_CPU_CLOCK)
+	switch (event->attr.config) {
+	case PERF_COUNT_SW_CPU_CLOCK:
+		break;
+	case PERF_COUNT_SW_CLOCK:
+		if (!(event->attach_state & PERF_ATTACH_TASK))
+			break;
+		/* fall-through */
+	default:
 		return -ENOENT;
+	}

 	return 0;
@@ -5058,8 +5066,16 @@ static int task_clock_event_init(struct perf_event *event)
 	if (event->attr.type != PERF_TYPE_SOFTWARE)
 		return -ENOENT;

-	if (event->attr.config != PERF_COUNT_SW_TASK_CLOCK)
+	switch (event->attr.config) {
+	case PERF_COUNT_SW_TASK_CLOCK:
+		break;
+	case PERF_COUNT_SW_CLOCK:
+		if (event->attach_state & PERF_ATTACH_TASK)
+			break;
+		/* fall-through */
+	default:
 		return -ENOENT;
+	}

 	return 0;
 }
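
The dispatch rule the patch is aiming for can be restated as a small standalone function, which may be easier to reason about than the two hunks. This is only an illustration: the enum names and `pick_clock_backend` are made up for the sketch, and `SW_CLOCK` stands in for the proposed (not existing) PERF_COUNT_SW_CLOCK id.

```c
#include <stdbool.h>

/* Local stand-ins for the perf_sw_ids values used in the patch. */
enum sw_clock_id {
	SW_CPU_CLOCK  = 0,	/* PERF_COUNT_SW_CPU_CLOCK */
	SW_TASK_CLOCK = 1,	/* PERF_COUNT_SW_TASK_CLOCK */
	SW_CLOCK      = 9,	/* proposed unified id, hypothetical */
};

enum clock_backend {
	BACKEND_NONE,		/* neither clock PMU accepts the event */
	BACKEND_CPU_CLOCK,
	BACKEND_TASK_CLOCK,
};

/*
 * Hypothetical helper: which clock implementation would accept an event
 * with the given config, depending on whether it is attached to a task
 * (the PERF_ATTACH_TASK test in the patch).
 */
static enum clock_backend pick_clock_backend(int config, bool attached_to_task)
{
	switch (config) {
	case SW_CPU_CLOCK:
		return BACKEND_CPU_CLOCK;
	case SW_TASK_CLOCK:
		return BACKEND_TASK_CLOCK;
	case SW_CLOCK:
		/* The unified id binds to whichever clock matches the target. */
		return attached_to_task ? BACKEND_TASK_CLOCK : BACKEND_CPU_CLOCK;
	default:
		return BACKEND_NONE;
	}
}
```

The two existing ids keep their current meaning regardless of the target; only the new id is resolved from the pid/cpu arguments.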


--
Franck

2010-11-30 22:44:12

by Peter Zijlstra

Subject: Re: perf: some questions about perf software events

On Sat, 2010-11-27 at 14:28 +0100, Franck Bui-Huu wrote:
> Peter Zijlstra <[email protected]> writes:
>
> > On Wed, 2010-11-24 at 12:35 +0100, Franck Bui-Huu wrote:
>
> [...]
>
> >> That is, for non-'contiguous' events, setting a sampling frequency
> >> doesn't really make sense: for example, you could set a frequency of
> >> 1000 Hz for the software ALIGNMENT_FAULTS event and never get any
> >> samples, or get samples but at a totally different rate. And the
> >> current code doesn't seem to handle sample_freq anyway.
> >
> > All the freq bits are in the generic code, it re-computes the rate on
> > the timer-tick as well as on each event occurrence.
> >
> > Freq driven sampling should work just fine with swevents.
> >
>
> Yes, but how does it behave with ALIGNMENT_FAULTS for example ?
>
> Such an event may occur at a very irregular rate, or it may never
> occur at all.

Then of course we'll never make the freq target, again, software events
aren't special there.
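
For reference, the configuration being discussed looks roughly like this from userspace. It is a sketch under assumptions: the helper names `init_sw_freq_attr` and `open_alignment_fault_sampler` are invented for illustration, and the choice of PERF_SAMPLE_IP as the sample payload is arbitrary. The `perf_event_open` wrapper is needed because glibc does not export one.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thin wrapper around the raw syscall. */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
			    int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: frequency-driven sampling on a software event. */
static void init_sw_freq_attr(struct perf_event_attr *attr, __u64 config,
			      __u64 hz)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = config;
	attr->freq = 1;			/* sample_freq is valid, not sample_period */
	attr->sample_freq = hz;		/* a target rate, not a guarantee */
	attr->sample_type = PERF_SAMPLE_IP;
	attr->disabled = 1;		/* caller enables it when ready */
}

/* Hypothetical helper: open the sampler on the calling task, any CPU. */
static long open_alignment_fault_sampler(__u64 hz)
{
	struct perf_event_attr attr;

	init_sw_freq_attr(&attr, PERF_COUNT_SW_ALIGNMENT_FAULTS, hz);
	return perf_event_open(&attr, 0, -1, -1, 0);
}
```

Requesting 1000 Hz here only sets a target: if the task never takes an alignment fault, the event never fires and no samples are produced.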

> >
> >> Also I'm currently not seeing any real differences between cpu-clock and
> >> task-clock events. They both seem to count the time elapsed when the
> >> task is running on a CPU. Am I wrong ?
> >
> > No, Francis already noticed that, I probably wrecked it when I added the
> > multi-pmu stuff, it's on my todo list to look at (Francis also handed me
> > a little patchlet), but I keep getting distracted with other stuff :/
>
> OK.
>
> Does it make sense to adjust the period for both of them ?
>
> Also, when creating a task clock event, passing 'pid=-1' to
> sys_perf_event_open() doesn't really make sense, does it ?
>
> Same with cpu clock and 'pid=n': whatever the value of <n>, the event
> measures the cpu wall-time clock.
>
> Perhaps exposing only one clock in the API, and internally binding it
> to the cpu or task clock depending on the pid or cpu parameters, would
> have been better ?
>

No, it actually makes sense to count both cpu and task clock on a task
(cpu clock basically being wall-time).
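
A minimal userspace sketch of that case, opening both existing clock events on the same task. The helper names `init_clock_attr` and `open_both_clocks` are made up for illustration, and the wrapper is named after the call used in this thread. Whether the two counters actually diverge depends on the kernel behaving as intended, which (per the earlier message) is not a given at the moment.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Raw syscall wrapper; glibc does not export one. */
static long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
				int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: a counting (non-sampling) software clock event. */
static void init_clock_attr(struct perf_event_attr *attr, __u64 clock_id)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = clock_id;
}

/*
 * Hypothetical helper: open both clocks on the calling task (pid = 0),
 * on whichever CPU it runs (cpu = -1). Returns 0 on success, -1 if
 * either open fails (any fd already opened is closed again).
 */
static int open_both_clocks(int *task_fd, int *cpu_fd)
{
	struct perf_event_attr attr;

	init_clock_attr(&attr, PERF_COUNT_SW_TASK_CLOCK);
	*task_fd = (int)sys_perf_event_open(&attr, 0, -1, -1, 0);

	init_clock_attr(&attr, PERF_COUNT_SW_CPU_CLOCK);
	*cpu_fd = (int)sys_perf_event_open(&attr, 0, -1, -1, 0);

	if (*task_fd < 0 || *cpu_fd < 0) {
		if (*task_fd >= 0)
			close(*task_fd);
		if (*cpu_fd >= 0)
			close(*cpu_fd);
		return -1;
	}
	return 0;
}
```

Each descriptor can then be read with read(2) into a 64-bit integer to get the accumulated count for that clock.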

2010-12-02 20:52:25

by Franck Bui-Huu

Subject: Re: perf: some questions about perf software events

Peter Zijlstra <[email protected]> writes:

> On Sat, 2010-11-27 at 14:28 +0100, Franck Bui-Huu wrote:

[...]

>>
>> Does it make sense to adjust the period for both of them ?
>>
>> Also, when creating a task clock event, passing 'pid=-1' to
>> sys_perf_event_open() doesn't really make sense, does it ?
>>
>> Same with cpu clock and 'pid=n': whatever the value of <n>, the event
>> measures the cpu wall-time clock.
>>
>> Perhaps exposing only one clock in the API, and internally binding it
>> to the cpu or task clock depending on the pid or cpu parameters, would
>> have been better ?
>>
>
> No, it actually makes sense to count both cpu and task clock on a task
> (cpu clock basically being wall-time).
>

But a task can create several instances of the same events, no ?

For HW events, they'll use counters that support these event types,
and if there are not enough counters, the events will share them in a
round-robin fashion.

For SW events, there's no limit at all.

So doing:

attr.type = PERF_TYPE_SOFTWARE;
attr.config = PERF_COUNT_SW_CLOCK;
/* ... */
tsk_clock_fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
cpu_clock_fd = sys_perf_event_open(&attr, -1, 0, -1, 0);

should be allowed.

No ?
--
Franck