2013-08-12 00:03:16

by Karim Yaghmour

Subject: Reading perf counters at ftrace trace boundaries


Wondering if there's a way for reading perf counters in the kernel. I'd
like to read/record perf counters on ftrace function tracing
entries/exits to provide a rundown of the value of various counters on
function call boundaries.

[ Steven: apologies for sending you a duplicate here of what I somewhat
already sent privately. ]

--
Karim Yaghmour
CEO - Opersys inc. / http://www.opersys.com
http://twitter.com/karimyaghmour


2013-08-12 01:23:12

by Andi Kleen

Subject: Re: Reading perf counters at ftrace trace boundaries

Karim Yaghmour writes:

> Wondering if there's a way for reading perf counters in the kernel. I'd
> like to read/record perf counters on ftrace function tracing
> entries/exits to provide a rundown of the value of various counters on
> function call boundaries.

KVM does it, see arch/x86/kvm/pmu.c. Essentially it would be doing RDPMC.

But the overhead will likely be very high; some sampling approach
is likely better.

-Andi

--
[email protected] -- Speaking for myself only

2013-08-12 01:39:11

by Karim Yaghmour

Subject: Re: Reading perf counters at ftrace trace boundaries


On 13-08-11 10:23 PM, Andi Kleen wrote:
> KVM does it, see arch/x86/kvm/pmu.c. Essentially it would be doing RDPMC.

Thx for the pointer, appreciated.

> But the overhead will likely be very high; some sampling approach
> is likely better.

Indeed. It doesn't actually have to be at every single ftrace
entry/exit. It could start with something like every nth one and then
drill down as the culprit is incrementally singled out.
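
To make the "every nth, then drill down" idea concrete, here is a
minimal user-space sketch in C. It is not kernel code and does not touch
ftrace or the PMU; it only shows the decision logic a callback could use
to pick which function entries/exits get a counter read. All names
(nth_sampler, nth_sampler_hit, nth_sampler_narrow) are hypothetical, and
halving the period as the drill-down step is an assumption, not
something specified above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: selects which trace events get a counter read. */
struct nth_sampler {
    uint32_t period; /* sample every period-th event (always >= 1) */
    uint32_t seen;   /* events seen since the last sample */
};

static inline void nth_sampler_init(struct nth_sampler *s, uint32_t period)
{
    s->period = period ? period : 1; /* treat 0 as "every event" */
    s->seen = 0;
}

/* Returns true exactly once every `period` calls. */
static inline bool nth_sampler_hit(struct nth_sampler *s)
{
    if (++s->seen < s->period)
        return false;
    s->seen = 0;
    return true;
}

/* Drill down: halve the period (never below 1) to sample more densely. */
static inline void nth_sampler_narrow(struct nth_sampler *s)
{
    if (s->period > 1)
        s->period /= 2;
    s->seen = 0;
}
```

A caller would invoke nth_sampler_hit() on each function entry/exit and
read the counters only when it returns true, calling
nth_sampler_narrow() once a suspicious region has been identified.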

--
Karim Yaghmour
CEO - Opersys inc. / http://www.opersys.com
http://twitter.com/karimyaghmour

2013-08-12 01:47:53

by Andi Kleen

Subject: Re: Reading perf counters at ftrace trace boundaries

> Indeed. It doesn't actually have to be at every single ftrace
> entry/exit. It could start with something like every nth one and then
> drill down as the culprit is incrementally singled out.

That's what normal sampling already does.

If you're worried about systematic shadow effects just randomize a bit.
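
As an illustration of "randomize a bit", the following user-space C
sketch jitters the gap between samples so that the sampling point does
not stay in lockstep with periodic behaviour in the workload. This is
only a sketch under assumptions: the names (jitter_sampler,
jitter_sampler_hit) are hypothetical, xorshift32 is just a convenient
small PRNG chosen for the example, and the periods are assumed small
enough that base + jitter does not overflow 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

struct jitter_sampler {
    uint32_t base;      /* nominal sampling period, >= 1 */
    uint32_t jitter;    /* maximum deviation from base, < base */
    uint32_t countdown; /* events left until the next sample */
    uint32_t rng;       /* xorshift32 state, must be non-zero */
};

/* Small PRNG (xorshift32); never yields 0 from a non-zero state. */
static inline uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Next gap, drawn from [base - jitter, base + jitter]. */
static inline uint32_t jitter_next_gap(struct jitter_sampler *s)
{
    uint32_t span = 2 * s->jitter + 1;
    return s->base - s->jitter + xorshift32(&s->rng) % span;
}

static inline void jitter_sampler_init(struct jitter_sampler *s,
                                       uint32_t base, uint32_t jitter,
                                       uint32_t seed)
{
    s->base = base ? base : 1;
    /* Clamp so the smallest possible gap is still at least 1. */
    s->jitter = jitter < s->base ? jitter : s->base - 1;
    s->rng = seed ? seed : 1;
    s->countdown = jitter_next_gap(s);
}

/* Returns true when the current event should be sampled. */
static inline bool jitter_sampler_hit(struct jitter_sampler *s)
{
    if (--s->countdown > 0)
        return false;
    s->countdown = jitter_next_gap(s);
    return true;
}
```

With base = 100 and jitter = 10, consecutive samples are between 90 and
110 events apart, so the average rate stays close to one in a hundred
while the exact sampling points vary.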

-Andi

2013-08-12 02:00:11

by Karim Yaghmour

Subject: Re: Reading perf counters at ftrace trace boundaries


On 13-08-11 10:47 PM, Andi Kleen wrote:
> That's what normal sampling already does.
>
> If you're worried about systematic shadow effects just randomize a bit.

That's actually the point. I'd like to be able to study/compare both
approaches. I could be completely off, but I'd like to see if a divide
and conquer approach (i.e. based on ftrace) wouldn't take the guesswork
out of smart randomization. Just a hunch.

--
Karim Yaghmour
CEO - Opersys inc. / http://www.opersys.com
http://twitter.com/karimyaghmour

2013-08-12 02:25:21

by zhangwei(Jovi)

Subject: Re: Reading perf counters at ftrace trace boundaries

On 2013/8/12 8:03, Karim Yaghmour wrote:
>
> Wondering if there's a way for reading perf counters in the kernel. I'd
> like to read/record perf counters on ftrace function tracing
> entries/exits to provide a rundown of the value of various counters on
> function call boundaries.
>
> [ Steven: apologies for sending you a duplicate here of what I somewhat
> already sent privately. ]
>

If you want to build on ftrace, either of the two approaches below could be used:

- register_ftrace_function/unregister_ftrace_function

- perf_event_create_kernel_counter (function event id is 1)

The first one is the simplest, IMO.

You need to write your own kernel module to use these approaches.

jovi.

2013-08-12 15:27:10

by Karim Yaghmour

Subject: Re: Reading perf counters at ftrace trace boundaries


On 13-08-11 11:24 PM, zhangwei(Jovi) wrote:
> If you want to build on ftrace, either of the two approaches below could be used:
>
> - register_ftrace_function/unregister_ftrace_function
>
> - perf_event_create_kernel_counter (function event id is 1)
>
> The first one is the simplest, IMO.

Thx for the pointers.

> You need to write your own kernel module to use these approaches.

As a proof-of-concept, sure. For something more permanent it would make
more sense to adapt the various perf/ftrace tools to make this available
on the command line with other options. But we're far away from that for
the moment.

--
Karim Yaghmour
CEO - Opersys inc. / http://www.opersys.com
http://twitter.com/karimyaghmour

2013-08-13 07:13:00

by zhangwei(Jovi)

Subject: Re: Reading perf counters at ftrace trace boundaries

On 2013/8/12 23:26, Karim Yaghmour wrote:
>
> On 13-08-11 11:24 PM, zhangwei(Jovi) wrote:
>> If you want to build on ftrace, either of the two approaches below could be used:
>>
>> - register_ftrace_function/unregister_ftrace_function
>>
>> - perf_event_create_kernel_counter (function event id is 1)
>>
>> The first one is the simplest, IMO.
>
> Thx for the pointers.
>
>> You need to write your own kernel module to use these approaches.
>
> As a proof-of-concept, sure. For something more permanent it would make
> more sense to adapt the various perf/ftrace tools to make this available
> on the command line with other options. But we're far away from that for
> the moment.
>

If you want to embed PMU reading into ftrace/perf permanently, perhaps
making the PMU reading available as a trace clock source would be a nice
way to go. The question you raised is actually a very common one for all
of us in practice: sometimes we want to read CPU cycles in both the KVM
host and the KVM guest, and in that case a PMU reading for each event is
very valuable.

(One issue that needs special handling when using the PMU as a tracing
clock is PMU counter overflow, because it relates to the ftrace per-cpu
ring buffer sync.)
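
To illustrate the overflow concern, here is a minimal user-space C
sketch of wrap-safe delta computation for a counter narrower than 64
bits. It is an illustration only: the names (pmc_delta, pmc_accum) are
hypothetical, the 48-bit width in the usage note is an assumption about
a typical counter rather than something stated in this thread, and the
result is only correct if the counter wrapped at most once between two
reads.

```c
#include <stdint.h>

/*
 * Events elapsed between two raw reads of a counter that is `width`
 * bits wide. Unsigned subtraction followed by masking yields the right
 * answer even when `now` has wrapped past zero, provided it wrapped at
 * most once since `prev` was read.
 */
static inline uint64_t pmc_delta(uint64_t prev, uint64_t now,
                                 unsigned width)
{
    uint64_t mask = (width >= 64) ? UINT64_MAX
                                  : ((UINT64_C(1) << width) - 1);
    return (now - prev) & mask;
}

/* Running 64-bit total, so consumers see a monotonic value. */
struct pmc_accum {
    uint64_t last_raw; /* previous raw counter value */
    uint64_t total;    /* accumulated events, widened to 64 bits */
};

/* Fold a new raw read into the running total and return it. */
static inline uint64_t pmc_accum_update(struct pmc_accum *a,
                                        uint64_t raw, unsigned width)
{
    a->total += pmc_delta(a->last_raw, raw, width);
    a->last_raw = raw;
    return a->total;
}
```

For example, with a 48-bit counter, pmc_delta(0xFFFFFFFFFFF0, 0x10, 48)
evaluates to 0x20 even though the raw value went backwards. A value
accumulated this way keeps increasing across wraps, which is the
property a trace clock would need.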

Thanks.

jovi