2009-06-13 01:27:19

by Corey Ashford

Subject: perf_counters: page fault trace record

Hi,

One of the tools we are working on needs to be able to look not only at counts
of page faults, but where they are occurring (ip and faulting page address).

What would you think about adding a new bit to the config record, something like:

diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
index 6e13395..c27d0bc 100644
--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -167,8 +167,9 @@ struct perf_counter_attr {
mmap : 1, /* include mmap data */
comm : 1, /* include comm data */
freq : 1, /* use freq, not period */
-
- __reserved_2 : 53;
+ page_fault : 1, /* include page fault data */
+
+ __reserved_2 : 52;

We'd need a new event type too - PERF_EVENT_PAGE_FAULT which would have:

/*
* struct {
* struct perf_event_header header;
* u64 ip;
* u64 fault_address;
* };
*/

etc.

I would guess that special care would be needed to post an event record
like this from within the page fault handler.

Any objection to this idea?



Regards,

- Corey

Corey Ashford
Software Engineer
IBM Linux Technology Center, Linux Toolchain
[email protected]


2009-06-13 03:44:13

by Paul Mackerras

Subject: Re: perf_counters: page fault trace record

Corey Ashford writes:

> One of the tools we are working on needs to be able to look not only at counts
> of page faults, but where they are occurring (ip and faulting page address).
>
> What would you think about adding a new bit to the config record, something like:

Can't you do what you need just using a page fault software counter
with sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR and
sample_period = 1?

Paul.
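
For readers following along, the setup Paul describes can be sketched as below. This is a minimal sketch, not code from the thread: it assumes the later perf_event_open(2) interface (struct perf_event_attr from <linux/perf_event.h>), which succeeded the perf_counter API discussed here, and the helper names init_page_fault_attr and open_page_fault_counter are hypothetical.

```c
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: describe a page-fault software counter that emits
 * one sample per fault, carrying the instruction pointer and the data
 * address (for this counter, the faulting address). */
void init_page_fault_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_PAGE_FAULTS;
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR;
    attr->sample_period = 1;  /* record every fault, not every Nth */
    attr->disabled = 1;       /* start stopped; enable later via ioctl */
}

/* Hypothetical helper: open the counter on the calling thread, on any CPU.
 * Returns a file descriptor, or -1 with errno set (for example when the
 * system restricts perf access). */
int open_page_fault_counter(void)
{
    struct perf_event_attr attr;

    init_page_fault_attr(&attr);
    /* Arguments: attr, pid = 0 (this thread), cpu = -1 (any),
     * group_fd = -1 (no group), flags = 0. */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}
```

The samples themselves would then be read from the counter's mmap'ed ring buffer; that part is left out of this sketch.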

2009-06-13 07:04:25

by Ingo Molnar

Subject: Re: perf_counters: page fault trace record


* Paul Mackerras <[email protected]> wrote:

> Corey Ashford writes:
>
> > One of the tools we are working on needs to be able to look not only at counts
> > of page faults, but where they are occurring (ip and faulting page address).
> >
> > What would you think about adding a new bit to the config record, something like:
>
> Can't you do what you need just using a page fault software
> counter with sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR and
> sample_period = 1?

Yes, we should use a generic 'data address' field. A whole lot of
interesting events can live with that - only full-blown generic
tracepoints have more complex sample record formats.

Also, if this feature is added to pagefaults, please also add
support for it into tools/perf/, so that it gets tested/used.

Thanks,

Ingo

2009-06-13 08:45:00

by Corey Ashford

Subject: Re: perf_counters: page fault trace record

Paul Mackerras wrote:
> Corey Ashford writes:
>
>> One of the tools we are working on needs to be able to look not only at counts
>> of page faults, but where they are occurring (ip and faulting page address).
>>
>> What would you think about adding a new bit to the config record, something like:
>
> Can't you do what you need just using a page fault software counter
> with sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR and
> sample_period = 1?

I thought about that, but I was under the (incorrect?) impression that
on Power, the PERF_SAMPLE_ADDR would be set by the value of the SDAR
register, which wouldn't be correct for the case of a page fault.

I need to go look at the kernel code :)

- Corey

2009-06-13 09:18:10

by Paul Mackerras

Subject: Re: perf_counters: page fault trace record

Corey Ashford writes:

> Paul Mackerras wrote:
> > Can't you do what you need just using a page fault software counter
> > with sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR and
> > sample_period = 1?
>
> I thought about that, but I was under the (incorrect?) impression that
> on Power, the PERF_SAMPLE_ADDR would be set by the value of the SDAR
> register, which wouldn't be correct for the case of a page fault.

No, the PERF_SAMPLE_ADDR value only comes from SDAR for a hardware
counter overflow event. For the page-fault software counter the
PERF_SAMPLE_ADDR value will always be the faulting address.

Paul.
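
To make the resulting record concrete, here is a rough sketch of decoding one such sample. It assumes the record layout documented for the later perf_event_open(2) interface, where a PERF_RECORD_SAMPLE with sample_type set to exactly PERF_SAMPLE_IP | PERF_SAMPLE_ADDR is the header followed by a u64 ip and then a u64 addr. The struct fault_sample and the function decode_fault_sample are hypothetical names, and wrap-around at the end of the ring buffer is not handled.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of one sample when sample_type is exactly
 * PERF_SAMPLE_IP | PERF_SAMPLE_ADDR. */
struct fault_sample {
    uint64_t ip;    /* instruction that faulted */
    uint64_t addr;  /* faulting data address */
};

/* Decode the record starting at buf, with len contiguous bytes available.
 * Returns 1 and fills *out for a complete sample record, 0 otherwise. */
int decode_fault_sample(const void *buf, size_t len, struct fault_sample *out)
{
    struct perf_event_header hdr;
    const unsigned char *p = buf;

    if (len < sizeof(hdr))
        return 0;
    memcpy(&hdr, p, sizeof(hdr));

    if (hdr.type != PERF_RECORD_SAMPLE)
        return 0;  /* mmap, comm, and other record types are skipped */
    if (hdr.size < sizeof(hdr) + 2 * sizeof(uint64_t) || hdr.size > len)
        return 0;  /* too short for ip + addr, or truncated */

    /* Fields follow the header in sample_type bit order: ip, then addr. */
    memcpy(&out->ip, p + sizeof(hdr), sizeof(uint64_t));
    memcpy(&out->addr, p + sizeof(hdr) + sizeof(uint64_t), sizeof(uint64_t));
    return 1;
}
```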

2009-06-13 10:09:28

by Ingo Molnar

Subject: Re: perf_counters: page fault trace record


* Paul Mackerras <[email protected]> wrote:

> Corey Ashford writes:
>
> > Paul Mackerras wrote:
> > > Can't you do what you need just using a page fault software
> > > counter with sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR
> > > and sample_period = 1?
> >
> > I thought about that, but I was under the (incorrect?)
> > impression that on Power, the PERF_SAMPLE_ADDR would be set by
> > the value of the SDAR register, which wouldn't be correct for
> > the case of a page fault.
>
> No, the PERF_SAMPLE_ADDR value only comes from SDAR for a hardware
> counter overflow event. For the page-fault software counter the
> PERF_SAMPLE_ADDR value will always be the faulting address.

Corey, could you please add support for it in 'perf'? We don't want
such sw-counter features to be in the kernel code without matching
support in tools/perf/.

While user data symbols won't be resolved, if we have a
--target-address switch in perf record we could see the faulting
frequency (and the fault coverage - and ordering as well) of shared
libraries, in perf report and perf annotate.

This would be a very useful facility.

Thanks,

Ingo
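
As an illustration of the "faulting frequency" view Ingo mentions, the following sketch counts faults per page from a list of sampled fault addresses. It is not taken from tools/perf; the names page_count and count_faults_per_page are hypothetical, and the page size is assumed to be a power of two.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result row: one page base address and its fault count. */
struct page_count {
    uint64_t page;
    uint64_t faults;
};

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;

    return (x > y) - (x < y);
}

/* Round each address in addrs[0..n) down to its page base (modifying the
 * array in place), sort, and collapse equal pages into counts.
 * out must have room for n entries. page_size must be a power of two.
 * Returns the number of distinct pages written to out, in ascending order. */
size_t count_faults_per_page(uint64_t *addrs, size_t n, uint64_t page_size,
                             struct page_count *out)
{
    size_t i, k = 0;

    for (i = 0; i < n; i++)
        addrs[i] &= ~(page_size - 1);
    qsort(addrs, n, sizeof(addrs[0]), cmp_u64);

    for (i = 0; i < n; i++) {
        if (k > 0 && out[k - 1].page == addrs[i]) {
            out[k - 1].faults++;
        } else {
            out[k].page = addrs[i];
            out[k].faults = 1;
            k++;
        }
    }
    return k;
}
```

Mapping each page back to the shared library that contains it would additionally need the mmap records, which this sketch does not cover.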

2009-06-15 20:38:59

by Corey Ashford

Subject: Re: perf_counters: page fault trace record

Ingo Molnar wrote:
> * Paul Mackerras <[email protected]> wrote:
>
>> Corey Ashford writes:
>>
>>> Paul Mackerras wrote:
>>>> Can't you do what you need just using a page fault software
>>>> counter with sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR
>>>> and sample_period = 1?
>>> I thought about that, but I was under the (incorrect?)
>>> impression that on Power, the PERF_SAMPLE_ADDR would be set by
>>> the value of the SDAR register, which wouldn't be correct for
>>> the case of a page fault.
>> No, the PERF_SAMPLE_ADDR value only comes from SDAR for a hardware
>> counter overflow event. For the page-fault software counter the
>> PERF_SAMPLE_ADDR value will always be the faulting address.
>
> Corey, could you please add support for it in 'perf'? We don't want
> such sw-counter features to be in the kernel code without matching
> support in tools/perf/.
>
> While user data symbols won't be resolved, if we have a
> --target-address switch in perf record we could see the faulting
> frequency (and the fault coverage - and ordering as well) of shared
> libraries, in perf report and perf annotate.
>
> This would be a very useful facility.
>
> Thanks,
>
> Ingo

I'll see what I can do. This will be my first time modifying perf, so expect
that I won't do things the way you prefer the first couple of go-arounds :)

Regards,

- Corey