2010-02-12 03:02:13

by Paul Mackerras

Subject: Why is PERF_FORMAT_GROUP incompatible with inherited events?

We currently have this code in perf_event_alloc() in kernel/perf_event.c:

        /*
         * we currently do not support PERF_FORMAT_GROUP on inherited events
         */
        if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))
                goto done;

plus there is a comment "XXX PERF_FORMAT_GROUP vs inherited events
seems difficult" next to perf_output_read_group() (but there isn't a
similar comment on perf_read_hw()).

First, what is the difficulty referred to here?

Secondly, if the difficulty is just to do with the intersection of
sampling counters, inheritance, and group readout (as seems to be the
case), could we please allow group readout on ordinary counting
(non-sampling) counters? That is, change the test above to something
like:

        if (attr->inherit && attr->sample_period &&
            (attr->read_format & PERF_FORMAT_GROUP))
                goto done;

Any objections to that change? If it's OK, could we get it into .33
and .32-stable?

Paul.


2010-02-14 10:12:33

by Peter Zijlstra

Subject: Re: Why is PERF_FORMAT_GROUP incompatible with inherited events?

On Fri, 2010-02-12 at 14:02 +1100, Paul Mackerras wrote:
> We currently have this code in perf_event_alloc() in kernel/perf_event.c:
>
>         /*
>          * we currently do not support PERF_FORMAT_GROUP on inherited events
>          */
>         if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))
>                 goto done;
>
> plus there is a comment "XXX PERF_FORMAT_GROUP vs inherited events
> seems difficult" next to perf_output_read_group() (but there isn't a
> similar comment on perf_read_hw()).
>
> First, what is the difficulty referred to here?

IIRC it's the fact that we have to go collect the count delta from all
the child counters, which can be quite a lot of work depending on the
number of cpus and children around.

> Secondly, if the difficulty is just to do with the intersection of
> sampling counters, inheritance, and group readout (as seems to be the
> case), could we please allow group readout on ordinary counting
> (non-sampling) counters? That is, change the test above to something
> like:
>
>         if (attr->inherit && attr->sample_period &&
>             (attr->read_format & PERF_FORMAT_GROUP))
>                 goto done;
>
> Any objections to that change? If it's OK, could we get it into .33
> and .32-stable?

Yeah, that's still broken, you can't do a read without collecting all
the child counts.

2010-02-14 11:33:22

by Paul Mackerras

Subject: Re: Why is PERF_FORMAT_GROUP incompatible with inherited events?

On Sun, Feb 14, 2010 at 11:12:17AM +0100, Peter Zijlstra wrote:
> On Fri, 2010-02-12 at 14:02 +1100, Paul Mackerras wrote:
> > We currently have this code in perf_event_alloc() in kernel/perf_event.c:
> >
> >         /*
> >          * we currently do not support PERF_FORMAT_GROUP on inherited events
> >          */
> >         if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))
> >                 goto done;
> >
> > plus there is a comment "XXX PERF_FORMAT_GROUP vs inherited events
> > seems difficult" next to perf_output_read_group() (but there isn't a
> > similar comment on perf_read_hw()).
> >
> > First, what is the difficulty referred to here?
>
> IIRC it's the fact that we have to go collect the count delta from all
> the child counters, which can be quite a lot of work depending on the
> number of cpus and children around.

But we don't go and collect the count delta from children without
PERF_FORMAT_GROUP, so why would we with it?

There are two situations where PERF_FORMAT_GROUP makes a difference:
with PERF_SAMPLE_READ when storing a sample in the ring buffer, and
when you do a read() system call on a perf_event fd. In both
situations, if the counter is inherited, we don't go collecting up
child counts, we just store the value of the counter that overflowed
in the sampling case, or the value of the top-level counter in the
read() case.

Now, I can see a possible difficulty in the sampling case if you have
a group that has some inherited members and some non-inherited
members. In that case if you get an overflow on a child counter, the
group it's in will have fewer members than the group that the
top-level counter is part of, which could get confusing. But there is
no such problem for read() since it is always returning the value of
the top-level counter.

> > Secondly, if the difficulty is just to do with the intersection of
> > sampling counters, inheritance, and group readout (as seems to be the
> > case), could we please allow group readout on ordinary counting
> > (non-sampling) counters? That is, change the test above to something
> > like:
> >
> >         if (attr->inherit && attr->sample_period &&
> >             (attr->read_format & PERF_FORMAT_GROUP))
> >                 goto done;
> >
> > Any objections to that change? If it's OK, could we get it into .33
> > and .32-stable?
>
> Yeah, that's still broken, you can't do a read without collecting all
> the child counts.

We do a read without collecting all the child counts if
PERF_FORMAT_GROUP is not set -- why would that be any different when
PERF_FORMAT_GROUP is set? PERF_FORMAT_GROUP is about the "horizontal"
dimension (across group members) not the "vertical" dimension (down to
all the child counters).
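
To make the "horizontal" dimension concrete, here is a minimal user-space sketch of decoding the buffer that read() returns for a group leader opened with PERF_FORMAT_GROUP. The layout (member count, optional times, then one value and optional id per member) follows the perf_event_open documentation; the FMT_* names, struct group_reading, MAX_MEMBERS and parse_group_read() are hypothetical helpers made up for this illustration, not kernel or libc API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values mirror enum perf_event_read_format in <linux/perf_event.h>;
 * the FMT_* names themselves are local to this sketch. */
#define FMT_TOTAL_TIME_ENABLED (1u << 0)
#define FMT_TOTAL_TIME_RUNNING (1u << 1)
#define FMT_ID                 (1u << 2)
#define FMT_GROUP              (1u << 3)

#define MAX_MEMBERS 16  /* arbitrary cap chosen for the sketch */

struct group_reading {
    uint64_t nr;            /* number of events in the group */
    uint64_t time_enabled;  /* 0 if not requested */
    uint64_t time_running;  /* 0 if not requested */
    uint64_t value[MAX_MEMBERS];
    uint64_t id[MAX_MEMBERS];  /* 0 if FMT_ID not requested */
};

/* Decode nwords 64-bit words read from a group leader's fd.
 * Returns 0 on success, -1 if the buffer is too short or too large. */
int parse_group_read(const uint64_t *buf, size_t nwords, uint64_t fmt,
                     struct group_reading *out)
{
    size_t pos = 0;
    size_t per_member = (fmt & FMT_ID) ? 2 : 1;

    assert(fmt & FMT_GROUP);  /* this sketch only handles the group layout */

    if (nwords < 1)
        return -1;
    out->nr = buf[pos++];
    if (out->nr > MAX_MEMBERS)
        return -1;

    out->time_enabled = 0;
    out->time_running = 0;
    if (fmt & FMT_TOTAL_TIME_ENABLED) {
        if (pos >= nwords)
            return -1;
        out->time_enabled = buf[pos++];
    }
    if (fmt & FMT_TOTAL_TIME_RUNNING) {
        if (pos >= nwords)
            return -1;
        out->time_running = buf[pos++];
    }

    if (nwords - pos < out->nr * per_member)
        return -1;
    for (uint64_t i = 0; i < out->nr; i++) {
        out->value[i] = buf[pos++];
        out->id[i] = (fmt & FMT_ID) ? buf[pos++] : 0;
    }
    return 0;
}
```

Nothing in this layout refers to child events: each slot holds one number per group member, whatever the kernel chose to put in it.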

Paul.

2010-02-14 12:38:36

by Peter Zijlstra

Subject: Re: Why is PERF_FORMAT_GROUP incompatible with inherited events?

On Sun, 2010-02-14 at 22:33 +1100, Paul Mackerras wrote:
>
> But we don't go and collect the count delta from children without
> PERF_FORMAT_GROUP, so why would we with it?

Yes we do, see perf_event_read_value().

But now that I look at it we don't seem to do so in
perf_output_read_one()... I guess we should fix that.

There is of course the lock inversion in the .read() code reported by
Stephane, but other than that it seems to actually support inherited &&
group just fine.

So I think if we fix that lock inversion and make the PERF_SAMPLE_READ
code look like the .read() code it should all work out.
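
As a rough illustration of the "vertical" dimension being discussed, the following toy model contrasts reading only an event's own count with summing it over its inherited children. It is a simplified sketch, not the kernel's perf_event_read_value(): struct toy_event and both functions are hypothetical, and the real code also has to take locks and may have to reach counters running on other CPUs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an event and its inherited children. */
struct toy_event {
    uint64_t count;                  /* this event's own count */
    struct toy_event *first_child;   /* head of the child list (parent only) */
    struct toy_event *next_sibling;  /* next child in the parent's list */
};

/* What a read that ignores inheritance would report. */
uint64_t toy_read_own(const struct toy_event *ev)
{
    assert(ev != NULL);
    return ev->count;
}

/* What a read that collects child counts would report.
 * The cost grows with the number of children. */
uint64_t toy_read_total(const struct toy_event *ev)
{
    uint64_t total;
    const struct toy_event *child;

    assert(ev != NULL);
    total = ev->count;
    for (child = ev->first_child; child != NULL; child = child->next_sibling)
        total += child->count;
    return total;
}
```

The loop is cheap in this model; the concern in the thread is that the equivalent walk in the kernel is proportional to the number of children and CPUs, which matters a great deal when it has to happen while a sample is being written.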


2010-02-15 04:56:52

by Paul Mackerras

Subject: Re: Why is PERF_FORMAT_GROUP incompatible with inherited events?

On Sun, Feb 14, 2010 at 01:38:27PM +0100, Peter Zijlstra wrote:
> On Sun, 2010-02-14 at 22:33 +1100, Paul Mackerras wrote:
> >
> > But we don't go and collect the count delta from children without
> > PERF_FORMAT_GROUP, so why would we with it?
>
> Yes we do, see perf_event_read_value().

Ah, true, I should have read the code more carefully.

> But now that I look at it we don't seem to do so in
> perf_output_read_one()... I guess we should fix that.

I suppose it should give the same value as read() would, but the
possibly unbounded interrupt latency is a bit of a worry. I can't
think of a way to avoid it, though (other than not using
PERF_SAMPLE_READ with inherited sampling events :).

> There is of course the lock inversion in the .read() code reported by
> Stephane, but other than that it seems to actually support inherited &&
> group just fine.
>
> So I think if we fix that lock inversion and make the PERF_SAMPLE_READ
> code look like the .read() code it should all work out.

Cool.

Paul.

2010-02-15 09:29:16

by Peter Zijlstra

Subject: Re: Why is PERF_FORMAT_GROUP incompatible with inherited events?

On Mon, 2010-02-15 at 15:56 +1100, Paul Mackerras wrote:
> On Sun, Feb 14, 2010 at 01:38:27PM +0100, Peter Zijlstra wrote:
> > On Sun, 2010-02-14 at 22:33 +1100, Paul Mackerras wrote:
> > >
> > > But we don't go and collect the count delta from children without
> > > PERF_FORMAT_GROUP, so why would we with it?
> >
> > Yes we do, see perf_event_read_value().
>
> Ah, true, I should have read the code more carefully.
>
> > But now that I look at it we don't seem to do so in
> > perf_output_read_one()... I guess we should fix that.
>
> I suppose it should give the same value as read() would, but the
> possibly unbounded interrupt latency is a bit of a worry. I can't
> think of a way to avoid it, though (other than not using
> PERF_SAMPLE_READ with inherited sampling events :).
>
> > There is of course the lock inversion in the .read() code reported by
> > Stephane, but other than that it seems to actually support inherited &&
> > group just fine.
> >
> > So I think if we fix that lock inversion and make the PERF_SAMPLE_READ
> > code look like the .read() code it should all work out.

I now realize that this is going to be very complicated, because it
involves sending IPIs from NMI context.

So I might have meant:

        attr->inherit && (attr->sample_type & PERF_SAMPLE_READ)

to be mutually exclusive.
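
For comparison, the three candidate restrictions from this thread can be written side by side as predicates over a stripped-down attribute structure. This is only a sketch: struct toy_attr, the TOY_* constants and the function names are hypothetical, the bit values are copied from PERF_FORMAT_GROUP and PERF_SAMPLE_READ in <linux/perf_event.h>, and sample_type is assumed to be the perf_event_attr field that carries the PERF_SAMPLE_* bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_FORMAT_GROUP (1u << 3)  /* same bit as PERF_FORMAT_GROUP */
#define TOY_SAMPLE_READ  (1u << 4)  /* same bit as PERF_SAMPLE_READ */

/* Hypothetical subset of struct perf_event_attr. */
struct toy_attr {
    bool     inherit;
    uint64_t sample_period;  /* non-zero for sampling events */
    uint64_t sample_type;    /* PERF_SAMPLE_* bits */
    uint64_t read_format;    /* PERF_FORMAT_* bits */
};

/* The existing check: any inherited event with group readout. */
bool rejected_by_existing_check(const struct toy_attr *a)
{
    return a->inherit && (a->read_format & TOY_FORMAT_GROUP);
}

/* The first proposal: only inherited sampling events with group readout. */
bool rejected_by_first_proposal(const struct toy_attr *a)
{
    return a->inherit && a->sample_period &&
           (a->read_format & TOY_FORMAT_GROUP);
}

/* The last suggestion: inherited events that put counter values in samples. */
bool rejected_by_last_suggestion(const struct toy_attr *a)
{
    return a->inherit && (a->sample_type & TOY_SAMPLE_READ);
}
```

Under this reading, an inherited counting event with group readout is refused only by the existing check, while an inherited sampling event that requests PERF_SAMPLE_READ without PERF_FORMAT_GROUP is refused only by the last suggestion.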