2010-06-11 16:30:00

by Peter Zijlstra

[permalink] [raw]
Subject: perf_disable()

Hi,

I've been going over perf_disable() usage in kernel/perf_event.c and
wondered if we actually need it at all.

Currently the only thing we seem to require it for is around pmu::enable
calls (and for that powerpc at least does it itself, on x86 we rely on
it to call ->enable_all and reprogram the pmu state).

But I can't really find any NMI races wrt data structures or the like as
seems implied by some comments.

There is a fun little recursion issue with perf_adjust_period(), where
if we fully removed perf_disable() we could end up calling pmu::stop()
twice and such.

But aside from that it looks to me its mostly about optimizing hardware
writes.

If nobody else known about/can find anything, I'm going to mostly remove
perf_disable() for now and later think about how to optimize the
hardware writes again.


Subject: Re: perf_disable()

On 11.06.10 12:29:44, Peter Zijlstra wrote:
> I've been going over perf_disable() usage in kernel/perf_event.c and
> wondered if we actually need it at all.
>
> Currently the only thing we seem to require it for is around pmu::enable
> calls (and for that powerpc at least does it itself, on x86 we rely on
> it to call ->enable_all and reprogram the pmu state).
>
> But I can't really find any NMI races wrt data structures or the like as
> seems implied by some comments.

Yes, it was originally used to disable nmis for some critical sections
in the non-arch code. I do not remember where this was exactly needed,
but my feeling is also this can be optimized and maybe reimplemented
as non-locking code.

We also should avoid the enable_all/disable_all() functions in the x86
implementation as it is expensive on some pmus (namely AMD). It looks
like these functions can be removed then too, or at least made model
specific only.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: [email protected]

2010-06-11 17:17:36

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: perf_disable()

On Fri, Jun 11, 2010 at 06:29:44PM +0200, Peter Zijlstra wrote:
> Hi,
>
> I've been going over perf_disable() usage in kernel/perf_event.c and
> wondered if we actually need it at all.
>
> Currently the only thing we seem to require it for is around pmu::enable
> calls (and for that powerpc at least does it itself, on x86 we rely on
> it to call ->enable_all and reprogram the pmu state).
>
> But I can't really find any NMI races wrt data structures or the like as
> seems implied by some comments.



I suspect the problem is also on per context integrity. When you adjust
the period, enable or disable a counter, this counter becomes async with
the rest of the group or the rest of the counters in the same context, for
a small bunch of time.

The longer you run your events, the higher is going to be this jitter.

Take an example, when you adjust a period, you:

perf_disable()
perf_event_stop()
left_period = 0
perf_event_start()
perf_enable()

During all this time, the given event is paused, but the whole rest of
the events running on the cpu continue to count.

The problem is the same on context switch.

And I think this high resolution of synchronisation per context is
sensitive, especially with perf start kind of workflows.

(Although software events are not touched by perf_enable()/perf_disable().


>
> There is a fun little recursion issue with perf_adjust_period(), where
> if we fully removed perf_disable() we could end up calling pmu::stop()
> twice and such.
>
> But aside from that it looks to me its mostly about optimizing hardware
> writes.
>
> If nobody else known about/can find anything, I'm going to mostly remove
> perf_disable() for now and later think about how to optimize the
> hardware writes again.


Not sure that's a good idea IMHO.

2010-06-11 20:29:57

by Peter Zijlstra

[permalink] [raw]
Subject: Re: perf_disable()

On Fri, 2010-06-11 at 19:17 +0200, Frederic Weisbecker wrote:
> On Fri, Jun 11, 2010 at 06:29:44PM +0200, Peter Zijlstra wrote:
> > Hi,
> >
> > I've been going over perf_disable() usage in kernel/perf_event.c and
> > wondered if we actually need it at all.
> >
> > Currently the only thing we seem to require it for is around pmu::enable
> > calls (and for that powerpc at least does it itself, on x86 we rely on
> > it to call ->enable_all and reprogram the pmu state).
> >
> > But I can't really find any NMI races wrt data structures or the like as
> > seems implied by some comments.
>
>
>
> I suspect the problem is also on per context integrity. When you adjust
> the period, enable or disable a counter, this counter becomes async with
> the rest of the group or the rest of the counters in the same context, for
> a small bunch of time.
>
> The longer you run your events, the higher is going to be this jitter.
>
> Take an example, when you adjust a period, you:
>
> perf_disable()
> perf_event_stop()
> left_period = 0
> perf_event_start()
> perf_enable()
>
> During all this time, the given event is paused, but the whole rest of
> the events running on the cpu continue to count.
>
> The problem is the same on context switch.
>
> And I think this high resolution of synchronisation per context is
> sensitive, especially with perf start kind of workflows.

I'm not sure what you're arguing, but the knife cuts on both sides, the
above also stops counters that shouldn't be stopped..

> > There is a fun little recursion issue with perf_adjust_period(), where
> > if we fully removed perf_disable() we could end up calling pmu::stop()
> > twice and such.
> >
> > But aside from that it looks to me its mostly about optimizing hardware
> > writes.
> >
> > If nobody else known about/can find anything, I'm going to mostly remove
> > perf_disable() for now and later think about how to optimize the
> > hardware writes again.
>
>
> Not sure that's a good idea IMHO.

Well, we need to do something, the current weak mess needs to go, and
the alternative is basically a loop over all registerd pmus calling
their respective pmu::disable_all, which is utter suckage, so removing
as many of this as possible is a good thing.

We can always come up with some lazy mode later that tries to batch the
hardware writes.

2010-06-11 21:01:40

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: perf_disable()

On Fri, Jun 11, 2010 at 10:29:40PM +0200, Peter Zijlstra wrote:
> On Fri, 2010-06-11 at 19:17 +0200, Frederic Weisbecker wrote:
> > I suspect the problem is also on per context integrity. When you adjust
> > the period, enable or disable a counter, this counter becomes async with
> > the rest of the group or the rest of the counters in the same context, for
> > a small bunch of time.
> >
> > The longer you run your events, the higher is going to be this jitter.
> >
> > Take an example, when you adjust a period, you:
> >
> > perf_disable()
> > perf_event_stop()
> > left_period = 0
> > perf_event_start()
> > perf_enable()
> >
> > During all this time, the given event is paused, but the whole rest of
> > the events running on the cpu continue to count.
> >
> > The problem is the same on context switch.
> >
> > And I think this high resolution of synchronisation per context is
> > sensitive, especially with perf start kind of workflows.
>
> I'm not sure what you're arguing, but the knife cuts on both sides, the
> above also stops counters that shouldn't be stopped..


Hmm, now that I look at it, x86_pmu_*able_all() isn't touching a
single register to *able everything at once, it's doing a loop
on every events.

Forget about what I said then.


> > > There is a fun little recursion issue with perf_adjust_period(), where
> > > if we fully removed perf_disable() we could end up calling pmu::stop()
> > > twice and such.
> > >
> > > But aside from that it looks to me its mostly about optimizing hardware
> > > writes.
> > >
> > > If nobody else known about/can find anything, I'm going to mostly remove
> > > perf_disable() for now and later think about how to optimize the
> > > hardware writes again.
> >
> >
> > Not sure that's a good idea IMHO.
>
> Well, we need to do something, the current weak mess needs to go, and
> the alternative is basically a loop over all registerd pmus calling
> their respective pmu::disable_all, which is utter suckage, so removing
> as many of this as possible is a good thing.
>
> We can always come up with some lazy mode later that tries to batch the
> hardware writes.


Right.

2010-06-11 21:04:08

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: perf_disable()

On Fri, Jun 11, 2010 at 06:29:44PM +0200, Peter Zijlstra wrote:
> There is a fun little recursion issue with perf_adjust_period(), where
> if we fully removed perf_disable() we could end up calling pmu::stop()
> twice and such.



We can have a local_t made nesting level on the stop/start that could easily
deal with this.

2010-06-11 21:25:42

by Peter Zijlstra

[permalink] [raw]
Subject: Re: perf_disable()

On Fri, 2010-06-11 at 23:04 +0200, Frederic Weisbecker wrote:
> On Fri, Jun 11, 2010 at 06:29:44PM +0200, Peter Zijlstra wrote:
> > There is a fun little recursion issue with perf_adjust_period(), where
> > if we fully removed perf_disable() we could end up calling pmu::stop()
> > twice and such.
>
>
>
> We can have a local_t made nesting level on the stop/start that could easily
> deal with this.

Fanning it out to the relevant pmu implementation is the interesting
part..

2010-06-14 09:23:31

by Will Deacon

[permalink] [raw]
Subject: Re: perf_disable()

Hi Peter,

On Fri, 2010-06-11 at 17:29 +0100, Peter Zijlstra wrote:
> Hi,
>
> I've been going over perf_disable() usage in kernel/perf_event.c and
> wondered if we actually need it at all.

-------->8--------

> If nobody else known about/can find anything, I'm going to mostly remove
> perf_disable() for now and later think about how to optimize the
> hardware writes again.

No objections from the ARM camp as we don't use perf_disable() for
anything anyway.

Will