2007-11-09 10:46:32

by Stephane Eranian

Subject: conflict between tickless and perfmon2

Hello,

We have identified a conflict between TICKLESS (CONFIG_NO_HZ) and
the current perfmon2 implementation. The problem impacts system-wide
sessions using timeout-based event set multiplexing.

Event set multiplexing allows monitoring tools to measure more events
than there are actual performance counters on the processor. Events
are grouped in sets which are then multiplexed onto the actual counters.
Switching can be triggered either by a timeout or by a counter overflow.
This is supported for per-thread and system-wide sessions.
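
As a rough illustration of the grouping described above (not the perfmon2
code itself), the sketch below computes how many sets are needed for a given
number of events and counters, and which set becomes active on the next
switch. The function names and the round-robin order are assumptions made
for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Number of event sets needed to cover n_events with n_counters counters. */
size_t sets_needed(size_t n_events, size_t n_counters)
{
    assert(n_counters > 0);
    /* Integer ceiling of n_events / n_counters. */
    return (n_events + n_counters - 1) / n_counters;
}

/*
 * Index of the set that becomes active after set `cur` when a switch is
 * triggered (by timeout or overflow). Round-robin order is assumed here.
 */
size_t next_set(size_t cur, size_t n_sets)
{
    assert(n_sets > 0);
    return (cur + 1) % n_sets;
}
```

For instance, 10 events on a processor with 4 counters would need 3 sets
under this scheme, and the active set cycles 0, 1, 2, 0, and so on.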

For timeout-based switching, the duration expressed in nanoseconds is
meant to represent wall-clock time in system-wide mode, and execution
time in per-thread mode. Granularity is limited by HZ.

The current implementation for timeout is a simple hook on the timer
interrupt path in apic_*.c:smp_local_timer_interrupt(). Unfortunately,
this does not work when tickless is enabled: we get far fewer set
switches than expected on an idle system.

It looks like a solution would be to change the implementation of
timeout-based switching to use HR timers instead. Similar to what is
done for ITIMER_REAL and ITIMER_VIRTUAL.

Unless someone has a better proposal, I will experiment with this on
2.6.24-rc2.

Thanks.

--
-Stephane


2007-11-09 11:06:39

by Peter Zijlstra

Subject: Re: conflict between tickless and perfmon2

On Fri, 2007-11-09 at 02:44 -0800, Stephane Eranian wrote:
> Hello,
>
> We have identified a conflict between TICKLESS (CONFIG_NO_HZ) and
> the current perfmon2 implementation. The problem impacts system-wide
> sessions using timeout-based event set multiplexing.
>
> Event set multiplexing allows monitoring tools to measure more events
> than there are actual performance counters on the processor. Events
> are grouped in sets which are then multiplexed onto the actual counters.
> Switching can be triggered either by a timeout or by a counter overflow.
> This is supported for per-thread and system-wide sessions.
>
> For timeout-based switching, the duration expressed in nanoseconds is
> meant to represent wall-clock time in system-wide mode, and execution
> time in per-thread mode. Granularity is limited by HZ.
>
> The current implementation for timeout is a simple hook on the timer
> interrupt path in apic_*.c:smp_local_timer_interrupt(). Unfortunately,
> this does not work when tickless is enabled: we get far fewer set
> switches than expected on an idle system.
>
> It looks like a solution would be to change the implementation of
> timeout-based switching to use HR timers instead. Similar to what is
> done for ITIMER_REAL and ITIMER_VIRTUAL.
>
> Unless someone has a better proposal, I will experiment with this on
> 2.6.24-rc2.

Might help if you CC the tickless folks :-)

2007-11-09 18:47:22

by Thomas Gleixner

Subject: Re: conflict between tickless and perfmon2

On Fri, 9 Nov 2007, Peter Zijlstra wrote:

> On Fri, 2007-11-09 at 02:44 -0800, Stephane Eranian wrote:
> > Hello,
> >
> > We have identified a conflict between TICKLESS (CONFIG_NO_HZ) and
> > the current perfmon2 implementation. The problem impacts system-wide
> > sessions using timeout-based event set multiplexing.
> >
> > Event set multiplexing allows monitoring tools to measure more events
> > than there are actual performance counters on the processor. Events
> > are grouped in sets which are then multiplexed onto the actual counters.
> > Switching can be triggered either by a timeout or by a counter overflow.
> > This is supported for per-thread and system-wide sessions.
> >
> > For timeout-based switching, the duration expressed in nanoseconds is
> > meant to represent wall-clock time in system-wide mode, and execution
> > time in per-thread mode. Granularity is limited by HZ.
> >
> > The current implementation for timeout is a simple hook on the timer
> > interrupt path in apic_*.c:smp_local_timer_interrupt(). Unfortunately,
> > this does not work when tickless is enabled: we get far fewer set
> > switches than expected on an idle system.

What a surprise. :)

> > It looks like a solution would be to change the implementation of
> > timeout-based switching to use HR timers instead. Similar to what is
> > done for ITIMER_REAL and ITIMER_VIRTUAL.

Using a hrtimer is perfectly fine, I'd say it's preferred over hooks in
some code which has absolutely no guarantee of being executed periodically
or even executed at all. OTOH it seems rather stupid to measure stuff
while the system is idle and doing nothing.

tglx

2007-11-09 20:24:49

by Stephane Eranian

Subject: Re: conflict between tickless and perfmon2

Thomas,

On Fri, Nov 09, 2007 at 07:40:31PM +0100, Thomas Gleixner wrote:
>
> > > It looks like a solution would be to change the implementation of
> > > timeout-based switching to use HR timers instead. Similar to what is
> > > done for ITIMER_REAL and ITIMER_VIRTUAL.
>
> Using a hrtimer is perfectly fine, I'd say it's preferred over hooks in
> some code which has absolutely no guarantee of being executed periodically
> or even executed at all. OTOH it seems rather stupid to measure stuff
> while the system is idle and doing nothing.
>
I'll start looking into this soon. To answer your point about idle:
just because the core is idle does not mean that the counters stop capturing
events related to buses or caches, for instance.

--
-Stephane

2007-11-14 16:40:49

by Stephane Eranian

Subject: Re: [perfmon] Re: conflict between tickless and perfmon2

Thomas,

On Fri, Nov 09, 2007 at 07:40:31PM +0100, Thomas Gleixner wrote:
> On Fri, 9 Nov 2007, Peter Zijlstra wrote:
>
>
> > > It looks like a solution would be to change the implementation of
> > > timeout-based switching to use HR timers instead. Similar to what is
> > > done for ITIMER_REAL and ITIMER_VIRTUAL.
>
> Using a hrtimer is perfectly fine, I'd say it's preferred over hooks in
> some code which has absolutely no guarantee of being executed periodically
> or even executed at all. OTOH it seems rather stupid to measure stuff
> while the system is idle and doing nothing.
>
I managed to switch the perfmon2 code to use hrtimer(CLOCK_MONOTONIC)
for system-wide (per-cpu) measurements. The code is simple and this allowed
me to do some more cleanups. I think this was a good suggestion and I made
the change rapidly.

Now, I must admit I don't quite understand how to make this work for per-thread
measurements where the timer would have to operate like ITIMER_VIRTUAL, i.e., only
run when the thread runs. I looked at the setitimer() code and I admit it is
not clear to me. What about CLOCK_THREAD_CPUTIME_ID, would it do what I need from
inside the kernel?
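
For reference, the user-level behavior of the two clocks can be observed with
a small test like the one below. It is only a sketch: the helper names are
made up for this example, and it says nothing about whether
CLOCK_THREAD_CPUTIME_ID can back an in-kernel hrtimer, which is the open
question here.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Read a POSIX clock and return its value in nanoseconds (0 on failure). */
uint64_t clock_now_ns(clockid_t id)
{
    struct timespec ts;

    if (clock_gettime(id, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* How far each clock advanced across one sleep. */
struct elapsed {
    uint64_t wall_ns;       /* CLOCK_MONOTONIC delta */
    uint64_t thread_cpu_ns; /* CLOCK_THREAD_CPUTIME_ID delta */
};

/* Sleep for sleep_ms milliseconds and report the advance of both clocks. */
struct elapsed measure_sleep(long sleep_ms)
{
    struct timespec req = {
        .tv_sec = sleep_ms / 1000,
        .tv_nsec = (sleep_ms % 1000) * 1000000L,
    };
    struct elapsed e;
    uint64_t wall0 = clock_now_ns(CLOCK_MONOTONIC);
    uint64_t cpu0 = clock_now_ns(CLOCK_THREAD_CPUTIME_ID);

    /* If a signal interrupts the sleep, resume with the remaining time. */
    while (nanosleep(&req, &req) != 0 && errno == EINTR)
        ;

    e.wall_ns = clock_now_ns(CLOCK_MONOTONIC) - wall0;
    e.thread_cpu_ns = clock_now_ns(CLOCK_THREAD_CPUTIME_ID) - cpu0;
    return e;
}
```

Calling measure_sleep(50) should show the monotonic clock advancing by about
the requested 50 ms, while the thread CPU clock advances far less because the
thread is not running while it sleeps.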

Thanks.

--
-Stephane

2007-11-23 14:37:05

by Stephane Eranian

Subject: Re: conflict between tickless and perfmon2

Hello,

On Fri, Nov 09, 2007 at 07:40:31PM +0100, Thomas Gleixner wrote:
> On Fri, 9 Nov 2007, Peter Zijlstra wrote:
>
> > On Fri, 2007-11-09 at 02:44 -0800, Stephane Eranian wrote:
> > > Hello,
> > >
> > > We have identified a conflict between TICKLESS (CONFIG_NO_HZ) and
> > > the current perfmon2 implementation. The problem impacts system-wide
> > > sessions using timeout-based event set multiplexing.
> > >
> > > Event set multiplexing allows monitoring tools to measure more events
> > > than there are actual performance counters on the processor. Events
> > > are grouped in sets which are then multiplexed onto the actual counters.
> > > Switching can be triggered either by a timeout or by a counter overflow.
> > > This is supported for per-thread and system-wide sessions.
> > >
> > > For timeout-based switching, the duration expressed in nanoseconds is
> > > meant to represent wall-clock time in system-wide mode, and execution
> > > time in per-thread mode. Granularity is limited by HZ.
> > >
>
> Using a hrtimer is perfectly fine, I'd say it's preferred over hooks in
> some code which has absolutely no guarantee of being executed periodically
> or even executed at all. OTOH it seems rather stupid to measure stuff
> while the system is idle and doing nothing.
>

I have now converted the timeout-based set multiplexing to use hrtimer instead.
The patch is available from the perfmon2 GIT tree on kernel.org.

With this patch, multiplexing works with tickless kernels for system-wide
sessions. All the arch specific hooks are gone.

For system-wide, the timeout measures wall-clock time. For per-thread,
it measures virtual time. I could not find a way to count virtual time
with hrtimer. Thus I ended up using one hrtimer per CPU and cancelling/restoring
the timeout on context switch. I suspect there may be a better way of doing
this but for now it seems to work.
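
The cancel/restore scheme can be modeled at user level roughly as follows.
This is a simplified sketch, not the code in the GIT tree: the struct, the
vt_* names, and the use of plain nanosecond timestamps are assumptions for
illustration, and expiries are accounted for at switch-out rather than by a
real timer callback.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-thread switch-timeout state (illustrative names). */
struct vtimeout {
    uint64_t period_ns;    /* full set-switch timeout */
    uint64_t remaining_ns; /* run time left until the next set switch */
    uint64_t armed_at_ns;  /* wall-clock time at which the timer was armed */
    int armed;             /* non-zero while the thread is running */
    unsigned switches;     /* set switches triggered so far */
};

void vt_init(struct vtimeout *vt, uint64_t period_ns)
{
    assert(period_ns > 0);
    vt->period_ns = period_ns;
    vt->remaining_ns = period_ns;
    vt->armed_at_ns = 0;
    vt->armed = 0;
    vt->switches = 0;
}

/* Thread is scheduled in: re-arm the timer with whatever time is left. */
void vt_sched_in(struct vtimeout *vt, uint64_t now_ns)
{
    vt->armed_at_ns = now_ns;
    vt->armed = 1;
}

/* Thread is scheduled out: cancel the timer and save the remaining time. */
void vt_sched_out(struct vtimeout *vt, uint64_t now_ns)
{
    uint64_t ran;

    if (!vt->armed)
        return;
    ran = now_ns - vt->armed_at_ns;
    /* Each full expiry while the thread ran counts as one set switch. */
    while (ran >= vt->remaining_ns) {
        ran -= vt->remaining_ns;
        vt->remaining_ns = vt->period_ns;
        vt->switches++;
    }
    vt->remaining_ns -= ran;
    vt->armed = 0;
}
```

In this model, time spent off the CPU between vt_sched_out() and the next
vt_sched_in() never reduces remaining_ns, which is the virtual-time behavior
wanted for per-thread sessions.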

With this patch, timeout-based multiplexing should work on all architectures.
I have tested on i386, x86-64, and ia64. Please try the other ones as well.

A couple of interface changes related to this patch:

- The switch timeout now runs only between pfm_start/pfm_stop calls and when the
context is not masked due to sampling overflows. It used to run between
pfm_load_context/pfm_unload_context. This means that on architectures which allow
start/stop from user level (e.g., IA-64), it is now necessary to call pfm_start and
pfm_stop when using multiple sets. It is not really practical to combine set
switching in the kernel with user level direct reading of the registers.

- pfm_create_evtsets() fails if the timeout is not a multiple of the clock resolution.
Using clock_getres(CLOCK_MONOTONIC) users can figure out the granularity and adjust
the timeout accordingly.
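
A tool could adjust its timeout along these lines. This is a sketch only: the
helper names are hypothetical, rounding up (rather than down) is a choice made
here, and how the value is then passed to pfm_create_evtsets() is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Query the CLOCK_MONOTONIC resolution in nanoseconds (0 on failure). */
uint64_t monotonic_resolution_ns(void)
{
    struct timespec res;

    if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
        return 0;
    return (uint64_t)res.tv_sec * 1000000000ULL + (uint64_t)res.tv_nsec;
}

/* Round timeout_ns up to the next multiple of res_ns (both in nanoseconds). */
uint64_t round_up_to_resolution(uint64_t timeout_ns, uint64_t res_ns)
{
    uint64_t rem;

    assert(res_ns > 0);
    rem = timeout_ns % res_ns;
    return rem == 0 ? timeout_ns : timeout_ns + (res_ns - rem);
}
```

For example, with a 4 ms resolution, a requested 10 ms timeout would be
rounded up to 12 ms before being handed to the kernel.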


--
-Stephane