Hello,
As suggested by people on this list, I have changed perfmon2 to use
the high resolution timers as the interface to allow timeout-based
event set multiplexing. This works around the problems I had with
tickless-enabled kernels.
Multiplexing is supported in per-thread mode as well. In that case, the
timeout measures virtual time. When the thread is context switched
out, we need to save the remainder of the timeout and cancel the
timer. When the thread is context switched in, we need to reinstall
the timer. These timer save/restore operations have to be done in the
switch_to() code near the end of schedule().
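The save/restore bookkeeping described above can be sketched in plain userspace C. This is only a model under stated assumptions: struct vtimeout and both helpers are hypothetical names, not perfmon2 or kernel code, and time is a plain nanosecond counter instead of a real hrtimer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-thread state; not a perfmon2 or kernel structure. */
struct vtimeout {
	int64_t remaining_ns;	/* time left while the thread is off-CPU */
	int64_t expires_ns;	/* absolute expiry while the thread is on-CPU */
	int armed;		/* nonzero while the timer is installed */
};

/* Context switch in: reinstall the timer for the saved remainder. */
static void vtimeout_switch_in(struct vtimeout *t, int64_t now_ns)
{
	assert(!t->armed);
	t->expires_ns = now_ns + t->remaining_ns;
	t->armed = 1;
}

/* Context switch out: save the remainder and cancel the timer. */
static void vtimeout_switch_out(struct vtimeout *t, int64_t now_ns)
{
	assert(t->armed);
	t->remaining_ns = t->expires_ns - now_ns;
	if (t->remaining_ns < 0)
		t->remaining_ns = 0;	/* already due at the next switch-in */
	t->armed = 0;
}
```

Because only on-CPU time is subtracted from the remainder, the timeout follows the thread's virtual time. A remainder of zero models the awkward case where the reinstalled timeout is already in the past or just expiring.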
There are situations where hrtimer_start() may end up trying to
acquire the runqueue lock. This happens on a context switch where the
current thread is blocking (not preempted) and the new timeout happens
to be either in the past or just expiring. We've run into such
situations with simple tests.
On all architectures except IA-64, it seems that the runqueue lock is
held until the end of schedule(). On IA-64, the lock is released
BEFORE switch_to() for some reason I don't quite remember. That may
not even be needed anymore.
The early unlocking is controlled by a macro named
__ARCH_WANT_UNLOCKED_CTXSW. Defining this macro on X86 (or PPC) fixed
our problem.
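To make the difference between the two behaviours concrete, here is a toy userspace model. Everything in it is hypothetical: the lock is just a flag, toy_rearm_timer() stands in for a switch-in hook that ends up needing the runqueue lock, and the unlocked_ctxsw argument plays the role of the macro. It is not the scheduler's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the runqueue spinlock: a flag, no real synchronization. */
struct toy_lock { bool held; };

static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;	/* a real spinlock would spin here instead */
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical switch-in hook that ends up needing the runqueue lock. */
static bool toy_rearm_timer(struct toy_lock *rq_lock)
{
	if (!toy_trylock(rq_lock))
		return false;	/* would self-deadlock with a real spinlock */
	toy_unlock(rq_lock);
	return true;
}

/* Returns true if the hook completed, false if it would have deadlocked. */
static bool toy_context_switch(struct toy_lock *rq_lock, bool unlocked_ctxsw)
{
	bool ok;

	if (!toy_trylock(rq_lock))	/* schedule() takes the runqueue lock */
		return false;
	if (unlocked_ctxsw)
		toy_unlock(rq_lock);	/* early release, before the switch */
	ok = toy_rearm_timer(rq_lock);	/* runs in the switch_to() path */
	if (!unlocked_ctxsw)
		toy_unlock(rq_lock);	/* default: release after the switch */
	return ok;
}
```

With unlocked_ctxsw false the hook finds the lock already held by its own CPU, which is the situation we hit. With it true the hook can take and release the lock normally.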
It is not clear to me why the runqueue lock needs to be held up until
the end of schedule() on some platforms and not on others. Note that
releasing the lock earlier does not necessarily introduce more
overhead because the lock is never re-acquired later in the schedule()
function.
Question:
- is it safe to release the lock before switch_to() on all architectures?
Thanks.
[ At the very least CC'ing the scheduler maintainer would be
helpful :-) ]
On Wed, 2008-01-16 at 16:29 -0800, stephane eranian wrote:
> There are situations where hrtimer_start() may end up trying to
> acquire the runqueue lock. This happens on a context switch where the
> current thread is blocking (not preempted) and the new timeout happens
> to be either in the past or just expiring. We've run into such
> situations with simple tests.
>
> Question:
> - is it safe to release the lock before switch_to() on all architectures?
I had a similar problem when using hrtimers from the scheduler. I extended
the HRTIMER_CB_IRQSAFE_NO_SOFTIRQ timer type to run with cpu_base->lock
unlocked.
http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-sched-devel.git;a=commitdiff;h=7e7cbd617833dde5b442e03f69aac39d17d02ec7
http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-sched-devel.git;a=commitdiff;h=45d10aad580a5cdd376e80848aeeaaaf1f97cc18
http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-sched-devel.git;a=commitdiff;h=5ae5d6c5850d4735798bc0e4526d8c61199e9f93
As for your __ARCH_WANT_UNLOCKED_CTXSW question I have to defer to Ingo,
as I'm unaware of the arch ramifications there.
On Friday 18 January 2008 00:24, Peter Zijlstra wrote:
> As for your __ARCH_WANT_UNLOCKED_CTXSW question I have to defer to Ingo,
> as I'm unaware of the arch ramifications there.
It is arch specific. If an architecture wants interrupts on during context
switch, or runqueue unlocked, then they set it (btw INTERRUPTS_ON_CTXSW
also implies UNLOCKED_CTXSW).
Although, e.g. on x86, you would hold off interrupts and the runqueue lock
for slightly less time if you defined those, it results in _slightly_ more
complicated context switching. I did once find a workload where the
reduced runqueue contention improved throughput a bit, but in general it
is not much of a problem to hold the lock.
Nick,
On Jan 18, 2008 3:07 AM, Nick Piggin wrote:
> It is arch specific. If an architecture wants interrupts on during context
> switch, or runqueue unlocked, then they set it (btw INTERRUPTS_ON_CTXSW
> also implies UNLOCKED_CTXSW).
>
Yes, I noticed that. I am only interested in UNLOCKED_CTXSW.
But it appears that the approach suggested by Peter does work. We are
running some tests.
> Although, e.g. on x86, you would hold off interrupts and the runqueue lock
> for slightly less time if you defined those, it results in _slightly_ more
> complicated context switching. I did once find a workload where the
> reduced runqueue contention improved throughput a bit, but in general it
> is not much of a problem to hold the lock.
>
By complicated you mean that now you'd have to make sure you don't
need to access runqueue data?
Thanks.
On Friday 18 January 2008 17:33, stephane eranian wrote:
> Nick,
> > It is arch specific. If an architecture wants interrupts on during
> > context switch, or runqueue unlocked, then they set it (btw
> > INTERRUPTS_ON_CTXSW also implies UNLOCKED_CTXSW).
>
> Yes, I noticed that. I am only interested in UNLOCKED_CTXSW.
> But it appears that the approach suggested by Peter does work. We are
> running some tests.
OK, that might be OK.
> > Although, e.g. on x86, you would hold off interrupts and the runqueue lock
> > for slightly less time if you defined those, it results in _slightly_ more
> > complicated context switching. I did once find a workload where the
> > reduced runqueue contention improved throughput a bit, but in general it
> > is not much of a problem to hold the lock.
>
> By complicated you mean that now you'd have to make sure you don't
> need to access runqueue data?
Well, leaving aside the arch-specific code (which may involve
more complexities), the core scheduler needs the
task_struct->oncpu variable, whereas that isn't required if the
runqueue is locked while switching tasks.
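One way to read this point is sketched below as a toy userspace model. It is an interpretation, not the scheduler source: the toy_* types and helpers are hypothetical, and only the oncpu field name comes from the discussion. The idea is that once the lock is dropped before the switch, rq->curr already names the next task while the previous one is still on the CPU, so a per-task flag is needed to answer "is this task still running?".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy types; the field names follow the discussion, not the kernel layout. */
struct toy_task { bool oncpu; };
struct toy_rq { struct toy_task *curr; bool locked; };

/* Before the switch: publish the next task, optionally drop the lock. */
static void toy_prepare_switch(struct toy_rq *rq, struct toy_task *next,
			       bool unlocked_ctxsw)
{
	assert(rq->locked);
	rq->curr = next;
	if (unlocked_ctxsw) {
		next->oncpu = true;	/* mark before the lock goes away */
		rq->locked = false;
	}
}

/* After the switch: the previous task has fully left the CPU. */
static void toy_finish_switch(struct toy_rq *rq, struct toy_task *prev,
			      bool unlocked_ctxsw)
{
	if (unlocked_ctxsw)
		prev->oncpu = false;
	else
		rq->locked = false;
}

/* "Is this task still executing on a CPU?" */
static bool toy_task_running(const struct toy_rq *rq,
			     const struct toy_task *p, bool unlocked_ctxsw)
{
	return unlocked_ctxsw ? p->oncpu : rq->curr == p;
}
```

In the locked variant nobody can look at the runqueue in the middle of the switch, so comparing against rq->curr is enough and the extra flag is unnecessary.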
Peter,
> I had a similar problem when using hrtimers from the scheduler. I extended
> the HRTIMER_CB_IRQSAFE_NO_SOFTIRQ timer type to run with cpu_base->lock
> unlocked.
>
I am running into an issue when enabling this flag. Basically, the
timer never fires when hrtimer_start() finds that the timer ends up
being the next one to fire. In this mode, hrtimer_enqueue_reprogram()
becomes a NOP, but then nobody ever inserts the timer into any queue.
There is a comment that says "caller site takes care of this". Could
you elaborate on this?
Thanks.
On Sat, 2008-02-23 at 15:50 +0100, stephane eranian wrote:
> I am running into an issue when enabling this flag. Basically, the
> timer never fires when hrtimer_start() finds that the timer ends up
> being the next one to fire. In this mode, hrtimer_enqueue_reprogram()
> becomes a NOP, but then nobody ever inserts the timer into any queue.
> There is a comment that says "caller site takes care of this". Could
> you elaborate on this?
That would mean the timer already expired by the time you get to program
it.
The way to handle these is:
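	for (;;) {
		/* queued (or its callback is running): nothing left to do */
		if (hrtimer_active(timer))
			break;
		/* not queued: move the expiry past now and try again */
		now = hrtimer_cb_get_time(timer);
		hrtimer_forward(timer, now, period);
		hrtimer_start(timer, timer->expires, HRTIMER_MODE_ABS);
	}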
You could use the return value from hrtimer_forward() to determine how
many events you missed if that is needed. The timer function needs a
similar loop if it wants to use HRTIMER_RESTART.
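The arithmetic behind forwarding an expiry and counting missed events can be modelled in userspace as follows. This is a simplified sketch, not the kernel's hrtimer_forward(): model_forward() is a hypothetical name, times are plain nanosecond integers, and there is no clamping to the timer resolution.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Push *expires_ns forward by whole periods until it lies strictly after
 * now_ns. Returns the number of periods added, or 0 if the expiry was
 * still in the future and nothing changed.
 */
static uint64_t model_forward(int64_t *expires_ns, int64_t now_ns,
			      int64_t period_ns)
{
	int64_t delta = now_ns - *expires_ns;
	uint64_t periods;

	assert(period_ns > 0);
	if (delta < 0)
		return 0;
	periods = (uint64_t)(delta / period_ns) + 1;
	*expires_ns += (int64_t)periods * period_ns;
	return periods;
}
```

For example, with an expiry of 100, a period of 10 and now at 135, the model returns 4 and leaves the expiry at 140, so the caller can see how many periods went by while the timer was not queued.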
Single shot timers can handle it like in kernel/hrtimer.c:do_nanosleep():
	hrtimer_start(timer, ...);
	if (!hrtimer_active(timer))
		/* handle the missed expiration */