2010-08-26 18:15:59

by Mathieu Desnoyers

Subject: [RFC PATCH 00/11] sched: CFS low-latency features

Hi,

Following the findings I presented a few months ago
(http://lkml.org/lkml/2010/4/18/13) about CFS having large vruntime spread
issues, Peter Zijlstra and I pursued the discussion and the implementation
effort (my work on this is funded by Nokia). I recently put the result together
and came up with this patchset, combining both his work and mine.

With this patchset, I got the following results with wakeup-latency.c (a 10ms
periodic timer), running periodic-fork.sh, Xorg, make -j3 and firefox (playing a
youtube video), with Xorg moving terminal windows around, in parallel on a UP
system (links to the test program source in the dyn min_vruntime patch). The
Xorg interactivity is very good with the new features enabled, but was poor
originally with the vanilla mainline scheduler. The 10ms timer delays are as
follows:

                        2.6.35.2 mainline*   with low-latency features**
maximum latency:             34465.2 µs               8261.4 µs
average latency:              6445.5 µs                211.2 µs
missed timer events:             yes                      no

* 2.6.35.2 mainline test needs to run periodic-fork.sh for a few minutes first
to let it rip the spread apart.

** low-latency features:

with the patchset applied and CONFIG_SCHED_DEBUG=y
(with debugfs mounted in /sys/debugfs)

for opt in DYN_MIN_VRUNTIME \
NO_FAIR_SLEEPERS FAIR_SLEEPERS_INTERACTIVE FAIR_SLEEPERS_TIMER \
INTERACTIVE TIMER \
INTERACTIVE_FORK_EXPEDITED TIMER_FORK_EXPEDITED;
do echo $opt > /sys/debugfs/sched_features;
done
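
For reference, a minimal sketch of the kind of measurement wakeup-latency.c
makes -- a hypothetical reconstruction for illustration only, not the actual
test program (the real source is linked in the dyn min_vruntime patch; link
with -lrt):

#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L
#define PERIOD_NS      10000000L        /* 10ms period */

int main(void)
{
        timer_t timerid;
        struct sigevent sev = { 0 };
        struct itimerspec its = { 0 };
        struct timespec expected, now;
        sigset_t set;
        siginfo_t info;
        double delta_us, max_us = 0, avg_us = 0;
        long n = 0;

        /* block the timer signal so we can fetch it synchronously */
        sigemptyset(&set);
        sigaddset(&set, SIGRTMIN);
        sigprocmask(SIG_BLOCK, &set, NULL);

        sev.sigev_notify = SIGEV_SIGNAL;
        sev.sigev_signo = SIGRTMIN;
        timer_create(CLOCK_MONOTONIC, &sev, &timerid);

        its.it_value.tv_nsec = PERIOD_NS;       /* first expiry in 10ms */
        its.it_interval.tv_nsec = PERIOD_NS;    /* then every 10ms */
        clock_gettime(CLOCK_MONOTONIC, &expected);
        timer_settime(timerid, 0, &its, NULL);

        for (;;) {
                sigwaitinfo(&set, &info);
                /* where this expiry should have landed */
                expected.tv_nsec += PERIOD_NS;
                if (expected.tv_nsec >= NSEC_PER_SEC) {
                        expected.tv_nsec -= NSEC_PER_SEC;
                        expected.tv_sec++;
                }
                clock_gettime(CLOCK_MONOTONIC, &now);
                delta_us = (now.tv_sec - expected.tv_sec) * 1e6 +
                           (now.tv_nsec - expected.tv_nsec) / 1e3;
                n++;
                avg_us += (delta_us - avg_us) / n;      /* running average */
                if (delta_us > max_us) {
                        max_us = delta_us;
                        printf("max latency: %.1f us (avg %.1f us)\n",
                               max_us, avg_us);
                }
                if (timer_getoverrun(timerid) > 0)
                        printf("missed timer events\n");
        }
}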

These patches are designed to allow individual enabling of each feature and to
make sure that the object size of sched.o does not grow when the features are
disabled on a CONFIG_SCHED_DEBUG=n kernel. Optimization of the try_to_wake_up()
fast path when features are enabled could be done later by merging some of these
features together. This patchset is based on 2.6.35.2.

Feedback is welcome,

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com


2010-08-26 18:57:37

by Peter Zijlstra

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

On Thu, 2010-08-26 at 14:09 -0400, Mathieu Desnoyers wrote:
> Feedback is welcome,
>
So we have the following components to this patch:

- dynamic min_vruntime -- push the min_vruntime ahead at the
rate of the runqueue wide virtual clock. This approximates
the virtual clock, esp. when turning off sleeper fairness.
And is cheaper than actually computing the virtual clock.

It allows for better insertion and re-weighting behaviour,
but it does increase overhead somewhat.

- special wakeups using the next-buddy to get scheduled 'soon',
used by all wakeups from the input system and timers.

- special fork semantics related to those special wakeups.


So while I would love to simply compute the virtual clock, it would add
an s64 mult to every enqueue/dequeue and an s64 div to each
enqueue/re-weight, which might be somewhat prohibitive. The dyn
min_vruntime approximation seems to work well enough and costs a u32 div
per enqueue.
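
Schematically, the idea is something like the sketch below -- an
illustration only, not the actual patch; the function name and exact
placement are made up here:

/* Illustrative sketch: advance min_vruntime at the rate of the
 * runqueue-wide virtual clock, i.e. real time scaled down by the
 * total queued weight, instead of computing the exact virtual clock.
 */
static void push_min_vruntime(struct cfs_rq *cfs_rq, u64 delta_exec)
{
        unsigned long total = cfs_rq->load.weight;

        if (total)
                /* a single u32 division, instead of per-entity s64
                 * mults/divs on every enqueue/dequeue/re-weight */
                cfs_rq->min_vruntime +=
                        div_u64(delta_exec * NICE_0_LOAD, (u32)total);
}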

Adding a preference to all user generated wakeups (input) and
propagating that state along the wakeup chain seems to make sense.
Adding the same to all timers is something that needs to be discussed; I
can well imagine not all timers are equally important -- do we want to
extend the timer interface?

If we do decide we want both, we should at the very least merge the
try_to_wake_up() conditional blob (they're really identical). Preferably
we should reduce ttwu(), not add more to it...

Fudging fork seems dubious at best; it seems generated by the use of
timer_create(.evp->sigev_notify = SIGEV_THREAD), which is a really
broken thing to do. It has very ill defined semantics and is utterly
unable to properly cope with error cases. Furthermore it's trivial to
actually correctly implement the desired behaviour, so I'm really
skeptical on this front; friends don't let friends use SIGEV_THREAD.

2010-08-26 21:26:11

by Thomas Gleixner

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

On Thu, 26 Aug 2010, Peter Zijlstra wrote:
>
> Fudging fork seems dubious at best, it seems generated by the use of
> timer_create(.evp->sigev_notify = SIGEV_THREAD), which is a really
> broken thing to do, it has very ill defined semantics and is utterly
> unable to properly cope with error cases. Furthermore its trivial to
> actually correctly implement the desired behaviour, so I'm really
> skeptical on this front; friends don't let friends use SIGEV_THREAD.

SIGEV_THREAD is the best proof that the whole posix timer interface
was comitte[e]d under the influence of not to be revealed
mind-altering substances.

I completely object to adding timer-specific wakeup magic and support
for braindead fork orgies to the kernel proper. All that mess can be
fixed in user space by using sensible functionality.

Providing support for misdesigned crap just for POSIX compliance
reasons and to make some of the blind abusers of that very same crap
happy would be a completely stupid decision.

In fact that would make a brilliant precedent for forcing the
kernel to solve user space madness at the expense of kernel
complexity. If we follow down that road we get requests for extra
functionality for AIO, networking and whatever in a split second with
no real good reason to reject them anymore.

Thanks,

tglx

2010-08-26 22:23:42

by Thomas Gleixner

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

On Thu, 26 Aug 2010, Thomas Gleixner wrote:
> On Thu, 26 Aug 2010, Peter Zijlstra wrote:
> >
> > Fudging fork seems dubious at best, it seems generated by the use of
> > timer_create(.evp->sigev_notify = SIGEV_THREAD), which is a really
> > broken thing to do, it has very ill defined semantics and is utterly
> > unable to properly cope with error cases. Furthermore its trivial to
> > actually correctly implement the desired behaviour, so I'm really
> > skeptical on this front; friends don't let friends use SIGEV_THREAD.
>
> SIGEV_THREAD is the best proof that the whole posix timer interface
> was comitte[e]d under the influence of not to be revealed
> mind-altering substances.
>
> I completely object to add timer specific wakeup magic and support for
> braindead fork orgies to the kernel proper. All that mess can be fixed
> in user space by using sensible functionality.
>
> Providing support for misdesigned crap just for POSIX compliance
> reasons and to make some of the blind abusers of that very same crap
> happy would be a completely stupid decision.
>
> In fact that would make a brilliant precedence case for forcing the
> kernel to solve user space madness at the expense of kernel
> complexity. If we follow down that road we get requests for extra
> functionality for AIO, networking and whatever in a split second with
> no real good reason to reject them anymore.

I really risked eye cancer and dug into the glibc code.

/* There is not much we can do if the allocation fails. */
(void) pthread_create (&th, &tk->attr, timer_sigev_thread, td);

So if the helper thread which gets the signal fails to create the
thread then everything is toast.

What about fixing the f*cked up glibc implementation in the first place
instead of fiddling in the kernel to support this utter madness?

WTF can't the damned delivery thread be created when timer_create
is called and the signal be delivered to that very thread directly via
SIGEV_THREAD_ID?

Thanks,

tglx

2010-08-26 23:09:39

by Mathieu Desnoyers

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

* Thomas Gleixner ([email protected]) wrote:
> On Thu, 26 Aug 2010, Thomas Gleixner wrote:
> > On Thu, 26 Aug 2010, Peter Zijlstra wrote:
> > >
> > > Fudging fork seems dubious at best, it seems generated by the use of
> > > timer_create(.evp->sigev_notify = SIGEV_THREAD), which is a really
> > > broken thing to do, it has very ill defined semantics and is utterly
> > > unable to properly cope with error cases. Furthermore its trivial to
> > > actually correctly implement the desired behaviour, so I'm really
> > > skeptical on this front; friends don't let friends use SIGEV_THREAD.
> >
> > SIGEV_THREAD is the best proof that the whole posix timer interface
> > was comitte[e]d under the influence of not to be revealed
> > mind-altering substances.
> >
> > I completely object to add timer specific wakeup magic and support for
> > braindead fork orgies to the kernel proper. All that mess can be fixed
> > in user space by using sensible functionality.
> >
> > Providing support for misdesigned crap just for POSIX compliance
> > reasons and to make some of the blind abusers of that very same crap
> > happy would be a completely stupid decision.
> >
> > In fact that would make a brilliant precedence case for forcing the
> > kernel to solve user space madness at the expense of kernel
> > complexity. If we follow down that road we get requests for extra
> > functionality for AIO, networking and whatever in a split second with
> > no real good reason to reject them anymore.
>
> I really risked eye cancer and digged into the glibc code.
>
> /* There is not much we can do if the allocation fails. */
> (void) pthread_create (&th, &tk->attr, timer_sigev_thread, td);
>
> So if the helper thread which gets the signal fails to create the
> thread then everything is toast.
>
> What about fixing the f*cked up glibc implementation in the first place
> instead of fiddling in the kernel to support this utter madness?
>
> WTF can't the damned delivery thread not be created when timer_create
> is called and the signal be delivered to that very thread directly via
> SIGEV_THREAD_ID ?

Yeah, that sounds exactly like what I proposed about an hour ago on IRC ;) I'm
pretty sure that would work.

The only thing we might have to be careful about is what happens if the timer
re-fires before the thread completes its execution. We might want to let the
signal handler detect these overruns somehow.
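
For what it's worth, POSIX already has a hook for that last part:
timer_getoverrun(2) returns the number of additional expirations that
occurred between signal generation and delivery. A minimal sketch
('timerid' is a hypothetical variable naming the timer being serviced):

/* inside the expiration handler */
int missed = timer_getoverrun(timerid);
if (missed > 0)
        fprintf(stderr, "timer re-fired %d time(s) before completion\n",
                missed);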

Thanks,

Mathieu

>
> Thanks,
>
> tglx

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-26 23:18:12

by Paul E. McKenney

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

On Fri, Aug 27, 2010 at 12:22:46AM +0200, Thomas Gleixner wrote:
> On Thu, 26 Aug 2010, Thomas Gleixner wrote:
> > On Thu, 26 Aug 2010, Peter Zijlstra wrote:
> > >
> > > Fudging fork seems dubious at best, it seems generated by the use of
> > > timer_create(.evp->sigev_notify = SIGEV_THREAD), which is a really
> > > broken thing to do, it has very ill defined semantics and is utterly
> > > unable to properly cope with error cases. Furthermore its trivial to
> > > actually correctly implement the desired behaviour, so I'm really
> > > skeptical on this front; friends don't let friends use SIGEV_THREAD.
> >
> > SIGEV_THREAD is the best proof that the whole posix timer interface
> > was comitte[e]d under the influence of not to be revealed
> > mind-altering substances.
> >
> > I completely object to add timer specific wakeup magic and support for
> > braindead fork orgies to the kernel proper. All that mess can be fixed
> > in user space by using sensible functionality.
> >
> > Providing support for misdesigned crap just for POSIX compliance
> > reasons and to make some of the blind abusers of that very same crap
> > happy would be a completely stupid decision.
> >
> > In fact that would make a brilliant precedence case for forcing the
> > kernel to solve user space madness at the expense of kernel
> > complexity. If we follow down that road we get requests for extra
> > functionality for AIO, networking and whatever in a split second with
> > no real good reason to reject them anymore.
>
> I really risked eye cancer and digged into the glibc code.
>
> /* There is not much we can do if the allocation fails. */
> (void) pthread_create (&th, &tk->attr, timer_sigev_thread, td);
>
> So if the helper thread which gets the signal fails to create the
> thread then everything is toast.
>
> What about fixing the f*cked up glibc implementation in the first place
> instead of fiddling in the kernel to support this utter madness?
>
> WTF can't the damned delivery thread not be created when timer_create
> is called and the signal be delivered to that very thread directly via
> SIGEV_THREAD_ID ?

C'mon, Thomas!!! That is entirely too sensible!!! ;-)

But if you are going to create the thread at timer_create() time,
why not just have the new thread block for the desired duration?

Thanx, Paul

2010-08-26 23:29:06

by Mathieu Desnoyers

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

* Paul E. McKenney ([email protected]) wrote:
> On Fri, Aug 27, 2010 at 12:22:46AM +0200, Thomas Gleixner wrote:
> > On Thu, 26 Aug 2010, Thomas Gleixner wrote:
> > > On Thu, 26 Aug 2010, Peter Zijlstra wrote:
> > > >
> > > > Fudging fork seems dubious at best, it seems generated by the use of
> > > > timer_create(.evp->sigev_notify = SIGEV_THREAD), which is a really
> > > > broken thing to do, it has very ill defined semantics and is utterly
> > > > unable to properly cope with error cases. Furthermore its trivial to
> > > > actually correctly implement the desired behaviour, so I'm really
> > > > skeptical on this front; friends don't let friends use SIGEV_THREAD.
> > >
> > > SIGEV_THREAD is the best proof that the whole posix timer interface
> > > was comitte[e]d under the influence of not to be revealed
> > > mind-altering substances.
> > >
> > > I completely object to add timer specific wakeup magic and support for
> > > braindead fork orgies to the kernel proper. All that mess can be fixed
> > > in user space by using sensible functionality.
> > >
> > > Providing support for misdesigned crap just for POSIX compliance
> > > reasons and to make some of the blind abusers of that very same crap
> > > happy would be a completely stupid decision.
> > >
> > > In fact that would make a brilliant precedence case for forcing the
> > > kernel to solve user space madness at the expense of kernel
> > > complexity. If we follow down that road we get requests for extra
> > > functionality for AIO, networking and whatever in a split second with
> > > no real good reason to reject them anymore.
> >
> > I really risked eye cancer and digged into the glibc code.
> >
> > /* There is not much we can do if the allocation fails. */
> > (void) pthread_create (&th, &tk->attr, timer_sigev_thread, td);
> >
> > So if the helper thread which gets the signal fails to create the
> > thread then everything is toast.
> >
> > What about fixing the f*cked up glibc implementation in the first place
> > instead of fiddling in the kernel to support this utter madness?
> >
> > WTF can't the damned delivery thread not be created when timer_create
> > is called and the signal be delivered to that very thread directly via
> > SIGEV_THREAD_ID ?
>
> C'mon, Thomas!!! That is entirely too sensible!!! ;-)
>
> But if you are going to create the thread at timer_create() time,
> why not just have the new thread block for the desired duration?

The timer infrastructure allows things like periodic timers which restart when
they fire, detection of missed timer events, etc. If you try doing this in a
userland thread context with a simple sleep, then your period becomes however
long you sleep for _and_ the thread execution time (e.g., a 10ms sleep plus a
3ms handler yields an effective 13ms period). This is all in all quite
different from the timer semantics.

Thanks,

Mathieu

>
> Thanx, Paul

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-26 23:36:56

by Mathieu Desnoyers

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

* Mathieu Desnoyers ([email protected]) wrote:
> * Thomas Gleixner ([email protected]) wrote:
> > On Thu, 26 Aug 2010, Thomas Gleixner wrote:
> > > On Thu, 26 Aug 2010, Peter Zijlstra wrote:
> > > >
> > > > Fudging fork seems dubious at best, it seems generated by the use of
> > > > timer_create(.evp->sigev_notify = SIGEV_THREAD), which is a really
> > > > broken thing to do, it has very ill defined semantics and is utterly
> > > > unable to properly cope with error cases. Furthermore its trivial to
> > > > actually correctly implement the desired behaviour, so I'm really
> > > > skeptical on this front; friends don't let friends use SIGEV_THREAD.
> > >
> > > SIGEV_THREAD is the best proof that the whole posix timer interface
> > > was comitte[e]d under the influence of not to be revealed
> > > mind-altering substances.
> > >
> > > I completely object to add timer specific wakeup magic and support for
> > > braindead fork orgies to the kernel proper. All that mess can be fixed
> > > in user space by using sensible functionality.
> > >
> > > Providing support for misdesigned crap just for POSIX compliance
> > > reasons and to make some of the blind abusers of that very same crap
> > > happy would be a completely stupid decision.
> > >
> > > In fact that would make a brilliant precedence case for forcing the
> > > kernel to solve user space madness at the expense of kernel
> > > complexity. If we follow down that road we get requests for extra
> > > functionality for AIO, networking and whatever in a split second with
> > > no real good reason to reject them anymore.
> >
> > I really risked eye cancer and digged into the glibc code.
> >
> > /* There is not much we can do if the allocation fails. */
> > (void) pthread_create (&th, &tk->attr, timer_sigev_thread, td);
> >
> > So if the helper thread which gets the signal fails to create the
> > thread then everything is toast.
> >
> > What about fixing the f*cked up glibc implementation in the first place
> > instead of fiddling in the kernel to support this utter madness?
> >
> > WTF can't the damned delivery thread not be created when timer_create
> > is called and the signal be delivered to that very thread directly via
> > SIGEV_THREAD_ID ?
>
> Yeah, that sounds exactly like what I proposed about an hour ago on IRC ;) I'm
> pretty sure that would work.
>
> The only thing we might have to be careful about is what happens if the timer
> re-fires before the thread completes its execution. We might want to let the
> signal handler detect these overruns somehow.

Hrm, thinking about it a little more, one of the "plus" sides of these
SIGEV_THREAD timers is that a single timer can fork threads that will run on
many cores on a multi-core system. If we go for preallocation of a single
thread, we lose that. Maybe we could think of a way to preallocate a thread pool
instead?

Thanks,

Mathieu

>
> Thanks,
>
> Mathieu
>
> >
> > Thanks,
> >
> > tglx
>
> --
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-26 23:38:16

by Paul E. McKenney

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

On Thu, Aug 26, 2010 at 07:28:58PM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney ([email protected]) wrote:
> > On Fri, Aug 27, 2010 at 12:22:46AM +0200, Thomas Gleixner wrote:
> > > On Thu, 26 Aug 2010, Thomas Gleixner wrote:
> > > > On Thu, 26 Aug 2010, Peter Zijlstra wrote:
> > > > >
> > > > > Fudging fork seems dubious at best, it seems generated by the use of
> > > > > timer_create(.evp->sigev_notify = SIGEV_THREAD), which is a really
> > > > > broken thing to do, it has very ill defined semantics and is utterly
> > > > > unable to properly cope with error cases. Furthermore its trivial to
> > > > > actually correctly implement the desired behaviour, so I'm really
> > > > > skeptical on this front; friends don't let friends use SIGEV_THREAD.
> > > >
> > > > SIGEV_THREAD is the best proof that the whole posix timer interface
> > > > was comitte[e]d under the influence of not to be revealed
> > > > mind-altering substances.
> > > >
> > > > I completely object to add timer specific wakeup magic and support for
> > > > braindead fork orgies to the kernel proper. All that mess can be fixed
> > > > in user space by using sensible functionality.
> > > >
> > > > Providing support for misdesigned crap just for POSIX compliance
> > > > reasons and to make some of the blind abusers of that very same crap
> > > > happy would be a completely stupid decision.
> > > >
> > > > In fact that would make a brilliant precedence case for forcing the
> > > > kernel to solve user space madness at the expense of kernel
> > > > complexity. If we follow down that road we get requests for extra
> > > > functionality for AIO, networking and whatever in a split second with
> > > > no real good reason to reject them anymore.
> > >
> > > I really risked eye cancer and digged into the glibc code.
> > >
> > > /* There is not much we can do if the allocation fails. */
> > > (void) pthread_create (&th, &tk->attr, timer_sigev_thread, td);
> > >
> > > So if the helper thread which gets the signal fails to create the
> > > thread then everything is toast.
> > >
> > > What about fixing the f*cked up glibc implementation in the first place
> > > instead of fiddling in the kernel to support this utter madness?
> > >
> > > WTF can't the damned delivery thread not be created when timer_create
> > > is called and the signal be delivered to that very thread directly via
> > > SIGEV_THREAD_ID ?
> >
> > C'mon, Thomas!!! That is entirely too sensible!!! ;-)
> >
> > But if you are going to create the thread at timer_create() time,
> > why not just have the new thread block for the desired duration?
>
> The timer infrastructure allows things like periodic timer which restarts when
> it fires, detection of missed timer events, etc. If you try doing this in a
> userland thread context with a simple sleep, then your period becomes however
> long you sleep for _and_ the thread execution time. This is all in all quite
> different from the timer semantic.

Hmmm... Why couldn't the thread in question set the next sleep time based
on the timer period? Yes, if the function ran for longer than the period,
there would be a delay, but the POSIX semantics allow such a delay, right?

Thanx, Paul

2010-08-26 23:49:15

by Mathieu Desnoyers

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

* Peter Zijlstra ([email protected]) wrote:
> On Thu, 2010-08-26 at 14:09 -0400, Mathieu Desnoyers wrote:
> > Feedback is welcome,
> >
> So we have the following components to this patch:
>
> - dynamic min_vruntime -- push the min_vruntime ahead at the
> rate of the runqueue wide virtual clock. This approximates
> the virtual clock, esp. when turning off sleeper fairness.
> And is cheaper than actually computing the virtual clock.
>
> It allows for better insertion and re-weighting behaviour,
> but it does increase overhead somewhat.
>
> - special wakeups using the next-buddy to get scheduled 'soon',
> used by all wakeups from the input system and timers.
>
> - special fork semantics related to those special wakeups.
>
>
> So while I would love to simply compute the virtual clock, it would add
> a s64 mult to every enqueue/dequeue and a s64 div to each
> enqueue/re-weight, which might be somewhat prohibitive, the dyn
> min_vruntime approximation seems to work well enough and costs a u32 div
> per enqueue.

Yep, it's cheap enough and seems to work very well as far as my testing
has shown.

> Adding a preference to all user generated wakeups (input) and
> propagating that state along the wakeup chain seems to make sense,

Yes, this is what lets us kill FAIR_SLEEPERS (and thus lets the dynamic
min_vruntime behave as expected), while keeping good Xorg interactivity.

> adding the same to all timers is something that needs to be discussed, I
> can well imagine not all timers are equally important -- do we want to
> extend the timer interface?

I just thought it made sense that when a timer fires and wakes up a thread,
there are pretty good chances that we want to wake up this thread quickly. But
it brings up the question in a more general sense: would we want this kind of
behavior also available for network packets, disk I/O, etc.? IOW, would it make
sense to have next-buddy selection on all these input-triggered wakeups? Since
we're only using the next buddy selection, it will only perform this selection
if the selected buddy is within a specific vruntime range from the minimum, so
AFAIK, I don't think we would end up starving the system in any possible way.

So far I cannot see a situation where selecting the next buddy would _not_ make
sense for any kind of input-driven wakeup (interactive, timer, disk, network,
etc.). But maybe it's just a lack of imagination on my part.


> If we do decide we want both, we should at the very least merge the
> try_to_wake_up() conditional blob (they're really identical). Preferably
> we should reduce ttwu(), not add more to it...

Sure. Or maybe we'll find out we want to target even more input paths... So
far, interactive input seemed absolutely needed, timers seemed logical and
self-rate-limited. For the others, I don't know. It might actually make sense
too.

>
> Fudging fork seems dubious at best, it seems generated by the use of
> timer_create(.evp->sigev_notify = SIGEV_THREAD), which is a really
> broken thing to do, it has very ill defined semantics and is utterly
> unable to properly cope with error cases. Furthermore its trivial to
> actually correctly implement the desired behaviour, so I'm really
> skeptical on this front; friends don't let friends use SIGEV_THREAD.

Agreed for the timer-tied case. I think the direction Thomas proposes for fixing
glibc makes much more sense. However, I am wondering about the interactive case
too: e.g., if you click on "open terminal", it needs to fork a process and you
would normally expect this to come up more quickly than the background kernel
compile you're doing. So there might be some interest in this fork vruntime
boost for interactivity-driven wakeups. Thoughts?

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-26 23:53:28

by Mathieu Desnoyers

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

* Paul E. McKenney ([email protected]) wrote:
> On Thu, Aug 26, 2010 at 07:28:58PM -0400, Mathieu Desnoyers wrote:
> > * Paul E. McKenney ([email protected]) wrote:
> > > On Fri, Aug 27, 2010 at 12:22:46AM +0200, Thomas Gleixner wrote:
> > > > On Thu, 26 Aug 2010, Thomas Gleixner wrote:
> > > > > On Thu, 26 Aug 2010, Peter Zijlstra wrote:
> > > > > >
> > > > > > Fudging fork seems dubious at best, it seems generated by the use of
> > > > > > timer_create(.evp->sigev_notify = SIGEV_THREAD), which is a really
> > > > > > broken thing to do, it has very ill defined semantics and is utterly
> > > > > > unable to properly cope with error cases. Furthermore its trivial to
> > > > > > actually correctly implement the desired behaviour, so I'm really
> > > > > > skeptical on this front; friends don't let friends use SIGEV_THREAD.
> > > > >
> > > > > SIGEV_THREAD is the best proof that the whole posix timer interface
> > > > > was comitte[e]d under the influence of not to be revealed
> > > > > mind-altering substances.
> > > > >
> > > > > I completely object to add timer specific wakeup magic and support for
> > > > > braindead fork orgies to the kernel proper. All that mess can be fixed
> > > > > in user space by using sensible functionality.
> > > > >
> > > > > Providing support for misdesigned crap just for POSIX compliance
> > > > > reasons and to make some of the blind abusers of that very same crap
> > > > > happy would be a completely stupid decision.
> > > > >
> > > > > In fact that would make a brilliant precedence case for forcing the
> > > > > kernel to solve user space madness at the expense of kernel
> > > > > complexity. If we follow down that road we get requests for extra
> > > > > functionality for AIO, networking and whatever in a split second with
> > > > > no real good reason to reject them anymore.
> > > >
> > > > I really risked eye cancer and digged into the glibc code.
> > > >
> > > > /* There is not much we can do if the allocation fails. */
> > > > (void) pthread_create (&th, &tk->attr, timer_sigev_thread, td);
> > > >
> > > > So if the helper thread which gets the signal fails to create the
> > > > thread then everything is toast.
> > > >
> > > > What about fixing the f*cked up glibc implementation in the first place
> > > > instead of fiddling in the kernel to support this utter madness?
> > > >
> > > > WTF can't the damned delivery thread not be created when timer_create
> > > > is called and the signal be delivered to that very thread directly via
> > > > SIGEV_THREAD_ID ?
> > >
> > > C'mon, Thomas!!! That is entirely too sensible!!! ;-)
> > >
> > > But if you are going to create the thread at timer_create() time,
> > > why not just have the new thread block for the desired duration?
> >
> > The timer infrastructure allows things like periodic timer which restarts when
> > it fires, detection of missed timer events, etc. If you try doing this in a
> > userland thread context with a simple sleep, then your period becomes however
> > long you sleep for _and_ the thread execution time. This is all in all quite
> > different from the timer semantic.
>
> Hmmm... Why couldn't the thread in question set the next sleep time based
> on the timer period? Yes, if the function ran for longer than the period,
> there would be a delay, but the POSIX semantics allow such a delay, right?

I'm afraid you'll have a large error accumulation over time, and getting a
precise picture of how much time remains between now and the expected end of
the period is hard to do from user-space. In a few words, this solution would
be terrible for jitter. This is why we usually rely on timers rather than
delays in these periodic workloads.

Thanks,

Mathieu

>
> Thanx, Paul

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-27 00:09:17

by Paul E. McKenney

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

On Thu, Aug 26, 2010 at 07:53:23PM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney ([email protected]) wrote:
> > On Thu, Aug 26, 2010 at 07:28:58PM -0400, Mathieu Desnoyers wrote:
> > > * Paul E. McKenney ([email protected]) wrote:
> > > > On Fri, Aug 27, 2010 at 12:22:46AM +0200, Thomas Gleixner wrote:
> > > > > On Thu, 26 Aug 2010, Thomas Gleixner wrote:
> > > > > > On Thu, 26 Aug 2010, Peter Zijlstra wrote:
> > > > > > >
> > > > > > > Fudging fork seems dubious at best, it seems generated by the use of
> > > > > > > timer_create(.evp->sigev_notify = SIGEV_THREAD), which is a really
> > > > > > > broken thing to do, it has very ill defined semantics and is utterly
> > > > > > > unable to properly cope with error cases. Furthermore its trivial to
> > > > > > > actually correctly implement the desired behaviour, so I'm really
> > > > > > > skeptical on this front; friends don't let friends use SIGEV_THREAD.
> > > > > >
> > > > > > SIGEV_THREAD is the best proof that the whole posix timer interface
> > > > > > was comitte[e]d under the influence of not to be revealed
> > > > > > mind-altering substances.
> > > > > >
> > > > > > I completely object to add timer specific wakeup magic and support for
> > > > > > braindead fork orgies to the kernel proper. All that mess can be fixed
> > > > > > in user space by using sensible functionality.
> > > > > >
> > > > > > Providing support for misdesigned crap just for POSIX compliance
> > > > > > reasons and to make some of the blind abusers of that very same crap
> > > > > > happy would be a completely stupid decision.
> > > > > >
> > > > > > In fact that would make a brilliant precedence case for forcing the
> > > > > > kernel to solve user space madness at the expense of kernel
> > > > > > complexity. If we follow down that road we get requests for extra
> > > > > > functionality for AIO, networking and whatever in a split second with
> > > > > > no real good reason to reject them anymore.
> > > > >
> > > > > I really risked eye cancer and digged into the glibc code.
> > > > >
> > > > > /* There is not much we can do if the allocation fails. */
> > > > > (void) pthread_create (&th, &tk->attr, timer_sigev_thread, td);
> > > > >
> > > > > So if the helper thread which gets the signal fails to create the
> > > > > thread then everything is toast.
> > > > >
> > > > > What about fixing the f*cked up glibc implementation in the first place
> > > > > instead of fiddling in the kernel to support this utter madness?
> > > > >
> > > > > WTF can't the damned delivery thread not be created when timer_create
> > > > > is called and the signal be delivered to that very thread directly via
> > > > > SIGEV_THREAD_ID ?
> > > >
> > > > C'mon, Thomas!!! That is entirely too sensible!!! ;-)
> > > >
> > > > But if you are going to create the thread at timer_create() time,
> > > > why not just have the new thread block for the desired duration?
> > >
> > > The timer infrastructure allows things like periodic timer which restarts when
> > > it fires, detection of missed timer events, etc. If you try doing this in a
> > > userland thread context with a simple sleep, then your period becomes however
> > > long you sleep for _and_ the thread execution time. This is all in all quite
> > > different from the timer semantic.
> >
> > Hmmm... Why couldn't the thread in question set the next sleep time based
> > on the timer period? Yes, if the function ran for longer than the period,
> > there would be a delay, but the POSIX semantics allow such a delay, right?
>
> I'm afraid you'll have a large error accumulation over time, and getting the
> precise picture of how much time between now and where the the period end is
> expected to be is kind of hard to do precisely from user-space. In a few words,
> this solution would be terrible for jitter. This is why we usually rely on
> timers rather than delays in these periodic workloads.

Why couldn't the timer_create() call record the start time, and then
compute the sleeps from that time? So if timer_create() executed at
time t=100 and the period is 5, upon awakening and completing the first
invocation of the function in question, the thread does a sleep calculated
to wake at t=110.

Thanx, Paul

2010-08-27 07:37:53

by Peter Zijlstra

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

On Thu, 2010-08-26 at 19:09 -0400, Mathieu Desnoyers wrote:
> > WTF can't the damned delivery thread not be created when timer_create
> > is called and the signal be delivered to that very thread directly via
> > SIGEV_THREAD_ID ?
>
> Yeah, that sounds exactly like what I proposed about an hour ago on IRC ;) I'm
> pretty sure that would work.
>
> The only thing we might have to be careful about is what happens if the timer
> re-fires before the thread completes its execution. We might want to let the
> signal handler detect these overruns somehow.

Simply don't use SIGEV_THREAD; spawn your own thread and use
SIGEV_THREAD_ID yourself. The programmer knows the semantics and knows
if he cares about overlapping timers etc.
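
A minimal sketch of that approach -- one pre-created thread receiving
all expirations directly via SIGEV_THREAD_ID (illustrative code, not
from this thread; most error handling omitted, link with -lrt
-lpthread):

#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef sigev_notify_thread_id          /* Linux-specific sigevent field */
#define sigev_notify_thread_id _sigev_un._tid
#endif

static timer_t timerid;

static void *timer_thread(void *arg)
{
        struct sigevent sev = { 0 };
        struct itimerspec its = { 0 };
        sigset_t set;
        siginfo_t info;

        (void)arg;
        /* keep the timer signal blocked; fetch it synchronously instead */
        sigemptyset(&set);
        sigaddset(&set, SIGRTMIN);
        pthread_sigmask(SIG_BLOCK, &set, NULL);

        sev.sigev_notify = SIGEV_THREAD_ID;
        sev.sigev_signo = SIGRTMIN;
        sev.sigev_notify_thread_id = syscall(SYS_gettid); /* this thread */
        timer_create(CLOCK_MONOTONIC, &sev, &timerid);

        its.it_value.tv_nsec = 10 * 1000 * 1000;        /* 10ms, periodic */
        its.it_interval = its.it_value;
        timer_settime(timerid, 0, &its, NULL);

        for (;;) {
                if (sigwaitinfo(&set, &info) < 0)
                        continue;
                /* overruns tell us the timer re-fired while the previous
                 * expiration was still being handled */
                printf("tick, overruns=%d\n", timer_getoverrun(timerid));
        }
        return NULL;
}

int main(void)
{
        pthread_t th;

        pthread_create(&th, NULL, timer_thread, NULL);
        pause();
        return 0;
}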

2010-08-27 07:39:11

by Peter Zijlstra

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

On Thu, 2010-08-26 at 19:36 -0400, Mathieu Desnoyers wrote:
> > The only thing we might have to be careful about is what happens if the timer
> > re-fires before the thread completes its execution. We might want to let the
> > signal handler detect these overruns somehow.
>
> Hrm, thinking about it a little more, one of the "plus" sides of these
> SIGEV_THREAD timers is that a single timer can fork threads that will run on
> many cores on a multi-core system. If we go for preallocation of a single
> thread, we lose that. Maybe we could think of a way to preallocate a thread pool
> instead ?

Why try and fix a broken thing? Just let the app spawn threads and use
SIGEV_THREAD_ID itself; it knows its requirements and can do what suits
the situation best. No need to try and patch up braindead posix stuff.

2010-08-27 07:43:36

by Peter Zijlstra

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

On Thu, 2010-08-26 at 19:49 -0400, Mathieu Desnoyers wrote:
> AFAIK, I don't think we would end up starving the system in any possible way.

Correct, it does maintain fairness.

> So far I cannot see a situation where selecting the next buddy would _not_ make
> sense in any kind of input-driven wakeups (interactive, timer, disk, network,
> etc). But maybe it's just a lack of imagination on my part.

The risk is that you end up always using next-buddy; we tried that a
while back and it didn't work well for some, as Mike might remember.

Also, when you use timers for things like time-outs, you really couldn't
care less if it's handled sooner rather than later.

Disk is usually so slow you really don't want to consider it
interactive, but then sometimes you might... it's a really hard problem.

The only clear situation is direct input: that's a direct link
between the user and our wakeup chain, and the user is always important.

2010-08-27 08:19:49

by Mike Galbraith

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

On Fri, 2010-08-27 at 09:42 +0200, Peter Zijlstra wrote:
> On Thu, 2010-08-26 at 19:49 -0400, Mathieu Desnoyers wrote:
> > AFAIK, I don't think we would end up starving the system in any possible way.
>
> Correct, it does maintain fairness.
>
> > So far I cannot see a situation where selecting the next buddy would _not_ make
> > sense in any kind of input-driven wakeups (interactive, timer, disk, network,
> > etc). But maybe it's just a lack of imagination on my part.
>
> The risk is that you end up with always using next-buddy, and we tried
> that a while back and that didn't work well for some, Mike might
> remember.

I turned it off because it was ripping spread apart badly, and last
buddy did a better job of improving scalability without it.

> Also, when you use timers things like time-outs you really couldn't care
> less if its handled sooner rather than later.
>
> Disk is usually so slow you really don't want to consider it
> interactive, but then sometimes you might,.. its a really hard problem.

(very hard)

> The only clear situation is the direct input, that's a direct link
> between the user and our wakeup chain and the user is always important.

Yeah, directly linked wakeups using next could be a good thing, but the
trouble with using any linkage to the user is that you have to pass it
on to reap benefit... so when do you disconnect?

-Mike

2010-08-27 08:44:07

by Thomas Gleixner

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

On Thu, 26 Aug 2010, Mathieu Desnoyers wrote:
> * Mathieu Desnoyers ([email protected]) wrote:
> > * Thomas Gleixner ([email protected]) wrote:
> > > On Thu, 26 Aug 2010, Thomas Gleixner wrote:
> > > > On Thu, 26 Aug 2010, Peter Zijlstra wrote:
> > > > >
> > > > > Fudging fork seems dubious at best, it seems generated by the use of
> > > > > timer_create(.evp->sigev_notify = SIGEV_THREAD), which is a really
> > > > > broken thing to do, it has very ill defined semantics and is utterly
> > > > > unable to properly cope with error cases. Furthermore its trivial to
> > > > > actually correctly implement the desired behaviour, so I'm really
> > > > > skeptical on this front; friends don't let friends use SIGEV_THREAD.
> > > >
> > > > SIGEV_THREAD is the best proof that the whole posix timer interface
> > > > was comitte[e]d under the influence of not to be revealed
> > > > mind-altering substances.
> > > >
> > > > I completely object to add timer specific wakeup magic and support for
> > > > braindead fork orgies to the kernel proper. All that mess can be fixed
> > > > in user space by using sensible functionality.
> > > >
> > > > Providing support for misdesigned crap just for POSIX compliance
> > > > reasons and to make some of the blind abusers of that very same crap
> > > > happy would be a completely stupid decision.
> > > >
> > > > In fact that would make a brilliant precedence case for forcing the
> > > > kernel to solve user space madness at the expense of kernel
> > > > complexity. If we follow down that road we get requests for extra
> > > > functionality for AIO, networking and whatever in a split second with
> > > > no real good reason to reject them anymore.
> > >
> > > I really risked eye cancer and digged into the glibc code.
> > >
> > > /* There is not much we can do if the allocation fails. */
> > > (void) pthread_create (&th, &tk->attr, timer_sigev_thread, td);
> > >
> > > So if the helper thread which gets the signal fails to create the
> > > thread then everything is toast.
> > >
> > > What about fixing the f*cked up glibc implementation in the first place
> > > instead of fiddling in the kernel to support this utter madness?
> > >
> > > WTF can't the damned delivery thread not be created when timer_create
> > > is called and the signal be delivered to that very thread directly via
> > > SIGEV_THREAD_ID ?
> >
> > Yeah, that sounds exactly like what I proposed about an hour ago on IRC ;) I'm
> > pretty sure that would work.
> >
> > The only thing we might have to be careful about is what happens if the timer
> > re-fires before the thread completes its execution. We might want to let the
> > signal handler detect these overruns somehow.

That's easy.

> Hrm, thinking about it a little more, one of the "plus" sides of these
> SIGEV_THREAD timers is that a single timer can fork threads that will run on
> many cores on a multi-core system. If we go for preallocation of a single
> thread, we lose that. Maybe we could think of a way to preallocate a thread pool
> instead ?

Why should a single timer fork many threads? Just because a previous
thread did not complete before the timer fired again? That's braindamage,
as all threads call the same function, which then needs to be serialized
anyway. We really do not need a function which creates tons of threads
which all get stuck on the same resource.

Thanks,

tglx

2010-08-27 10:58:49

by Peter Zijlstra

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

On Fri, 2010-08-27 at 12:47 +0200, Indan Zupancic wrote:
>
> Please don't hide scheduler improvements behind obscure CONFIG_SCHED_DEBUG
> options. If it doesn't make the scheduler better just don't merge it. If it
> does then it should be enabled by default.

It's an RFC; features are nice for testing things and seeing if/how they
work. Features are not a user option, so you don't have to worry about
them.

Also, 'better' is a very hard thing to quantify. Some people like
throughput, some like latency, and others like raw context switch
performance.

Hopefully we'll soon have a non-privileged sporadic task scheduler for
all our soft-realtime media needs.

2010-08-27 11:06:49

by Indan Zupancic

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

Hello,

On Thu, August 26, 2010 20:09, Mathieu Desnoyers wrote:
> Hi,
>
> Following the findings I presented a few months ago
> (http://lkml.org/lkml/2010/4/18/13) about CFS having large vruntime spread
> issues, Peter Zijlstra and I pursued the discussion and the implementation
> effort (my work on this is funded by Nokia). I recently put the result
> together
> and came up with this patchset, combining both his work and mine.
>
> With this patchset, I got the following results with wakeup-latency.c (a 10ms
> periodic timer), running periodic-fork.sh, Xorg, make -j3 and firefox (playing
> a
> youtube video), with Xorg moving terminal windows around, in parallel on a UP
> system (links to the test program source in the dyn min_vruntime patch). The
> Xorg interactivity is very good with the new features enabled, but was poor
> originally with the vanilla mainline scheduler. The 10ms timer delays are as
> follow:
>
> 2.6.35.2 mainline* with low-latency features**
> maximum latency: 34465.2 µs 8261.4 µs
> average latency: 6445.5 µs 211.2 µs
> missed timer events: yes no
>
> * 2.6.35.2 mainline test needs to run periodic-fork.sh for a few minutes first
> to let it rip the spread apart.
>
> ** low-latency features:
>
> with the patchset applied and CONFIG_SCHED_DEBUG=y
> (with debugfs mounted in /sys/debugfs)
>
> for opt in DYN_MIN_VRUNTIME \
> NO_FAIR_SLEEPERS FAIR_SLEEPERS_INTERACTIVE FAIR_SLEEPERS_TIMER \
> INTERACTIVE TIMER \
> INTERACTIVE_FORK_EXPEDITED TIMER_FORK_EXPEDITED;
> do echo $opt > /sys/debugfs/sched_features;
> done
>
> These patches are designed to allow individual enabling of each feature and to
> make sure that the object size of sched.o does not grow when the features are
> disabled on a CONFIG_SCHED_DEBUG=n kernel.

Please don't hide scheduler improvements behind obscure CONFIG_SCHED_DEBUG
options. If it doesn't make the scheduler better just don't merge it. If it
does then it should be enabled by default.

Thank you,

Indan


P.S. Tony Luck seems to have been chopped off the CC list.

2010-08-27 15:18:13

by Mathieu Desnoyers

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

* Paul E. McKenney ([email protected]) wrote:
> On Thu, Aug 26, 2010 at 07:53:23PM -0400, Mathieu Desnoyers wrote:
> > * Paul E. McKenney ([email protected]) wrote:
> > > On Thu, Aug 26, 2010 at 07:28:58PM -0400, Mathieu Desnoyers wrote:
> > > > * Paul E. McKenney ([email protected]) wrote:
> > > > > On Fri, Aug 27, 2010 at 12:22:46AM +0200, Thomas Gleixner wrote:
> > > > > > On Thu, 26 Aug 2010, Thomas Gleixner wrote:
> > > > > > > On Thu, 26 Aug 2010, Peter Zijlstra wrote:
> > > > > > > >
> > > > > > > > Fudging fork seems dubious at best, it seems generated by the use of
> > > > > > > > timer_create(.evp->sigev_notify = SIGEV_THREAD), which is a really
> > > > > > > > broken thing to do, it has very ill defined semantics and is utterly
> > > > > > > > unable to properly cope with error cases. Furthermore its trivial to
> > > > > > > > actually correctly implement the desired behaviour, so I'm really
> > > > > > > > skeptical on this front; friends don't let friends use SIGEV_THREAD.
> > > > > > >
> > > > > > > SIGEV_THREAD is the best proof that the whole posix timer interface
> > > > > > > was comitte[e]d under the influence of not to be revealed
> > > > > > > mind-altering substances.
> > > > > > >
> > > > > > > I completely object to add timer specific wakeup magic and support for
> > > > > > > braindead fork orgies to the kernel proper. All that mess can be fixed
> > > > > > > in user space by using sensible functionality.
> > > > > > >
> > > > > > > Providing support for misdesigned crap just for POSIX compliance
> > > > > > > reasons and to make some of the blind abusers of that very same crap
> > > > > > > happy would be a completely stupid decision.
> > > > > > >
> > > > > > > In fact that would make a brilliant precedence case for forcing the
> > > > > > > kernel to solve user space madness at the expense of kernel
> > > > > > > complexity. If we follow down that road we get requests for extra
> > > > > > > functionality for AIO, networking and whatever in a split second with
> > > > > > > no real good reason to reject them anymore.
> > > > > >
> > > > > > I really risked eye cancer and digged into the glibc code.
> > > > > >
> > > > > > /* There is not much we can do if the allocation fails. */
> > > > > > (void) pthread_create (&th, &tk->attr, timer_sigev_thread, td);
> > > > > >
> > > > > > So if the helper thread which gets the signal fails to create the
> > > > > > thread then everything is toast.
> > > > > >
> > > > > > What about fixing the f*cked up glibc implementation in the first place
> > > > > > instead of fiddling in the kernel to support this utter madness?
> > > > > >
> > > > > > WTF can't the damned delivery thread not be created when timer_create
> > > > > > is called and the signal be delivered to that very thread directly via
> > > > > > SIGEV_THREAD_ID ?
> > > > >
> > > > > C'mon, Thomas!!! That is entirely too sensible!!! ;-)
> > > > >
> > > > > But if you are going to create the thread at timer_create() time,
> > > > > why not just have the new thread block for the desired duration?
> > > >
> > > > The timer infrastructure allows things like periodic timer which restarts when
> > > > it fires, detection of missed timer events, etc. If you try doing this in a
> > > > userland thread context with a simple sleep, then your period becomes however
> > > > long you sleep for _and_ the thread execution time. This is all in all quite
> > > > different from the timer semantic.
> > >
> > > Hmmm... Why couldn't the thread in question set the next sleep time based
> > > on the timer period? Yes, if the function ran for longer than the period,
> > > there would be a delay, but the POSIX semantics allow such a delay, right?
> >
> > I'm afraid you'll have a large error accumulation over time, and getting the
> > precise picture of how much time between now and where the the period end is
> > expected to be is kind of hard to do precisely from user-space. In a few words,
> > this solution would be terrible for jitter. This is why we usually rely on
> > timers rather than delays in these periodic workloads.
>
> Why couldn't the timer_create() call record the start time, and then
> compute the sleeps from that time? So if timer_create() executed at
> time t=100 and the period is 5, upon awakening and completing the first
> invocation of the function in question, the thread does a sleep calculated
> to wake at t=110.

Let's focus on the userspace thread execution, right between the sampling of
the current time and the call to sleep:

Thread A:
    current_time = read_current_time();
    sleep(period_end - current_time);

If the thread is preempted between these two operations, then we end up sleeping
for longer than what is needed. This kind of imprecision will add up over time,
so that after e.g. one day, instead of having the expected number of timer
executions, we'll have fewer than that. This kind of accumulated drift is an
unwanted side-effect of using delays in lieu of real periodic timers.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-27 15:21:28

by Mathieu Desnoyers

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

* Peter Zijlstra ([email protected]) wrote:
> On Thu, 2010-08-26 at 19:09 -0400, Mathieu Desnoyers wrote:
> > > WTF can't the damned delivery thread not be created when timer_create
> > > is called and the signal be delivered to that very thread directly via
> > > SIGEV_THREAD_ID ?
> >
> > Yeah, that sounds exactly like what I proposed about an hour ago on IRC ;) I'm
> > pretty sure that would work.
> >
> > The only thing we might have to be careful about is what happens if the timer
> > re-fires before the thread completes its execution. We might want to let the
> > signal handler detect these overruns somehow.
>
> Simply don't use SIGEV_THREAD and spawn you own thread and use
> SIGEV_THREAD_ID yourself, the programmer knows the semantics and knows
> if he cares about overlapping timers etc.

From man timer_create:

SIGEV_THREAD
       Upon timer expiration, invoke sigev_notify_function as if it
       were the start function of a new thread. (Among the
       implementation possibilities here are that each timer
       notification could result in the creation of a new thread, or
       that a single thread is created to receive all notifications.)
       The function is invoked with sigev_value as its sole argument.
       If sigev_notify_attributes is not NULL, it should point to a
       pthread_attr_t structure that defines attributes for the new
       thread (see pthread_attr_init(3)).

So basically, it's the glibc implementation that is broken, not the standard.

The programmer should expect that thread execution can overlap though.

Thanks,

Mathieu


--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-27 15:21:51

by Thomas Gleixner

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

On Fri, 27 Aug 2010, Mathieu Desnoyers wrote:
> * Paul E. McKenney ([email protected]) wrote:
> > Why couldn't the timer_create() call record the start time, and then
> > compute the sleeps from that time? So if timer_create() executed at
> > time t=100 and the period is 5, upon awakening and completing the first
> > invocation of the function in question, the thread does a sleep calculated
> > to wake at t=110.
>
> Let's focus on the userspace thread execution, right between the samping of the
> current time and the call to sleep:
>
> Thread A
> current_time = read current time();
> sleep(period_end - current_time);
>
> If the thread is preempted between these two operations, then we end up sleeping
> for longer than what is needed. This kind of imprecision will add up over time,
> so that after e.g. one day, instead of having the expected number of timer
> executions, we'll have less than that. This kind of accumulated drift is an
> unwanted side-effect of using delays in lieue of real periodic timers.

Nonsense, that's why we provide clock_nanosleep(ABSTIME)
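
In other words, something like this minimal sketch, where do_work() is
a hypothetical stand-in for the timer function:

#include <time.h>

#define NSEC_PER_SEC 1000000000L

extern void do_work(void);              /* stand-in for the handler */

static void periodic(long period_ns)
{
        struct timespec next;

        clock_gettime(CLOCK_MONOTONIC, &next); /* start time, read once */
        for (;;) {
                /* advance the *absolute* deadline; preemption between
                 * the clock read and the sleep can delay one wakeup,
                 * but the error no longer accumulates across periods */
                next.tv_nsec += period_ns;
                while (next.tv_nsec >= NSEC_PER_SEC) {
                        next.tv_nsec -= NSEC_PER_SEC;
                        next.tv_sec++;
                }
                clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
                do_work();
        }
}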

Thanks,

tglx

2010-08-27 15:23:24

by Mathieu Desnoyers

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

* Peter Zijlstra ([email protected]) wrote:
> On Thu, 2010-08-26 at 19:36 -0400, Mathieu Desnoyers wrote:
> > > The only thing we might have to be careful about is what happens if the timer
> > > re-fires before the thread completes its execution. We might want to let the
> > > signal handler detect these overruns somehow.
> >
> > Hrm, thinking about it a little more, one of the "plus" sides of these
> > SIGEV_THREAD timers is that a single timer can fork threads that will run on
> > many cores on a multi-core system. If we go for preallocation of a single
> > thread, we lose that. Maybe we could think of a way to preallocate a thread pool
> > instead ?
>
> Why try and fix a broken thing, just let the app spawn threads and use
> SIGEV_THREAD_ID itself, it knows its requirements and can do what suits
> the situation best. No need to try and patch up braindead posix stuff.

As I dig through the code and learn more about the posix standard, my
understanding is that the _implementation_ is broken. The standard just leaves
enough rope for the implementation to either do the right thing or to hang
itself. Sadly glibc seems to have chosen the second option.

Mathieu


--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-27 15:30:12

by Mathieu Desnoyers

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

* Thomas Gleixner ([email protected]) wrote:
> On Fri, 27 Aug 2010, Mathieu Desnoyers wrote:
> > * Paul E. McKenney ([email protected]) wrote:
> > > Why couldn't the timer_create() call record the start time, and then
> > > compute the sleeps from that time? So if timer_create() executed at
> > > time t=100 and the period is 5, upon awakening and completing the first
> > > invocation of the function in question, the thread does a sleep calculated
> > > to wake at t=110.
> >
> > Let's focus on the userspace thread execution, right between the samping of the
> > current time and the call to sleep:
> >
> > Thread A
> > current_time = read current time();
> > sleep(period_end - current_time);
> >
> > If the thread is preempted between these two operations, then we end up sleeping
> > for longer than what is needed. This kind of imprecision will add up over time,
> > so that after e.g. one day, instead of having the expected number of timer
> > executions, we'll have less than that. This kind of accumulated drift is an
> > unwanted side-effect of using delays in lieue of real periodic timers.
>
> Nonsense, that's why we provide clock_nanosleep(ABSTIME)

If we're using CLOCK_MONOTONIC, you're right, this could work. I was only
thinking of relative delays.

So do you think Paul's idea would be a good candidate for the timer_create
SIGEV_THREAD glibc implementation then?

Thanks,

Mathieu

>
> Thanks,
>
> tglx

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-27 15:41:39

by Peter Zijlstra

Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

On Fri, 2010-08-27 at 11:21 -0400, Mathieu Desnoyers wrote:
> SIGEV_THREAD
> Upon timer expiration, invoke sigev_notify_function as if it
> were the start function of a new thread. (Among the implementation
> possibilities here are that each timer notification could
> result in the creation of a new thread, or that a single thread
> is created to receive all notifications.) The function is
> invoked with sigev_value as its sole argument. If
> sigev_notify_attributes is not NULL, it should point to a
> pthread_attr_t structure that defines attributes for the new
> thread (see pthread_attr_init(3)).
>
> So basically, it's the glibc implementation that is broken, not the standard.

The standard is broken too: what context will the new thread inherit?
The pthread_attr_t stuff tries to cover some of that, but pthread_attr_t
doesn't cover all inherited task attributes, and allows for some very
'interesting' bugs [1].

The specification also doesn't cover the case where the handler takes
more time to execute than the timer interval.

[1] - consider the case where pthread_attr_t includes the stack, we use a
spawn-thread-on-expire policy, and then the handler is delayed past the next
expiration.

2010-08-27 15:41:50

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

On Fri, 2010-08-27 at 11:30 -0400, Mathieu Desnoyers wrote:
> So do you think Paul's idea would be a good candidate for the timer_create
> SIGEV_THREAD glibc implementation then?

Simply do not _ever_ use SIGEV_THREAD; there really is no reason to.

2010-08-27 15:43:38

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

* Mike Galbraith ([email protected]) wrote:
> On Fri, 2010-08-27 at 09:42 +0200, Peter Zijlstra wrote:
> > On Thu, 2010-08-26 at 19:49 -0400, Mathieu Desnoyers wrote:
> > > AFAIK, I don't think we would end up starving the system in any possible way.
> >
> > Correct, it does maintain fairness.
> >
> > > So far I cannot see a situation where selecting the next buddy would _not_ make
> > > sense in any kind of input-driven wakeups (interactive, timer, disk, network,
> > > etc). But maybe it's just a lack of imagination on my part.
> >
> > The risk is that you end up with always using next-buddy, and we tried
> > that a while back and that didn't work well for some, Mike might
> > remember.
>
> I turned it off because it was ripping spread apart badly, and last
> buddy did a better job of improving scalability without it.

Maybe with the dyn min_vruntime feature proposed in this patchset we should
reconsider this. Spread being ripped apart is exactly what it addresses.

>
> > Also, when you use timers things like time-outs you really couldn't care
> > less if its handled sooner rather than later.

Maybe we could have a timer delay threshold under which we consider the wakeup
to be "short" rather than "long". E.g., a timer firing every ms very likely
needs its wakeups to proceed quickly, but if the timer fires every hour, we
couldn't care less.
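
A purely hypothetical sketch of such a threshold test (names invented, no such
knob exists in the patchset; assumes kernel context where NSEC_PER_MSEC is
available):

/* Treat only short-period timers as latency-sensitive. */
#define TIMER_INTERACTIVE_THRESHOLD_NS	(10 * NSEC_PER_MSEC)

static inline bool timer_wakeup_wants_next_buddy(u64 period_ns)
{
	return period_ns && period_ns < TIMER_INTERACTIVE_THRESHOLD_NS;
}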

> >
> > Disk is usually so slow you really don't want to consider it
> > interactive, but then sometimes you might,.. its a really hard problem.
>
> (very hard)
>
> > The only clear situation is the direct input, that's a direct link
> > between the user and our wakeup chain and the user is always important.
>
> Yeah, directly linked wakeups using next could be a good thing, but the
> trouble with using any linkage to the user is that you have to pass it
> on to reap benefit.. so when do you disconnect?

What we do here is pass the token on across wakeups, but not across wakeups
of newly created threads (so it's not passed across forks). However, AFAIU,
this only generates a bias: the wakeup target is selected as next buddy only
if it is within a certain range of the leftmost entity. The disconnect is done
(setting se.interactive to 0) on a per-thread basis when threads go to sleep.
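
Roughly, the mechanism is (illustrative pseudo-kernel code approximating the
patches, not quoting them; only the se.interactive flag is real):

/* Pass the "interactive" token from waker to wakee on wakeup... */
static inline void pass_interactive_token(struct sched_entity *waker,
					  struct sched_entity *wakee)
{
	if (waker->interactive)
		wakee->interactive = 1;
}

/* ...and drop it when the thread blocks. */
static inline void interactive_sleep(struct sched_entity *se)
{
	se->interactive = 0;
}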

I've seen versions of Peter's patches that decremented a counter as the token
is passed across a wakeup, but so far I have not found that necessary.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-27 15:50:17

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

* Thomas Gleixner ([email protected]) wrote:
> On Thu, 26 Aug 2010, Mathieu Desnoyers wrote:
[...]
> > Hrm, thinking about it a little more, one of the "plus" sides of these
> > SIGEV_THREAD timers is that a single timer can fork threads that will run on
> > many cores on a multi-core system. If we go for preallocation of a single
> > thread, we lose that. Maybe we could think of a way to preallocate a thread pool
> > instead?
>
> Why should a single timer fork many threads? Just because a previous
> thread did not complete before the timer fires again? That's
> braindamage as all threads call the same function which then needs to
> be serialized anyway. We really do not need a function which creates
> tons of threads which get all stuck on the same resource.

It could make sense if the workload is mostly CPU-bound and there is only a very
short critical section shared between the threads. But I agree that in many
cases this will generate an utter contention mess.

Thanks,

Mathieu


--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-27 16:09:50

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

* Peter Zijlstra ([email protected]) wrote:
> On Fri, 2010-08-27 at 11:21 -0400, Mathieu Desnoyers wrote:
> > SIGEV_THREAD
> > Upon timer expiration, invoke sigev_notify_function as if it
> > were the start function of a new thread. (Among the implementation
> > possibilities here are that each timer notification could
> > result in the creation of a new thread, or that a single thread
> > is created to receive all notifications.) The function is
> > invoked with sigev_value as its sole argument. If
> > sigev_notify_attributes is not NULL, it should point to a
> > pthread_attr_t structure that defines attributes for the new
> > thread (see pthread_attr_init(3)).
> >
> > So basically, it's the glibc implementation that is broken, not the standard.
>
> The standard is broken too, what context will the new thread inherit?

Besides pthread_attr_t, and thinking of the scheduler/cgroups/etc. state, I'd
expect it to inherit the state of the thread which calls timer_create(). But
this is not what glibc does right now, and it is not spelled out clearly by
the standard.

> The pthread_attr_t stuff tries to cover some of that, but pthread_attr_t
> doesn't cover all inherited task attributes, and allows for some very
> 'interesting' bugs [1].

(see below)

>
> The specification also doesn't cover the case where the handler takes
> more time to execute than the timer interval.

Why should it? It seems valid for a workload to spawn many threads bound to
more than a single core on a multi-core system, so concurrency management
should be performed by the application.

>
> [1] - consider the case where pthread_attr_t includes the stack and we
> use a spawn thread on expire policy and then run into the situation
> where the handler is delayed past the next expiration.

Setting a thread stack and generating the signal more than once is taken into
account by the standard. It leads to unspecified results (IOW: don't do this):

http://www.opengroup.org/onlinepubs/009695399/functions/timer_create.html

"If evp->sigev_sigev_notify is SIGEV_THREAD and sev->sigev_notify_attributes is
not NULL, if the attribute pointed to by sev->sigev_notify_attributes has a
thread stack address specified by a call to pthread_attr_setstack() or
pthread_attr_setstackaddr(), the results are unspecified if the signal is
generated more than once."

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-27 17:28:05

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

On Fri, 2010-08-27 at 12:09 -0400, Mathieu Desnoyers wrote:
> > The specification also doesn't cover the case where the handler takes
> > more time to execute than the timer interval.
>
> Why should it? It seems valid for a workload to spawn many threads bound to
> more than a single core on a multi-core system, so concurrency management
> should be performed by the application.

The problem with allowing concurrency is that the moment you want to do
that you get the spawner context and error propagation problems.

Thus we're limited to spawning a single thread at timer_create() and
handling expirations in there. At that point you have to specify what
happens to an expiration while the handler is running: will it queue
handlers, or will you have to carefully craft your handler using
timer_getoverrun()?
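
A sketch of the latter, assuming the timer delivers SIGRTMIN to a dedicated
thread (the signal must be blocked so sigwaitinfo() can collect it; handler
body elided):

#include <signal.h>
#include <stdio.h>
#include <time.h>

static void handle_expiration(void)
{
	/* one period worth of work */
}

static void *expiration_thread(void *arg)
{
	timer_t timerid = *(timer_t *)arg;
	sigset_t set;
	siginfo_t info;

	sigemptyset(&set);
	sigaddset(&set, SIGRTMIN);

	for (;;) {
		if (sigwaitinfo(&set, &info) < 0)
			continue;
		handle_expiration();
		/* Expirations that fired while the signal was pending are
		 * reported as an overrun count instead of queueing up. */
		int missed = timer_getoverrun(timerid);
		if (missed > 0)
			fprintf(stderr, "missed %d expirations\n", missed);
	}
	return NULL;
}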

But I really don't understand why POSIX would provide this composite
functionality instead of providing the separate bits to implement it; I
think SIGEV_THREAD_ID is the only missing piece.

2010-08-27 18:32:44

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

* Peter Zijlstra ([email protected]) wrote:
> On Fri, 2010-08-27 at 12:09 -0400, Mathieu Desnoyers wrote:
> > > The specification also doesn't cover the case where the handler takes
> > > more time to execute than the timer interval.
> >
> > Why should it? It seems valid for a workload to spawn many threads bound to
> > more than a single core on a multi-core system, so concurrency management
> > should be performed by the application.
>
> The problem with allowing concurrency is that the moment you want to do
> that you get the spawner context and error propagation problems.

These are problems you get only when you allow spawning any number of threads.
If, instead, you create a thread pool at timer creation, then you can allow
concurrency without spawner-context and error-propagation problems.

> Thus we're limited to spawning a single thread at timer_create() and
> handling expirations in there. At that point you have to specify what
> happens to an expiration while the handler is running, will it queue
> handlers or will you have to carefully craft your handler using
> timer_getoverrun().

With my statement above in mind, it's true that we'd have to handle what happens
when the thread pool is exhausted. But couldn't we simply increment an overrun
counter from userland? (It might be a different counter from the one the kernel
keeps, maintained internally in userland.)
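
Something like this userland bookkeeping (names hypothetical):

#include <stdatomic.h>

/* When every pool thread is busy at expiration time, count an overrun
 * instead of spawning; distinct from the kernel's timer_getoverrun(). */
struct timer_pool {
	atomic_int  idle_threads;
	atomic_long user_overruns;
};

static void on_expiration(struct timer_pool *pool)
{
	int idle = atomic_load(&pool->idle_threads);

	while (idle > 0) {
		/* Claim an idle worker; on failure, "idle" is reloaded. */
		if (atomic_compare_exchange_weak(&pool->idle_threads,
						 &idle, idle - 1))
			return;	/* a worker runs the handler */
	}
	atomic_fetch_add(&pool->user_overruns, 1);
}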

> But I really don't understand why POSIX would provide this composite
> functionality instead of providing the separate bits to implement this,
> which I think is only SIGEV_THREAD_ID which is missing.

Higher-abstraction API vs. low-level control: a long-running battle. ;)

Thanks,

Mathieu


--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-27 18:38:33

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

* Mathieu Desnoyers ([email protected]) wrote:
> * Mike Galbraith ([email protected]) wrote:
> > On Fri, 2010-08-27 at 09:42 +0200, Peter Zijlstra wrote:
> > > On Thu, 2010-08-26 at 19:49 -0400, Mathieu Desnoyers wrote:
> > > > AFAIK, I don't think we would end up starving the system in any possible way.
> > >
> > > Correct, it does maintain fairness.
> > >
> > > > So far I cannot see a situation where selecting the next buddy would _not_ make
> > > > sense in any kind of input-driven wakeups (interactive, timer, disk, network,
> > > > etc). But maybe it's just a lack of imagination on my part.
> > >
> > > The risk is that you end up with always using next-buddy, and we tried
> > > that a while back and that didn't work well for some, Mike might
> > > remember.
> >
> > I turned it off because it was ripping spread apart badly, and last
> > buddy did a better job of improving scalability without it.
>
> Maybe with the dyn min_vruntime feature proposed in this patchset we should
> reconsider this. Spread being ripped apart is exactly what it addresses.

I'm curious: which workload was showing this kind of problem exactly ?

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-27 19:23:29

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

On Fri, 2010-08-27 at 14:32 -0400, Mathieu Desnoyers wrote:
>
> These are problems you get only when you allow spawning any number of threads.
> If, instead, you create a thread pool at timer creation, then you can allow
> concurrency without spawner-context and error-propagation problems.

That would be a massive resource waste for the normal case where the
interval > handler runtime.

2010-08-27 19:57:57

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

* Peter Zijlstra ([email protected]) wrote:
> On Fri, 2010-08-27 at 14:32 -0400, Mathieu Desnoyers wrote:
> >
> > These are problems you get only when you allow spawning any number of threads.
> > If, instead, you create a thread pool at timer creation, then you can allow
> > concurrency without spawner-context and error-propagation problems.
>
> That would be a massive resource waste for the normal case where the
> interval > handler runtime.

Not if the application can configure this value. So the "normal" case could be
set to a single thread. Only resource-intensive apps would set it differently,
e.g. to the detected number of CPUs.
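
The detection itself is trivial from userland; a sketch:

#include <unistd.h>

/* Hypothetical pool sizing: one notification thread per online CPU,
 * falling back to a single thread. */
static long pool_size(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);
	return n > 0 ? n : 1;
}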

Mathieu


--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-08-28 07:33:41

by Mike Galbraith

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

On Fri, 2010-08-27 at 14:38 -0400, Mathieu Desnoyers wrote:
> * Mathieu Desnoyers ([email protected]) wrote:
> > * Mike Galbraith ([email protected]) wrote:
> > > On Fri, 2010-08-27 at 09:42 +0200, Peter Zijlstra wrote:
> > > > On Thu, 2010-08-26 at 19:49 -0400, Mathieu Desnoyers wrote:
> > > > > AFAIK, I don't think we would end up starving the system in any possible way.
> > > >
> > > > Correct, it does maintain fairness.
> > > >
> > > > > So far I cannot see a situation where selecting the next buddy would _not_ make
> > > > > sense in any kind of input-driven wakeups (interactive, timer, disk, network,
> > > > > etc). But maybe it's just a lack of imagination on my part.
> > > >
> > > > The risk is that you end up with always using next-buddy, and we tried
> > > > that a while back and that didn't work well for some, Mike might
> > > > remember.
> > >
> > > I turned it off because it was ripping spread apart badly, and last
> > > buddy did a better job of improving scalability without it.
> >
> > Maybe with the dyn min_vruntime feature proposed in this patchset we should
> > reconsider this. Spread being ripped apart is exactly what it addresses.

Dunno. Messing with spread is a very sharp double-edged sword.

I took the patch set out for a spin, and saw some negative effects in
that regard. I did see some positive effects as well though; x264, for
example, really wants round-robin, so it profits. Things where preemption
translates to throughput don't care for the idea much. vmark rather
surprised me: it hated this feature for some reason, and I expected the
opposite.

It's an interesting knob, but I wouldn't turn it on by default on
anything but maybe a UP desktop box.

(I kinda like cgroups classified by pgid for desktop interactivity under
load; works pretty darn well)

> I'm curious: which workload was showing this kind of problem exactly ?

Hm, I don't recall exact details. I was looking at a lot of different
load mixes at the time, mostly interactive and batch, trying to shrink
the too darn high latencies seen with modest load mixes.

I never tried to figure out why next buddy had a worse effect on spread
than last buddy (not obvious to me), just noted the fact that it did. I
recall that it had a large negative effect on x264 throughput as well.

-Mike

some numbers:

35.3 = 2.6.35.3 vanilla, 35.3x = 2.6.35.3 + DYN_MIN_VRUNTIME

netperf TCP_RR
35.3 97624.82 96982.62 97387.50 avg 97331.646 RR/sec 1.000
35.3x 95044.98 95365.52 94581.92 avg 94997.473 RR/sec .976

tbench 8
35.3 1200.36 1200.56 1199.29 avg 1200.070 MB/sec 1.000
35.3x 1106.92 1110.50 1106.58 avg 1108.000 MB/sec .923

x264 8
35.3 407.12 408.80 414.60 avg 410.173 fps 1.000
35.3x 428.07 436.43 438.16 avg 434.220 fps 1.058

vmark
35.3 149678 149269 150584 avg 149843.666 m/sec 1.000
35.3x 120872 120932 121247 avg 121017.000 m/sec .807

mysql+oltp
1 2 4 8 16 32 64 128 256
35.3 10956.33 20747.86 37139.27 36898.70 36575.90 36104.63 34390.26 31574.46 29148.01
10938.44 20835.17 37058.51 37051.71 36630.06 35930.90 34464.88 32024.50 28989.14
10935.72 20792.54 37238.17 36989.97 36568.37 35961.00 34342.54 31532.39 29235.20
avg 10943.49 20791.85 37145.31 36980.12 36591.44 35998.84 34399.22 31710.45 29124.11

35.3x 10944.22 20851.09 35609.32 35744.05 35137.49 33362.16 30796.03 28286.87 25105.84
10958.68 20811.93 35604.57 35610.71 35147.65 33371.81 30877.52 28325.79 25113.85
10962.72 20745.81 35728.36 35638.23 35124.56 33336.20 30794.99 28225.99 25202.88
avg 10955.20 20802.94 35647.41 35664.33 35136.56 33356.72 30822.84 28279.55 25140.85
vs 35.3 1.001 1.000 .959 .964 .960 .926 .896 .891 .863

2010-08-31 15:02:31

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH 00/11] sched: CFS low-latency features

* Mathieu Desnoyers ([email protected]) wrote:
> * Peter Zijlstra ([email protected]) wrote:
> > On Fri, 2010-08-27 at 14:32 -0400, Mathieu Desnoyers wrote:
> > >
> > > These are problems you get only when you allow spawning any number of threads.
> > > If, instead, you create a thread pool at timer creation, then you can allow
> > > concurrency without spawner-context and error-propagation problems.
> >
> > That would be a massive resource waste for the normal case where the
> > interval > handler runtime.
>
> Not if the application can configure this value. So the "normal" case could be
> set to a single thread. Only resource-intensive apps would set it differently,
> e.g. to the detected number of CPUs.

... but as we discussed in private, this would involve adding extra attribute
fields beyond what the standard proposes. I think the problem comes from the
fact that POSIX considers the pthread attributes to contain every possible
thread attribute one can imagine, which fails to take into account the
internal kernel state set by other interfaces.

So this leaves us with a single-threaded SIGEV_THREAD, which is pretty much
useless. But at least it is not totally impossible for glibc to get it right by
moving to an implementation that uses only one thread.

So until someone finds the time to fix glibc's SIGEV_THREAD timers, I would
strongly recommend against using them.

Thanks,

Mathieu

>
> Mathieu
>
>
> --
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com