2009-12-11 00:24:01

by Con Kolivas

Subject: BFS v0.311 CPU scheduler for 2.6.32

This is to briefly announce the availability of the latest stable BFS CPU
scheduler version 0.311 for the new stable linux kernel, 2.6.32.

http://ck.kolivas.org/patches/bfs/2.6.32-sched-bfs-311.patch

Changes since the last announced version, 0.304, are trivial apart from minimal
scalability improvements to make the most of SMT (hyperthreading) and to
improve NUMA performance. Here is the summary of the changes from the
documentation:

When choosing an idle CPU for a waking task, the cache locality is determined
according to where the task last ran and then idle CPUs are ranked from best
to worst to choose the most suitable idle CPU based on cache locality, NUMA
node locality and hyperthread sibling busyness. They are chosen in the
following preference (if idle):

* Same core, idle or busy cache, idle threads.
* Other core, same cache, idle or busy cache, idle threads.
* Same node, other CPU, idle cache, idle threads.
* Same node, other CPU, busy cache, idle threads.
* Same core, busy threads.
* Other core, same cache, busy threads.
* Same node, other CPU, busy threads.
* Other node, other CPU, idle cache, idle threads.
* Other node, other CPU, busy cache, idle threads.
* Other node, other CPU, busy threads.

(In brief, for the average user: if you have a hyperthreaded CPU, BFS will use
real cores before hyperthread siblings.)
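The preference list above can be read as a ranking table. Below is a minimal C sketch of that idea under stated assumptions: the enum, struct idle_cpu, idle_cpu_rank() and pick_idle_cpu() are hypothetical names, not code from the BFS patch, and each candidate's locality and busy flags are assumed to be already known relative to the CPU the task last ran on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical locality classes, relative to the CPU the task last ran on. */
enum locality {
    LOC_SAME_CORE,   /* same physical core (e.g. an SMT sibling) */
    LOC_SAME_CACHE,  /* other core sharing a cache */
    LOC_SAME_NODE,   /* other CPU on the same NUMA node */
    LOC_OTHER_NODE   /* CPU on a different NUMA node */
};

struct idle_cpu {
    int id;
    enum locality loc;
    bool cache_busy;   /* its cache is shared with busy CPUs */
    bool threads_busy; /* one of its SMT siblings is busy */
};

/* Lower is better; the values follow the order of the preference list. */
int idle_cpu_rank(const struct idle_cpu *c)
{
    switch (c->loc) {
    case LOC_SAME_CORE:
        return c->threads_busy ? 4 : 0;
    case LOC_SAME_CACHE:
        return c->threads_busy ? 5 : 1;
    case LOC_SAME_NODE:
        if (c->threads_busy)
            return 6;
        return c->cache_busy ? 3 : 2;
    case LOC_OTHER_NODE:
    default:
        if (c->threads_busy)
            return 9;
        return c->cache_busy ? 8 : 7;
    }
}

/* Return the id of the best-ranked idle CPU (first one wins ties),
 * or -1 if there are no candidates. */
int pick_idle_cpu(const struct idle_cpu *cpus, size_t n)
{
    int best_id = -1;
    int best_rank = 10; /* worse than any real rank */

    assert(n == 0 || cpus != NULL);
    for (size_t i = 0; i < n; i++) {
        int r = idle_cpu_rank(&cpus[i]);
        if (r < best_rank) {
            best_rank = r;
            best_id = cpus[i].id;
        }
    }
    return best_id;
}
```

With a table like this, "real cores before hyperthread siblings" falls out of the ordering: any candidate whose SMT sibling is busy ranks below every idle-thread candidate on the same node.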


A quick summary of the features of BFS:

Excellent interactivity and responsiveness with a very simple, low-overhead
design (9000 fewer lines of code than the mainline CPU scheduler).

Suited to and scalable for any respectable number of CPUs, whether separate
sockets, multicore and/or multithreaded, from 1 to many (although it won't
scale well to 4096).

Only one tunable, which almost never needs changing.

Features SCHED_IDLEPRIO and SCHED_ISO scheduling policies as well.

To run something idleprio, use schedtool like so:

schedtool -D -e make -j4

To run something isoprio, use schedtool like so:

schedtool -I -e amarok
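For those who want the same effect from their own code, here is a rough C sketch of what `schedtool -D -e` and `schedtool -I -e` amount to: set the calling process's policy with `sched_setscheduler()`, then `execvp()` the command so it inherits that policy. The policy numbers (SCHED_ISO = 4, SCHED_IDLEPRIO = 5) are an assumption based on the BFS/-ck headers and schedtool, not something stated in this announcement, and policy_for_flag() and run_with_policy() are hypothetical helpers.

```c
#include <assert.h>
#include <sched.h>
#include <string.h>
#include <unistd.h>

/* Assumed policy numbers: mainline reserves 4 for SCHED_ISO without
 * implementing it, and 5 is mainline SCHED_IDLE (SCHED_IDLEPRIO on BFS). */
#define BFS_SCHED_ISO       4
#define BFS_SCHED_IDLEPRIO  5

/* Map a schedtool-style flag to a policy number; -1 if unknown. */
int policy_for_flag(const char *flag)
{
    if (strcmp(flag, "-D") == 0)
        return BFS_SCHED_IDLEPRIO;
    if (strcmp(flag, "-I") == 0)
        return BFS_SCHED_ISO;
    return -1;
}

/* Roughly "schedtool <flag> -e cmd args...": change this process's
 * policy, then replace it with the command, which inherits the policy.
 * Returns -1 on failure; on success execvp() does not return. */
int run_with_policy(int policy, char *const argv[])
{
    struct sched_param sp = { .sched_priority = 0 };

    assert(argv != NULL && argv[0] != NULL);
    if (sched_setscheduler(0, policy, &sp) == -1)
        return -1;
    execvp(argv[0], argv);
    return -1;
}
```

For example, `char *cmd[] = { "make", "-j4", NULL }; run_with_policy(policy_for_flag("-D"), cmd);` should correspond to the idleprio example above. On a kernel without BFS, policy 4 is not implemented, so `sched_setscheduler()` is expected to fail and the helper returns -1 instead of running the command.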

Features subtick accounting for better CPU usage reporting.
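To illustrate why sub-tick accounting reports CPU usage better, here is a small C comparison (not BFS's implementation). Tick-based accounting charges a whole tick to whatever is running when the timer fires, while sub-tick accounting adds up the nanoseconds actually run. TICK_NS (assuming HZ=1000), struct run_interval and both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TICK_NS 1000000ULL /* assumed HZ=1000: one tick per millisecond */

struct run_interval {
    uint64_t start_ns; /* when the task started running */
    uint64_t end_ns;   /* when it stopped (exclusive) */
};

/* Tick-sampled: charge a full tick for every tick boundary that falls
 * while the task is running, and nothing for time between ticks. */
uint64_t tick_sampled_ns(const struct run_interval *iv, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        assert(iv[i].end_ns >= iv[i].start_ns);
        /* first tick boundary at or after the start of this run */
        uint64_t t = (iv[i].start_ns + TICK_NS - 1) / TICK_NS * TICK_NS;
        for (; t < iv[i].end_ns; t += TICK_NS)
            total += TICK_NS;
    }
    return total;
}

/* Sub-tick: charge exactly the time the task ran. */
uint64_t precise_ns(const struct run_interval *iv, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += iv[i].end_ns - iv[i].start_ns;
    return total;
}
```

A task that runs for 0.5 ms between ticks three times is charged nothing by the sampled version but 1.5 ms by the precise one, while a task that happens to straddle a tick for 0.2 ms is charged a full millisecond.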


More comprehensive documentation is included in the patch.

--
-ck


2009-12-11 10:29:29

by Mike Galbraith

Subject: Re: BFS v0.311 CPU scheduler for 2.6.32

On Fri, 2009-12-11 at 11:24 +1100, Con Kolivas wrote:

> When choosing an idle CPU for a waking task, the cache locality is determined
> according to where the task last ran and then idle CPUs are ranked from best
> to worst to choose the most suitable idle CPU based on cache locality, NUMA
> node locality and hyperthread sibling busyness.

The affinity logic still seems to want some loving. Everything I've
tested that is cache sensitive suffers pretty heavily.

FWIW, mysql+oltp is a nice repeatable affinity testcase. tbench is
another. Throughput for both under BFS 311 here is still consistent
with the cost of pairs landing cross-cache on a regular basis. I'm no
fan of vmark, but it's also highly cache sensitive, and it is sensitive
to overly enthusiastic wakeup preemption as well, so it is useful for
keeping an eye on both (otherwise, 800 threads on 4 cores is silly imho).
Vmark does not enjoy the BFS experience (understatement squared).

-Mike

2009-12-11 14:10:46

by Christoph Lameter

Subject: Re: BFS v0.311 CPU scheduler for 2.6.32


Could you make the scheduler build time configurable instead of replacing
the existing one? Embedded folks in particular may love a low footprint
scheduler.

2009-12-11 15:04:43

by Con Kolivas

Subject: Re: BFS v0.311 CPU scheduler for 2.6.32

On Sat, 12 Dec 2009 01:10:39 Christoph Lameter wrote:
> Could you make the scheduler build time configurable instead of replacing
> the existing one? Embedded folks in particular may love a low footprint
> scheduler.
>

It's not a bad idea, but the kernel still needs to be patched either way. To
get BFS they'd need to patch the kernel. If they didn't want BFS, they
wouldn't patch it in the first place.

Thanks,
--
-ck

2009-12-11 15:13:00

by Christoph Lameter

Subject: Re: BFS v0.311 CPU scheduler for 2.6.32

On Sat, 12 Dec 2009, Con Kolivas wrote:

> On Sat, 12 Dec 2009 01:10:39 Christoph Lameter wrote:
> > Could you make the scheduler build time configurable instead of replacing
> > the existing one? Embedded folks in particular may love a low footprint
> > scheduler.
>
> It's not a bad idea, but the kernel still needs to be patched either way. To
> get BFS they'd need to patch the kernel. If they didn't want BFS, they
> wouldn't patch it in the first place.

BFS would have a chance to be merged as an alternate scheduler for
specialized situations (such as embedded or desktop use).


2009-12-11 22:07:08

by Bill Davidsen

Subject: Re: BFS v0.311 CPU scheduler for 2.6.32

Con Kolivas wrote:
> On Sat, 12 Dec 2009 01:10:39 Christoph Lameter wrote:
>> Could you make the scheduler build time configurable instead of replacing
>> the existing one? Embedded folks in particular may love a low footprint
>> scheduler.
>>
>
> It's not a bad idea, but the kernel still needs to be patched either way. To
> get BFS they'd need to patch the kernel. If they didn't want BFS, they
> wouldn't patch it in the first place.
>
The advantage would be that the same kernel source could then be built into a
custom version with only the config changed. Since building with various
configs is hardly unusual, it would save carrying multiple source trees for
those who already build for several uses.

--
Bill Davidsen
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

2009-12-11 22:37:38

by Con Kolivas

Subject: Re: BFS v0.311 CPU scheduler for 2.6.32

On Sat, 12 Dec 2009 02:12:58 Christoph Lameter wrote:
> On Sat, 12 Dec 2009, Con Kolivas wrote:
> > On Sat, 12 Dec 2009 01:10:39 Christoph Lameter wrote:
> > > Could you make the scheduler build time configurable instead of
> > > replacing the existing one? Embedded folks in particular may love a low
> > > footprint scheduler.
> >
> > It's not a bad idea, but the kernel still needs to be patched either way.
> > To get BFS they'd need to patch the kernel. If they didn't want BFS, they
> > wouldn't patch it in the first place.
>
> BFS would have a chance to be merged as an alternate scheduler for
> specialized situations (such as embedded or desktop use).
>

Nice idea, but regardless of who else might want that, the mainline
maintainers have already made it clear they do not.

Thanks.
--
-ck

Subject: Re: BFS v0.311 CPU scheduler for 2.6.32

On Friday 11 December 2009 11:37:42 pm Con Kolivas wrote:
> On Sat, 12 Dec 2009 02:12:58 Christoph Lameter wrote:
> > On Sat, 12 Dec 2009, Con Kolivas wrote:
> > > On Sat, 12 Dec 2009 01:10:39 Christoph Lameter wrote:
> > > > Could you make the scheduler build time configurable instead of
> > > > replacing the existing one? Embedded folks in particular may love a low
> > > > footprint scheduler.
> > >
> > > It's not a bad idea, but the kernel still needs to be patched either way.
> > > To get BFS they'd need to patch the kernel. If they didn't want BFS, they
> > > wouldn't patch it in the first place.
> >
> > BFS would have a chance to be merged as an alternate scheduler for
> > specialized situations (such as embedded or desktop use).
> >
>
> Nice idea, but regardless of who else might want that, the mainline

FWIW I would also love to see it happen.

> maintainers have already made it clear they do not.

Oh, those upstream bastards.. ;)

Why do you care so much about their acknowledgment?

If you are not doing your unpaid kernel work for yourself and for people
who recognize/use it then upstream maintainers not liking your changes
should really be the least of your worries..

--
Bartlomiej Zolnierkiewicz

2009-12-12 02:00:50

by Con Kolivas

Subject: Re: BFS v0.311 CPU scheduler for 2.6.32

On Sat, 12 Dec 2009 11:55:39 Bartlomiej Zolnierkiewicz wrote:
> On Friday 11 December 2009 11:37:42 pm Con Kolivas wrote:
> > On Sat, 12 Dec 2009 02:12:58 Christoph Lameter wrote:
> > > On Sat, 12 Dec 2009, Con Kolivas wrote:
> > > > On Sat, 12 Dec 2009 01:10:39 Christoph Lameter wrote:
> > > > > Could you make the scheduler build time configurable instead of
> > > > > replacing the existing one? Embedded folks in particular may love a
> > > > > low footprint scheduler.
> > > >
> > > > It's not a bad idea, but the kernel still needs to be patched either
> > > > way. To get BFS they'd need to patch the kernel. If they didn't want
> > > > BFS, they wouldn't patch it in the first place.
> > >
> > > BFS would have a chance to be merged as an alternate scheduler for
> > > specialized situations (such as embedded or desktop use).
> >
> > Nice idea, but regardless of who else might want that, the mainline
>
> FWIW I would also love to see it happen.

Thanks!

> > maintainers have already made it clear they do not.
>
> Oh, those upstream bastards.. ;)
>
> Why do you care so much about their acknowledgment?

Whaa...?

>
> If you are not doing your unpaid kernel work for yourself and for people
> who recognize/use it then upstream maintainers not liking your changes
> should really be the least of your worries..
>

Wait, this does not make sense. There's a cyclical flaw in this reasoning. If
I cared about their acknowledgment, I would make it mainline mergeable and
argue a case for it, which I do not want to do.

I'm happy to make reasonable changes to the code consistent with what people
who use it want, but what exactly is the point of making it mainline mergeable
if it will not be merged?

Regards,
--
-ck

Subject: Re: BFS v0.311 CPU scheduler for 2.6.32

On Saturday 12 December 2009 03:00:54 am Con Kolivas wrote:
> On Sat, 12 Dec 2009 11:55:39 Bartlomiej Zolnierkiewicz wrote:
> > On Friday 11 December 2009 11:37:42 pm Con Kolivas wrote:
> > > On Sat, 12 Dec 2009 02:12:58 Christoph Lameter wrote:
> > > > On Sat, 12 Dec 2009, Con Kolivas wrote:
> > > > > On Sat, 12 Dec 2009 01:10:39 Christoph Lameter wrote:
> > > > > > Could you make the scheduler build time configurable instead of
> > > > > > replacing the existing one? Embedded folks in particular may love a
> > > > > > low footprint scheduler.
> > > > >
> > > > > It's not a bad idea, but the kernel still needs to be patched either
> > > > > way. To get BFS they'd need to patch the kernel. If they didn't want
> > > > > BFS, they wouldn't patch it in the first place.
> > > >
> > > > BFS would have a chance to be merged as an alternate scheduler for
> > > > specialized situations (such as embedded or desktop use).
> > >
> > > Nice idea, but regardless of who else might want that, the mainline
> >
> > FWIW I would also love to see it happen.
>
> Thanks!
>
> > > maintainers have already made it clear they do not.
> >
> > Oh, those upstream bastards.. ;)
> >
> > Why do you care so much about their acknowledgment?
>
> Whaa...?
>
> >
> > If you are not doing your unpaid kernel work for yourself and for people
> > who recognize/use it then upstream maintainers not liking your changes
> > should really be the least of your worries..
> >
>
> Wait, this does not make sense. There's a cyclical flaw in this reasoning. If
> I cared about their acknowledgment, I would make it mainline mergeable and
> argue a case for it, which I do not want to do.

Unfortunately the flaw is in your reasoning..

> I'm happy to make reasonable changes to the code consistent with what people
> who use it want, but what exactly is the point of making it mainline mergeable
> if it will not be merged?

The thing is that those two points are not necessarily conflicting ones..

--
Bartlomiej Zolnierkiewicz

2009-12-12 05:55:03

by Willy Tarreau

Subject: Re: BFS v0.311 CPU scheduler for 2.6.32

Hi Con,

On Sat, Dec 12, 2009 at 01:00:54PM +1100, Con Kolivas wrote:
> > If you are not doing your unpaid kernel work for yourself and for people
> > who recognize/use it then upstream maintainers not liking your changes
> > should really be the least of your worries..
> >
>
> Wait, this does not make sense. There's a cyclical flaw in this reasoning. If
> I cared about their acknowledgment, I would make it mainline mergeable and
> argue a case for it, which I do not want to do.
>
> I'm happy to make reasonable changes to the code consistent with what people
> who use it want, but what exactly is the point of making it mainline mergeable
> if it will not be merged?

Many people build their own kernels by:
1) applying a lot of patches on them (stable + features)
2) using machine-specific configs

You will get far more testers if they can use the same kernel and
just play with their config files than if they have to patch/unpatch
depending on what they need to have.

I personally would love to be able to add BFS into my kernels for
testing purposes, comparison, and possibly to propose enhancements
and fixes. But I don't want to *replace* mainline code.

Also, I like to have the same kernel sources used on my desktop,
notebook, eeepc, and my bootable USB key. It is a lot easier to
upgrade and a lot easier to spot bugs before they strike in sensitive
environments.

Regards,
Willy

2009-12-12 06:10:51

by Con Kolivas

Subject: Re: BFS v0.311 CPU scheduler for 2.6.32

On Sat, 12 Dec 2009 16:54:59 Willy Tarreau wrote:
> Hi Con,
>
> On Sat, Dec 12, 2009 at 01:00:54PM +1100, Con Kolivas wrote:
> > > If you are not doing your unpaid kernel work for yourself and for
> > > people who recognize/use it then upstream maintainers not liking your
> > > changes should really be the least of your worries..
> >
> > Wait, this does not make sense. There's a cyclical flaw in this
> > reasoning. If I cared about their acknowledgment, I would make it
> > mainline mergeable and argue a case for it, which I do not want to do.
> >
> > I'm happy to make reasonable changes to the code consistent with what
> > people who use it want, but what exactly is the point of making it
> > mainline mergeable if it will not be merged?
>
> Many people build their own kernels by:
> 1) applying a lot of patches on them (stable + features)
> 2) using machine-specific configs
>
> You will get far more testers if they can use the same kernel and
> just play with their config files than if they have to patch/unpatch
> depending on what they need to have.
>
> I personally would love to be able to add BFS into my kernels for
> testing purposes, comparison, and possibly to propose enhancements
> and fixes. But I don't want to *replace* mainline code.
>
> Also, I like to have the same kernel sources used on my desktop,
> notebook, eeepc, and my bootable USB key. It is a lot easier to
> upgrade and a lot easier to spot bugs before they strike in sensitive
> environments.

Thanks Willy.

That's the first meaningful reason I've heard for it. I may have to consider
it now. It would still be compile-time only, so probably not quite what
you're hoping for. I don't have the time and energy to do and maintain the
whole plugsched crap for boot-time selection all over again, and that adds
overhead which goes against one of my prime objectives.
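For contrast with compile-time selection, here is a toy C sketch of boot-time selection through a table of function pointers. It is not plugsched's real interface: struct sched_ops, pick_first(), pick_last(), select_scheduler() and schedule_next() are hypothetical, and the two "schedulers" are trivial. What it shows is one plausible source of the overhead Con mentions: every scheduling decision goes through an indirect call via `active`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pluggable-scheduler interface. */
struct sched_ops {
    const char *name;
    int (*pick_next)(int nr_runnable); /* index into a run list, or -1 */
};

/* Two trivial stand-in policies. */
static int pick_first(int nr) { return nr > 0 ? 0 : -1; }
static int pick_last(int nr)  { return nr > 0 ? nr - 1 : -1; }

static const struct sched_ops available[] = {
    { "first", pick_first },
    { "last",  pick_last  },
};

/* Chosen once at boot, e.g. from a command-line option. */
static const struct sched_ops *active = &available[0];

/* Switch to the named scheduler; -1 (and no change) if unknown. */
int select_scheduler(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(available) / sizeof(available[0]); i++) {
        if (strcmp(available[i].name, name) == 0) {
            active = &available[i];
            return 0;
        }
    }
    return -1;
}

/* Every decision is an indirect call through the active ops table. */
int schedule_next(int nr_runnable)
{
    return active->pick_next(nr_runnable);
}
```

A boot parameter handler could call `select_scheduler("last")` once; after that, `schedule_next(3)` returns 2 instead of 0, and an unknown name leaves the current choice in place.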

Regards,
--
-ck

2009-12-12 06:15:17

by Willy Tarreau

Subject: Re: BFS v0.311 CPU scheduler for 2.6.32

On Sat, Dec 12, 2009 at 05:10:44PM +1100, Con Kolivas wrote:
> > Also, I like to have the same kernel sources used on my desktop,
> > notebook, eeepc, and my bootable USB key. It is a lot easier to
> > upgrade and a lot easier to spot bugs before they strike in sensitive
> > environments.
>
> Thanks Willy.
>
> That's the first meaningful reason I've heard for it. I may have to consider
> it now. It would still be compile-time only, so probably not quite what
> you're hoping for. I don't have the time and energy to do and maintain the
> whole plugsched crap for boot-time selection all over again, and that adds
> overhead which goes against one of my prime objectives.

No problem, as I said, I want to use the same *sources*, not to be able
to hot-swap the scheduler. But basically I have a directory named "configs"
in which all of my machines' configs are stored. I run "build-kernel-list"
over those configs from the kernel dir and I get all of my new kernels.
You can now easily understand why I don't want to patch in the middle of
the process :-)

Thanks,
Willy

2009-12-12 07:59:37

by Mike Galbraith

Subject: Re: BFS v0.311 CPU scheduler for 2.6.32

On Sat, 2009-12-12 at 09:37 +1100, Con Kolivas wrote:
> On Sat, 12 Dec 2009 02:12:58 Christoph Lameter wrote:
> > On Sat, 12 Dec 2009, Con Kolivas wrote:
> > > On Sat, 12 Dec 2009 01:10:39 Christoph Lameter wrote:
> > > > Could you make the scheduler build time configurable instead of
> > > > replacing the existing one? Embedded folks in particular may love a low
> > > > footprint scheduler.
> > >
> > > It's not a bad idea, but the kernel still needs to be patched either way.
> > > To get BFS they'd need to patch the kernel. If they didn't want BFS, they
> > > wouldn't patch it in the first place.
> >
> > BFS would have a chance to be merged as an alternate scheduler for
> > specialized situations (such as embedded or desktop use).
> >
>
> Nice idea, but regardless of who else might want that, the mainline
> maintainers have already made it clear they do not.

Hm. You made it very clear from the onset that BFS was not intended to
be a merge candidate. Of course, you're free to change your mind any
time you feel like it.

-Mike

Subject: Re: BFS v0.311 CPU scheduler for 2.6.32

On Saturday 12 December 2009 07:14:47 am Willy Tarreau wrote:
> On Sat, Dec 12, 2009 at 05:10:44PM +1100, Con Kolivas wrote:
> > > Also, I like to have the same kernel sources used on my desktop,
> > > notebook, eeepc, and my bootable USB key. It is a lot easier to
> > > upgrade and a lot easier to spot bugs before they strike in sensitive
> > > environments.
> >
> > Thanks Willy.
> >
> > That's the first meaningful reason I've heard for it. I may have to consider
> > it now. It would still be compile-time only, so probably not quite what
> > you're hoping for. I don't have the time and energy to do and maintain the
> > whole plugsched crap for boot-time selection all over again, and that adds
> > overhead which goes against one of my prime objectives.
>
> No problem, as I said, I want to use the same *sources*, not to be able
> to hot-swap the scheduler. But basically I have a directory named "configs"
> in which all of my machines' configs are stored. I run "build-kernel-list"
> over those configs from the kernel dir and I get all of my new kernels.
> You can now easily understand why I don't want to patch in the middle of
> the process :-)

Similar setup here and similar rationale behind my request. :)

I would like to give BFS a spin on some smaller boxes but then I would need
to remember to pull BFS out of my local patch queue whenever I want to test
something on my main box (I like to keep things rather conservative there)..

--
Bartlomiej Zolnierkiewicz

2009-12-14 14:51:11

by Christoph Lameter

Subject: Re: BFS v0.311 CPU scheduler for 2.6.32

On Sat, 12 Dec 2009, Con Kolivas wrote:

> That's the first meaningful reason I've heard for it. I may have to consider
> it now. It would still be compile-time only, so probably not quite what
> you're hoping for. I don't have the time and energy to do and maintain the
> whole plugsched crap for boot-time selection all over again, and that adds
> overhead which goes against one of my prime objectives.

There was no such suggestion. It would help you a lot if you would be less
uptight about things. If you are near Chicago: Want to get together for
a beer or so?

2009-12-15 00:56:22

by Con Kolivas

Subject: Re: BFS v0.311 CPU scheduler for 2.6.32

2009/12/15 Christoph Lameter:
> On Sat, 12 Dec 2009, Con Kolivas wrote:
>
>> That's the first meaningful reason I've heard for it. I may have to consider
>> it now. It would still be compile-time only, so probably not quite what
>> you're hoping for. I don't have the time and energy to do and maintain the
>> whole plugsched crap for boot-time selection all over again, and that adds
>> overhead which goes against one of my prime objectives.
>
> There was no such suggestion. It would help you a lot if you would be less
> uptight about things. If you are near Chicago: Want to get together for
> a beer or so?


That's good advice at the best of times, thanks.

Alas, I'm afraid I'm in Australia and would have loved to have taken
you up on that offer of a drink together since that is a mode of
communication I'm much better with.

Regards,
Con

2009-12-18 15:45:01

by Con Kolivas

Subject: BFS v0.312 configurable CPU scheduler for 2.6.32

As requested, BFS has been made a simple config option to enable/disable in
kernel build. It is otherwise unchanged from .311.

http://ck.kolivas.org/patches/bfs/2.6.32-sched-bfs-312.patch
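The point of a config option, as Willy and Bill describe it, is one source tree built with different .config files. Here is a minimal C sketch of that kind of compile-time selection. It assumes the option is named CONFIG_SCHED_BFS (an assumption; the announcement does not name it) and uses a hypothetical sched_name() in place of the real scheduler code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed option name. In a kernel build it would come from the
 * generated config; here it is left undefined, so the mainline
 * scheduler (CFS) is the one compiled in. */
/* #define CONFIG_SCHED_BFS 1 */

#ifdef CONFIG_SCHED_BFS
static const char *sched_name(void) { return "BFS"; }
#else
static const char *sched_name(void) { return "CFS"; }
#endif

/* Single entry point; the preprocessor has already decided which
 * body exists, so there is no runtime dispatch. */
const char *active_scheduler(void)
{
    return sched_name();
}

/* Convenience check for callers that care which one was built. */
bool scheduler_is(const char *name)
{
    assert(name != NULL);
    return strcmp(active_scheduler(), name) == 0;
}
```

Building with CONFIG_SCHED_BFS defined (for example via `-DCONFIG_SCHED_BFS`) would compile the other body instead. Either way only one scheduler ends up in the binary, and callers never pay for a runtime choice.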

A quick set of kernbench benchmarks on my quad core:

http://ck.kolivas.org/patches/bfs/kernbench2.6.32BFSvCFS.log

Note that one person has had problems with preemptible tree-based hierarchical
RCU and BFS 311 using large amounts of CPU; the cause remains unknown at the
moment. It is probably safest not to use this combination for now.

Be nice,
enjoy!
--
-ck

2009-12-20 04:47:11

by Bill Davidsen

Subject: Re: BFS v0.311 CPU scheduler for 2.6.32

Con Kolivas wrote:
> On Sat, 12 Dec 2009 02:12:58 Christoph Lameter wrote:
>> On Sat, 12 Dec 2009, Con Kolivas wrote:
>>> On Sat, 12 Dec 2009 01:10:39 Christoph Lameter wrote:
>>>> Could you make the scheduler build time configurable instead of
>>>> replacing the existing one? Embedded folks in particular may love a low
>>>> footprint scheduler.
>>> It's not a bad idea, but the kernel still needs to be patched either way.
>>> To get BFS they'd need to patch the kernel. If they didn't want BFS, they
>>> wouldn't patch it in the first place.
>> BFS would have a chance to be merged as an alternate scheduler for
>> specialized situations (such as embedded or desktop use).
>>
>
> Nice idea, but regardless of who else might want that, the mainline
> maintainers have already made it clear they do not.
>
Since your work is going in as a patch anyway, who is it that cares? The point
is that I have one source which I compile with multiple config files, rather
than multiple sources I get to patch with selected embellishments from -mm and
-next and other places.

It would be great if the system could boot and run on a doorknob scheduler long
enough to load a scheduling module at boot time. But that's a secondary gain
compared to having a single source and compiling the hell out of it.

--
Bill Davidsen
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot