2008-01-29 09:54:55

by Peter Zijlstra

Subject: scheduler scalability - cgroups, cpusets and load-balancing

Hi All,

Some of the fancy new scheduler features, such as the cgroup load
balancer (load_balance_monitor) and the real-time load balancer, are a
bit of a scalability issue. They all seem to want a rather strong
global bound to keep global fairness (which is quite understandable).

[ my own interest is currently real-time group scheduling on multiple
cpus, and that seems to require _very_ strong bounds ]

I think the current stuff would scale up to 8, maybe 16 cpus, but after
that I'd be really worried.

Now we want distributions to enable most of these features. Distros seem
to want containers, but distros also need to support 128+ cpu machines,
so how are we going to solve this?

My thoughts were to make stronger use of disjoint cpu-sets. cgroups and
cpusets are related, in that cpusets provide a property to a cgroup.
However, load_balance_monitor()'s interaction with sched domains
confuses me - it might DTRT, but I can't tell.

[ It looks to me it balances a group over the largest SD the current cpu
has access to, even though that might be larger than the SD associated
with the cpuset of that particular cgroup. ]

Also the RT load-balancer needs to become aware of these sets; I
think Paul J and Steven once talked about it, but I can't quite remember
where that ended. From my POV there should be sched-domain based balance
information, not global.

By cutting the problem into smaller pieces, and adding tunables to
weaken the global fairness, I think we can give administrators enough
freedom to make use of these features, even on the largest of machines.

[ so I'd move the load_balance_monitor() tunables into cpusets as well;
I can imagine a smaller cpuset wanting stronger fairness than a much
larger cpuset. ]

I understand it's a somewhat hand-wavey email, but I wanted to start
discussion on the issue, or have someone show me I'm wrong so I can stop
worrying :-).

Peter


2008-01-29 10:01:53

by Paul Jackson

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

Peter wrote:
> Also the RT load-balancer needs to become aware of these sets; I
> think Paul J and Steven once talked about it, but I can't quite remember
> where that ended

See further the thread:

http://lkml.org/lkml/2007/10/22/400

(I don't remember where it ended up either; probably nowhere.
I'm just passing on the link, before doing any reading or thinking.)

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-01-29 10:51:27

by Peter Zijlstra

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing


On Tue, 2008-01-29 at 04:01 -0600, Paul Jackson wrote:
> Peter wrote:
> > Also the RT load-balancer needs to become aware of these sets; I
> > think Paul J and Steven once talked about it, but I can't quite remember
> > where that ended
>
> See further the thread:
>
> http://lkml.org/lkml/2007/10/22/400
>
> (I don't remember where it ended up either; probably nowhere.
> I'm just passing on the link, before doing any reading or thinking.)

Thanks for the link. Yes I think your last suggestion of creating
rt-domains ( http://lkml.org/lkml/2007/10/23/419 ) is a good one.

Upon cpuset changes we could then look for the largest disjoint set and
hang the rt balance code from that.

2008-01-29 10:58:43

by Peter Zijlstra

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing


Here I go, talking to myself..

On Tue, 2008-01-29 at 10:53 +0100, Peter Zijlstra wrote:

> My thoughts were to make stronger use of disjoint cpu-sets. cgroups and
> cpusets are related, in that cpusets provide a property to a cgroup.
> However, load_balance_monitor()'s interaction with sched domains
> confuses me - it might DTRT, but I can't tell.
>
> [ It looks to me it balances a group over the largest SD the current cpu
> has access to, even though that might be larger than the SD associated
> with the cpuset of that particular cgroup. ]

Hmm, with a bit more thought I think that does indeed DTRT. Because, if
the cpu belongs to a disjoint cpuset, the highest sd (with
load-balancing enabled) would be that. Right?

[ Just a bit of a shame we have all cgroups represented on each cpu. ]

Also, might be a nice idea to split the daemon up if there are indeed
disjoint sets - currently there is only a single daemon which touches
the whole system.

2008-01-29 11:14:19

by Paul Jackson

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

Peter wrote:
> Thanks for the link. Yes I think your last suggestion of creating
> rt-domains ( http://lkml.org/lkml/2007/10/23/419 ) is a good one.

We now have a per-cpuset Boolean flag file called 'sched_load_balance'.

In the default case, this flag is set on, and the kernel does its
usual load balancing across all CPUs in that cpuset. This means, under
the covers, that there exists some sched domain such that all CPUs in
that cpuset are in that same sched domain. That sched domain might
contain additional CPUs from outside that cpuset as well. Indeed,
in the default vanilla configuration, that sched domain contains all
CPUs in the system.

If we turn the sched_load_balance flag off for some cpuset, we are
telling the kernel it's ok not to load balance on the CPUs in that
cpuset (unless those CPUs are in some other cpuset that needed load
balancing anyway.)

This 'sched_load_balance' flag is, thus far, "the" cpuset hook
supporting realtime. One can use it to configure a system so that
the kernel does not do normal load balancing on select CPUs, such
as those CPUs dedicated to realtime use.

It sounds like Peter is reminding us that we really have three choices
for handling a given CPU's load balancing:
1) normal kernel scheduler load balancing,
2) RT load balancing, or
3) no load balancing whatsoever.

If that's the case (if we really need choice 3) then a single Boolean
flag, such as sched_load_balance, is not sufficient to select from
the three choices, and it might make sense to add a second per-cpuset
Boolean flag, say "sched_rt_balance", default off, which, if turned on,
enables choice 2.

If that's not the case (we only need choices 1 and 2) then -logically-
we could overload the meaning of the current sched_load_balance,
to mean, if turned off, not only to stop doing normal balancing, but
to further mean that we should commence RT balancing. However bits
aren't -that- precious here, and this sounds unnecessarily confusing.

So ... would a new per-cpuset Boolean flag such as sched_rt_balance be
appropriate and sufficient to mark those cpusets whose set of CPUs
required RT balancing?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-01-29 11:30:26

by Paul Jackson

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

Peter wrote, in reply to Peter ;):
> > [ It looks to me it balances a group over the largest SD the current cpu
> > has access to, even though that might be larger than the SD associated
> > with the cpuset of that particular cgroup. ]
>
> Hmm, with a bit more thought I think that does indeed DTRT. Because, if
> the cpu belongs to a disjoint cpuset, the highest sd (with
> load-balancing enabled) would be that. Right?

The code that defines sched domains, kernel/sched.c partition_sched_domains(),
as called from the cpuset code in kernel/cpuset.c rebuild_sched_domains(),
does not make use of the full range of sched_domain possibilities.

In particular, it only sets up some non-overlapping set of sched domains.
Every CPU ends up in at most a single sched domain.

The original reason that one can't define overlapping sched domains via
this cpuset interface (based off the cpuset 'sched_load_balance' flag)
is that I didn't realize it was even possible to overlap sched domains
when I wrote the cpuset code defining sched domains. And then when I
later realized one could overlap sched domains, I (a) didn't see a need
to do so, and (b) couldn't see how to do so via the cpuset interface
without causing my brain to explode.

Now, back to Peter's question, being a bit pedantic, CPUs don't belong
to disjoint cpusets, except in the most minimal situation that there is
only one cpuset covering all CPUs.

Rather what happens, when you have need for some realtime CPUs, is that:
1) you turn off sched_load_balance on the top cpuset,
2) you set up your realtime cpuset as a child cpuset of the top cpuset
such that its CPUs don't overlap any of its siblings, and
3) you turn off sched_load_balance in that realtime cpuset.

At that point, sched domains are rebuilt, including providing a
sched domain that just contains the CPUs in that realtime cpuset, and
normal scheduler load balancing ceases on the CPUs in that realtime
cpuset.
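
In cpuset-filesystem terms, the three steps come down to something like
the following (a minimal sketch; the /dev/cpuset mount point follows
Documentation/cpusets.txt, and the cpu/mem numbers are illustrative):

#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* write a small string to a cpuset control file */
static int write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, val, strlen(val));
	close(fd);
	return n < 0 ? -1 : 0;
}

int main(void)
{
	/* 1) turn off sched_load_balance on the top cpuset */
	write_str("/dev/cpuset/sched_load_balance", "0");

	/* 2) create a realtime child cpuset whose CPUs overlap no sibling */
	mkdir("/dev/cpuset/rt", 0755);
	write_str("/dev/cpuset/rt/cpus", "4-7");
	write_str("/dev/cpuset/rt/mems", "0");

	/* 3) turn off sched_load_balance in the realtime cpuset as well */
	write_str("/dev/cpuset/rt/sched_load_balance", "0");
	return 0;
}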

> [ Just a bit of a shame we have all cgroups represented on each cpu. ]

Could you restate this -- I suspect it's obvious, but I'm oblivious ;).

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-01-29 11:32:51

by Peter Zijlstra

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing


On Tue, 2008-01-29 at 05:13 -0600, Paul Jackson wrote:
> Peter wrote:
> > Thanks for the link. Yes I think your last suggestion of creating
> > rt-domains ( http://lkml.org/lkml/2007/10/23/419 ) is a good one.
>
> We now have a per-cpuset Boolean flag file called 'sched_load_balance'.

SD_LOAD_BALANCE, right?

> In the default case, this flag is set on, and the kernel does its
> usual load balancing across all CPUs in that cpuset. This means, under
> the covers, that there exists some sched domain such that all CPUs in
> that cpuset are in that same sched domain. That sched domain might
> contain additional CPUs from outside that cpuset as well. Indeed,
> in the default vanilla configuration, that sched domain contains all
> CPUs in the system.
>
> If we turn the sched_load_balance flag off for some cpuset, we are
> telling the kernel it's ok not to load balance on the CPUs in that
> cpuset (unless those CPUs are in some other cpuset that needed load
> balancing anyway.)
>
> This 'sched_load_balance' flag is, thus far, "the" cpuset hook
> supporting realtime. One can use it to configure a system so that
> the kernel does not do normal load balancing on select CPUs, such
> as those CPUs dedicated to realtime use.

Ah, here I disagree, it is possible to do (hard) realtime scheduling
over multiple cpus, the only drawback is that it requires a very strong
load-balancer, making it unsuitable for large numbers of cpus.

( of course, having a strong rt load balancer on a large cpuset doesn't
harm, as long as there are no rt tasks to balance )

So if we have a system like so:

      __A__
     /  |  \
    B1  B2  B3
    /\
   /  \
  C1   C2

A comprises cpus 0-127, !SD_LOAD_BALANCE

B1 comprises cpus 0-63, !SD_LOAD_BALANCE
B2 comprises cpus 64-119
B3 120-127

C1 0-3
C2 5-63

We end up with 4 disjoint load-balanced sets.

I would then attach the rt balance information to: C1, C2, B2, B3.

If, for example, B1 were load-balanced, we'd only have 3 disjoint
sets left: B1, B2 and B3, and the rt balance data would be there.

> It sounds like Peter is reminding us that we really have three choices
> for handling a given CPU's load balancing:
> 1) normal kernel scheduler load balancing,
> 2) RT load balancing, or
> 3) no load balancing whatsoever.
>
> If that's the case (if we really need choice 3) then a single Boolean
> flag, such as sched_load_balance, is not sufficient to select from
> the three choices, and it might make sense to add a second per-cpuset
> Boolean flag, say "sched_rt_balance", default off, which, if turned on,
> enables choice 2.
>
> If that's not the case (we only need choices 1 and 2) then -logically-
> we could overload the meaning of the current sched_load_balance,
> to mean, if turned off, not only to stop doing normal balancing, but
> to further mean that we should commence RT balancing. However bits
> aren't -that- precious here, and this sounds unnecessarily confusing.
>
> So ... would a new per-cpuset Boolean flag such as sched_rt_balance be
> appropriate and sufficient to mark those cpusets whose set of CPUs
> required RT balancing?

So, I don't think we need that, I think we can do with the single flag,
we just need to find these disjoint sets and stick our rt-domain there.
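
Very roughly, the idea looks like this (a sketch only: rt_domain,
alloc_rt_domain() and attach_rt_domain() are made-up names here; the
doms[] array is the same partition the cpuset code already hands to
partition_sched_domains()):

struct rt_domain {			/* hypothetical */
	cpumask_t span;			/* cpus this rt balancer covers */
	/* rt overload bookkeeping would live here */
};

/* rebuilt whenever the cpuset code recomputes the balanced partitions */
static void rebuild_rt_domains(int ndoms, cpumask_t *doms)
{
	int i;

	for (i = 0; i < ndoms; i++) {
		struct rt_domain *rd = alloc_rt_domain();	/* hypothetical */

		rd->span = doms[i];
		attach_rt_domain(rd);	/* hypothetical: hook it to each rq in the span */
	}
}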

2008-01-29 11:34:53

by Paul Jackson

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

Paul, talking to himself:
> At that point, sched domains are rebuilt, including providing a
> sched domain that just contains the CPUs in that realtime cpuset, and
> normal scheduler load balancing ceases on the CPUs in that realtime
> cpuset.

Oops - correction - at that point sched domains are rebuilt, and
the CPUs in that realtime cpuset are not included in any sched
domain at all.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-01-29 11:53:38

by Paul Jackson

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

Peter wrote:
> So, I don't think we need that, I think we can do with the single flag,
> we just need to find these disjoint sets and stick our rt-domain there.

Ah - perhaps you don't need that flag - but my other cpuset users do ;).

You see, there are two very different ways that 'sched_load_balance' is
used in practice. One is the realtime usage we've been discussing.

The other way is by big batch schedulers. They may be placed in charge
of managing a few hundred CPUs on a system, and might be running a mix
of many small jobs, each covering only a few CPUs. They routinely set up
one cpuset for each job, to contain that job to the CPUs and memory
nodes assigned to it. This is actually the original motivating use for
cpusets.

As a bit of optimization, batch schedulers desire to tell the normal
kernel scheduler -not- to bother load balancing across the big set of
CPUs controlled by the batch scheduler, but only to load balance within
each of the smaller per-job cpusets. Load balancing across hundreds
of CPUs when the batch scheduler knows such efforts would be fruitless
is a waste of good CPU cycles in kernel/sched.c.

I really doubt we'd want to have such systems triggering the hard RT
scheduler on whatever CPUs were in the batch scheduler's big cpuset
that didn't happen to have an active job currently assigned to them.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-01-29 12:04:25

by Paul Jackson

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

Paul, responding to Peter:
> > We now have a per-cpuset Boolean flag file called 'sched_load_balance'.
>
> SD_LOAD_BALANCE, right?

No. SD_LOAD_BALANCE is some attribute of sched domains.

The 'sched_load_balance' flag is an attribute of cpusets.

The mapping of cpusets to sched domains required several pages of 'fun
to write' code, which had to go through a couple of years of fixing and
one major rewrite before it (knock on wood) worked correctly. It's not
a one-to-one relation, in other words. See my earlier messages for
further explanation of how this works.

I'm not sure what SD_LOAD_BALANCE does ... I guess from a quick
read that it just optimizes the recognition of singleton sched
domains for which load balancing would be a wasted effort.
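
(For reference, a simplified sketch of how a per-sched_domain flag like
SD_LOAD_BALANCE is consulted on the rebalance path; the function name
and elided body below are illustrative, not the actual kernel code:)

static void rebalance_cpu(int cpu)
{
	struct sched_domain *sd;

	for_each_domain(cpu, sd) {
		if (!(sd->flags & SD_LOAD_BALANCE))
			continue;	/* balancing disabled at this level */
		/* ... run load_balance() for this domain ... */
	}
}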


> > This 'sched_load_balance' flag is, thus far, "the" cpuset hook
> > supporting realtime. One can use it to configure a system so that
> > the kernel does not do normal load balancing on select CPUs, such
> > as those CPUs dedicated to realtime use.
>
> Ah, here I disagree, it is possible to do (hard) realtime scheduling
> over multiple cpus, the only drawback is that it requires a very strong
> load-balancer, making it unsuitable for large numbers of cpus.

I don't think we are disagreeing. I was speaking of "normal"
load balancing (what the mainline kernel/sched.c code normally
does). You're speaking of hard realtime load balancing.

I think we agree that these both exist, and require different
load balancing code, the latter 'very strong.'

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-01-29 12:04:48

by Peter Zijlstra

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing


On Tue, 2008-01-29 at 05:30 -0600, Paul Jackson wrote:
> Peter wrote, in reply to Peter ;):
> > > [ It looks to me it balances a group over the largest SD the current cpu
> > > has access to, even though that might be larger than the SD associated
> > > with the cpuset of that particular cgroup. ]
> >
> > Hmm, with a bit more thought I think that does indeed DTRT. Because, if
> > the cpu belongs to a disjoint cpuset, the highest sd (with
> > load-balancing enabled) would be that. Right?
>
> The code that defines sched domains, kernel/sched.c partition_sched_domains(),
> as called from the cpuset code in kernel/cpuset.c rebuild_sched_domains(),
> does not make use of the full range of sched_domain possibilities.
>
> In particular, it only sets up some non-overlapping set of sched domains.
> Every CPU ends up in at most a single sched domain.

Ah, good to know. I thought it would reflect the hierarchy of the sets
themselves.

> The original reason that one can't define overlapping sched domains via
> this cpuset interface (based off the cpuset 'sched_load_balance' flag)
> is that I didn't realize it was even possible to overlap sched domains
> when I wrote the cpuset code defining sched domains. And then when I
> later realized one could overlap sched domains, I (a) didn't see a need
> to do so, and (b) couldn't see how to do so via the cpuset interface
> without causing my brain to explode.

Good reason :-); this code needs all the reasons it can get not to
grow more complexity.

> Now, back to Peter's question, being a bit pedantic, CPUs don't belong
> to disjoint cpusets, except in the most minimal situation that there is
> only one cpuset covering all CPUs.
>
> Rather what happens, when you have need for some realtime CPUs, is that:
> 1) you turn off sched_load_balance on the top cpuset,
> 2) you setup your realtime cpuset as a child cpuset of the top cpuset
> such that its CPUs don't overlap any of its siblings, and
> 3) you turn off sched_load_balance in that realtime cpuset.

Ah, I don't think 3 is needed. Quite to the contrary, there is quite a
large body of research work covering the scheduling of (hard and soft)
realtime tasks on multiple cpus.

> At that point, sched domains are rebuilt, including providing a
> sched domain that just contains the CPUs in that realtime cpuset, and
> normal scheduler load balancing ceases on the CPUs in that realtime
> cpuset.

Right, which would also disable the realtime load-balancing we do want.
Hence my suggestion to stick the rt balance data in this sched domain.

> > [ Just a bit of a shame we have all cgroups represented on each cpu. ]
>
> Could you restate this -- I suspect it's obvious, but I'm oblivious ;).

Ah, sure. struct task_group creates cfs_rq/rt_rq entities for each cpu's
runqueue. So an iteration like for_each_leaf_{cfs,rt}_rq() will touch
all task_groups/cgroups, not only those that are actually schedulable on
that cpu.

Now, I think that could be easily solved by adding/removing
{cfs,rt}_rq->leaf_{cfs,rt}_rq_list to/from rq->leaf_{cfs,rt}_rq_list on
enqueue of the first/dequeue of the last entity of its tg on that rq.
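
Something like this, reusing the existing leaf_cfs_rq_list naming (a
minimal sketch; the exact call sites are an assumption, and the rt_rq
side would mirror it):

/* called after the first entity of this task_group is enqueued on rq */
static void add_leaf_cfs_rq(struct rq *rq, struct cfs_rq *cfs_rq)
{
	if (cfs_rq->nr_running == 1)
		list_add_rcu(&cfs_rq->leaf_cfs_rq_list,
			     &rq->leaf_cfs_rq_list);
}

/* called after the last entity of this task_group is dequeued from rq */
static void del_leaf_cfs_rq(struct rq *rq, struct cfs_rq *cfs_rq)
{
	if (!cfs_rq->nr_running)
		list_del_rcu(&cfs_rq->leaf_cfs_rq_list);
}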


2008-01-29 12:12:26

by Paul Jackson

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

Peter, replying to Paul:
> > 3) you turn off sched_load_balance in that realtime cpuset.
>
> Ah, I don't think 3 is needed. Quite to the contrary, there is quite a
> large body of research work covering the scheduling of (hard and soft)
> realtime tasks on multiple cpus.

Well, the way it's coded now, the user space code needs to do (3),
because that's the only way they get the system to have anything
other than one big fat sched domain covering all the CPUs in
the system.

Actually ... I need a picture of a bunny with a pancake hat here,
as I have no idea what you just said ;).

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-01-29 12:19:00

by Srivatsa Vaddagiri

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

On Tue, Jan 29, 2008 at 11:57:22AM +0100, Peter Zijlstra wrote:
> On Tue, 2008-01-29 at 10:53 +0100, Peter Zijlstra wrote:
>
> > My thoughts were to make stronger use of disjoint cpu-sets. cgroups and
> > cpusets are related, in that cpusets provide a property to a cgroup.
> > However, load_balance_monitor()'s interaction with sched domains
> > confuses me - it might DTRT, but I can't tell.
> >
> > [ It looks to me it balances a group over the largest SD the current cpu
> > has access to, even though that might be larger than the SD associated
> > with the cpuset of that particular cgroup. ]
>
> Hmm, with a bit more thought I think that does indeed DTRT. Because, if
> the cpu belongs to a disjoint cpuset, the highest sd (with
> load-balancing enabled) would be that. Right?

Hi Peter,
Yes, I had this in mind when I wrote the load_balance_monitor()
function - to only balance across cpus that form a disjoint cpuset in the
system.

> [ Just a bit of a shame we have all cgroups represented on each cpu. ]

After reading your explanation in the other mail about what you mean here,
I agree. Your suggestion to remove/add cfs_rq from/to the leaf_cfs_rq_list
upon dequeue_of_last_task/enqueue_of_first_task AND

> Also, might be a nice idea to split the daemon up if there are indeed
> disjoint sets - currently there is only a single daemon which touches
> the whole system.

the above suggestions seem like good ideas. I can also look at reducing
the frequency at which the thread runs ..

--
Regards,
vatsa

2008-01-29 12:21:19

by Paul Jackson

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

vatsa wrote to Peter:
> After reading your explanation in the other mail about what you mean here,
> I agree.

Ah good - glad someone understood that.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-01-29 12:21:59

by Peter Zijlstra

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing


On Tue, 2008-01-29 at 05:53 -0600, Paul Jackson wrote:
> Peter wrote;
> > So, I don't think we need that, I think we can do with the single flag,
> > we just need to find these disjoint sets and stick our rt-domain there.
>
> Ah - perhaps you don't need that flag - but my other cpuset users do ;).
>
> You see, there are two very different ways that 'sched_load_balance' is
> used in practice.
>
> The other way is by big batch schedulers. They may be placed in charge
> of managing a few hundred CPUs on a system, and might be running a mix
> of many small jobs each covering only a few CPUs. They routinely setup
> one cpuset for each job, to contain that job to the CPUs and memory
> nodes assigned to it. This is actually the original motivating use for
> cpusets.
>
> As a bit of optimization, batch schedulers desire to tell the normal
> kernel scheduler -not- to bother load balancing across the big set of
> CPUs controlled by the batch scheduler, but only to load balance within
> each of the smaller per-job cpusets. Load balancing across hundreds
> of CPUs when the batch scheduler knows such efforts would be fruitless
> is a waste of good CPU cycles in kernel/sched.c.
>
> I really doubt we'd want to have such systems triggering the hard RT
> scheduler on whatever CPUs were in the batch scheduler's big cpuset
> that didn't happen to have an active job currently assigned to them.

My turn to be confused..

If SD_LOAD_BALANCE is only set on the smaller, per-job, sets, how will
the RT balancer trigger on the large set?

2008-01-29 12:36:56

by Paul Jackson

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

Peter, responding to Paul:
> > I really doubt we'd want to have such systems triggering the hard RT
> > scheduler on whatever CPUs were in the batch scheduler's big cpuset
> > that didn't happen to have an active job currently assigned to them.
>
> My turn to be confused..
>
> If SD_LOAD_BALANCE is only set on the smaller, per-job, sets, how will
> the RT balancer trigger on the large set?

What 'sched_load_balance' does now is help you set up a -partial-
covering of non-overlapping sched domains. In the batch scheduler
example, those CPUs that were:
1) being managed by the batch scheduler, but
2) were not assigned to any active job at the moment
would -not- be in any sched domain.

It's not a question of the SD_LOAD_BALANCE flag. It's a question
of whether a given CPU is even included in any sched domain.
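
For illustration (the masks below are made up): if a batch scheduler
owns cpus 0-63 and currently runs jobs on cpus 0-31 and 32-47, the
cpuset code hands the scheduler only those two masks, and cpus 48-63,
appearing in neither, land in no sched domain and are never balanced:

static void setup_job_domains(void)
{
	static cpumask_t doms[2];
	int i;

	cpus_clear(doms[0]);
	cpus_clear(doms[1]);
	for (i = 0; i <= 31; i++)
		cpu_set(i, doms[0]);	/* job A */
	for (i = 32; i <= 47; i++)
		cpu_set(i, doms[1]);	/* job B */

	/* cpus 48-63 are in neither mask: no sched domain for them */
	partition_sched_domains(2, doms);
}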

If we did as you are suggesting (if I understand) then instead of
leaving these CPUs out of any sched domain, we'd set up a new
kind of sched domain for these CPUs, marked for hard real time load
balancing, rather than the somewhat more scalable, but softer, normal
load balancing.

We want no load balancing on those CPUs, not realtime load balancing.
Indeed, I suspect, we especially do not want realtime load balancing
on those CPUs as that kind of load balancing is (I'm suspecting) more
expensive and less scalable than normal load balancing.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-01-29 12:45:07

by Peter Zijlstra

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing


On Tue, 2008-01-29 at 06:03 -0600, Paul Jackson wrote:
> Paul, responding to Peter:
> > > We now have a per-cpuset Boolean flag file called 'sched_load_balance'.
> >
> > SD_LOAD_BALANCE, right?
>
> No. SD_LOAD_BALANCE is some attribute of sched domains.
>
> The 'sched_load_balance' flag is an attribute of cpusets.
>
> The mapping of cpusets to sched domains required several pages of 'fun
> to write' code, which had to go through a couple of years of fixing and
> one major rewrite before it (knock on wood) worked correctly. It's not
> a one-to-one relation, in other words. See my earlier messages for
> further explanation of how this works.

Ok, I'll take a stab at understanding that code. It seems to me
a lot of confusion could be solved by getting a more level playing
field :-)

> > > This 'sched_load_balance' flag is, thus far, "the" cpuset hook
> > > supporting realtime. One can use it to configure a system so that
> > > the kernel does not do normal load balancing on select CPUs, such
> > > as those CPUs dedicated to realtime use.
> >
> > Ah, here I disagree, it is possible to do (hard) realtime scheduling
> > over multiple cpus, the only drawback is that it requires a very strong
> > load-balancer, making it unsuitable for large numbers of cpus.
>
> I don't think we are disagreeing. I was speaking of "normal"
> load balancing (what the mainline kernel/sched.c code normally
> does). You're speaking of hard realtime load balancing.
>
> I think we agree that these both exist, and require different
> load balancing code, the latter 'very strong.'

Great :-)


2008-01-29 12:53:08

by Paul Jackson

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

> Ok, I'll take a stab at understanding that code.

See also the section:

1.7 What is sched_load_balance ?

in Documentation/cpusets.txt.

Good luck ;).

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-01-29 13:40:19

by Peter Zijlstra

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing


On Tue, 2008-01-29 at 06:52 -0600, Paul Jackson wrote:
> > Ok, I'll take a stab at understanding that code.
>
> See also the section:
>
> 1.7 What is sched_load_balance ?
>
> in Documentation/cpusets.txt.
>
> Good luck ;).

It seems Gregory tricked us both:

57d885fea0da0e9541d7730a9e1dcf734981a173

2008-01-29 15:57:29

by Gregory Haskins

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

>>> On Tue, Jan 29, 2008 at 6:50 AM, in message
<1201607401.28547.124.camel@lappy>, Peter Zijlstra <[email protected]>
wrote:

> On Tue, 2008-01-29 at 05:30 -0600, Paul Jackson wrote:
>> Peter wrote, in reply to Peter ;):
>> > > [ It looks to me it balances a group over the largest SD the current cpu
>> > > has access to, even though that might be larger than the SD associated
>> > > with the cpuset of that particular cgroup. ]
>> >
>> > Hmm, with a bit more thought I think that does indeed DTRT. Because, if
>> > the cpu belongs to a disjoint cpuset, the highest sd (with
>> > load-balancing enabled) would be that. Right?
>>
>> The code that defines sched domains, kernel/sched.c partition_sched_domains(),
>> as called from the cpuset code in kernel/cpuset.c rebuild_sched_domains(),
>> does not make use of the full range of sched_domain possibilities.
>>
>> In particular, it only sets up some non-overlapping set of sched domains.
>> Every CPU ends up in at most a single sched domain.
>
> Ah, good to know. I thought it would reflect the hierarchy of the sets
> themselves.
>
>> The original reason that one can't define overlapping sched domains via
>> this cpuset interface (based off the cpuset 'sched_load_balance' flag)
>> is that I didn't realize it was even possible to overlap sched domains
>> when I wrote the cpuset code defining sched domains. And then when I
>> later realized one could overlap sched domains, I (a) didn't see a need
>> to do so, and (b) couldn't see how to do so via the cpuset interface
>> without causing my brain to explode.
>
> Good reason :-), this code needs all the reasons it can grasp to not
> grow more complexity.
>
>> Now, back to Peter's question, being a bit pedantic, CPUs don't belong
>> to disjoint cpusets, except in the most minimal situation that there is
>> only one cpuset covering all CPUs.
>>
>> Rather what happens, when you have need for some realtime CPUs, is that:
>> 1) you turn off sched_load_balance on the top cpuset,
>> 2) you setup your realtime cpuset as a child cpuset of the top cpuset
>> such that its CPUs don't overlap any of its siblings, and
>> 3) you turn off sched_load_balance in that realtime cpuset.
>
> Ah, I don't think 3 is needed. Quite to the contrary, there is quite a
> large body of research work covering the scheduling of (hard and soft)
> realtime tasks on multiple cpus.

This is correct. We have the balance policy polymorphically associated with each sched_class, and the CFS load-balancer and RT "load" (really, priority) balancer can coexist together at the same time and across arbitrary #s of cores. From an RT perspective, this works great. It's a little trickier (and I don't think we have this quite right yet) for the CFS side, since that interface deals strictly in terms of load. As such, it gets a little perturbed by these "rude" RT tasks that arbitrarily preempt its tasks. :) I think Steven may have done some work in that area by playing with the associated weight of RT tasks, etc., so that the CFS balancer can more accurately account for the externally managed RT load on the system. But AFAIK, it's not in the tree yet.
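
(A much-simplified sketch of the shape being described; the real hooks
in kernel/sched.c carry different names and signatures, so treat the
balance callback below as illustrative only:)

struct rq;

struct sched_class {
	const struct sched_class *next;	/* classes form a priority-ordered list */

	/* each class brings its own balancing policy */
	void (*balance)(struct rq *this_rq, int this_cpu);
};

extern const struct sched_class rt_sched_class;	/* assumed highest class */
#define sched_class_highest (&rt_sched_class)

/* the core just walks the list: the RT class balances on priority,
 * the CFS class on load, and both coexist on the same runqueues */
static void balance_all_classes(struct rq *this_rq, int this_cpu)
{
	const struct sched_class *class;

	for (class = sched_class_highest; class; class = class->next)
		class->balance(this_rq, this_cpu);
}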


>
>> At that point, sched domains are rebuilt, including providing a
>> sched domain that just contains the CPUs in that realtime cpuset, and
>> normal scheduler load balancing ceases on the CPUs in that realtime
>> cpuset.
>
> Right, which would also disable the realtime load-balancing we do want.
> Hence my suggestion to stick the rt balance data in this sched domain.
>
>> > [ Just a bit of a shame we have all cgroups represented on each cpu. ]
>>
>> Could you restate this -- I suspect it's obvious, but I'm oblivious ;).
>
> Ah, sure. struct task_group creates cfs_rq/rt_rq entities for each cpu's
> runqueue. So an iteration like for_each_leaf_{cfs,rt}_rq() will touch
> all task_groups/cgroups, not only those that are actually schedulable on
> that cpu.
>
> Now, I think that could be easily solved by adding/removing
> {cfs,rt}_rq->leaf_{cfs,rt}_rq_list to/from rq->leaf_{cfs,rt}_rq_list on
> enqueue of the first/dequeue of the last entity of its tg on that rq.

2008-01-29 16:03:23

by Gregory Haskins

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

>>> On Tue, Jan 29, 2008 at 6:30 AM, in message
<[email protected]>, Paul Jackson <[email protected]> wrote:
> Peter wrote, in reply to Peter ;):
>> > [ It looks to me it balances a group over the largest SD the current cpu
>> > has access to, even though that might be larger than the SD associated
>> > with the cpuset of that particular cgroup. ]
>>
>> Hmm, with a bit more thought I think that does indeed DTRT. Because, if
>> the cpu belongs to a disjoint cpuset, the highest sd (with
>> load-balancing enabled) would be that. Right?
>
> The code that defines sched domains, kernel/sched.c partition_sched_domains(),
> as called from the cpuset code in kernel/cpuset.c rebuild_sched_domains(),
> does not make use of the full range of sched_domain possibilities.
>
> In particular, it only sets up some non-overlapping set of sched domains.
> Every CPU ends up in at most a single sched domain.
>
> The original reason that one can't define overlapping sched domains via
> this cpuset interface (based off the cpuset 'sched_load_balance' flag)
> is that I didn't realize it was even possible to overlap sched domains
> when I wrote the cpuset code defining sched domains. And then when I
> later realized one could overlap sched domains, I (a) didn't see a need
> to do so, and (b) couldn't see how to do so via the cpuset interface
> without causing my brain to explode.
>
> Now, back to Peter's question, being a bit pedantic, CPUs don't belong
> to disjoint cpusets, except in the most minimal situation that there is
> only one cpuset covering all CPUs.
>
> Rather what happens, when you have need for some realtime CPUs, is that:
> 1) you turn off sched_load_balance on the top cpuset,
> 2) you setup your realtime cpuset as a child cpuset of the top cpuset
> such that its CPUs don't overlap any of its siblings, and
> 3) you turn off sched_load_balance in that realtime cpuset.
>
> At that point, sched domains are rebuilt, including providing a
> sched domain that just contains the CPUs in that realtime cpuset, and
> normal scheduler load balancing ceases on the CPUs in that realtime
> cpuset.

Hi Paul,
I am a bit confused as to why you disable load-balancing in the RT cpuset. It shouldn't be strictly necessary in order for the RT scheduler to do its job (unless I am misunderstanding what you are trying to accomplish?). Do you do this because you *have* to in order to make real-time deadlines, or because it's just a further optimization?

-Greg


>
>> [ Just a bit of a shame we have all cgroups represented on each cpu. ]
>
> Could you restate this -- I suspect it's obvious, but I'm oblivious ;).


2008-01-29 16:04:17

by Gregory Haskins

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

>>> On Tue, Jan 29, 2008 at 7:12 AM, in message
<[email protected]>, Paul Jackson <[email protected]> wrote:
> Peter, replying to Paul:
>> > 3) you turn off sched_load_balance in that realtime cpuset.
>>
>> Ah, I don't think 3 is needed. Quite to the contrary, there is quite a
>> large body of research work covering the scheduling of (hard and soft)
>> realtime tasks on multiple cpus.
>
> Well, the way it's coded now, the user space code needs to do (3),
> because that's the only way they get the system to have anything
> other than one big fat sched domain covering all the CPUs in
> the system.

What about exclusive cpusets? Don't they create a new sched-domain or did I misunderstand there?

-Greg

>
> Actually ... I need a picture of a bunny with a pancake hat here,
> as I have no idea what you just said ;).



2008-01-29 16:29:24

by Paul Jackson

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

Gregory wrote:
> I am a bit confused as to why you disable load-balancing in the
> RT cpuset? It shouldn't be strictly necessary in order for the
> RT scheduler to do its job (unless I am misunderstanding what you
> are trying to accomplish?). Do you do this because you *have*
> to in order to make real-time deadlines, or because it's just a
> further optimization?

My primary motivation for cpusets originally, and for the
sched_load_balance flag now, was not realtime, but "soft partitioning"
of big NUMA systems, especially for batch schedulers. They sometimes
have large cpusets which are only being used to hold smaller, per-job,
cpusets. It is a waste of time (CPU cycles in the kernel sched code)
to load balance those large cpusets. Load balancing doesn't scale
easily to high CPU counts, and it's nice to avoid doing that where
not needed.

See the following lkml message for a fuller explanation:

http://lkml.org/lkml/2008/1/29/85

As a secondary motivation, I thought that disabling load balancing on
the RT cpuset was the right thing to do for RT needs, but I make no
claim to knowing much about RT.

I just now realized that you added a 'root_domain' in a patch in
late Nov and early Dec. I was on the road then, moving from
California to Texas, and not paying much attention to Linux.

A couple of questions on that patch, both involving a comment it adds
to kernel/sched.c:

/*
 * We add the notion of a root-domain which will be used to define per-domain
 * variables. Each exclusive cpuset essentially defines an island domain by
 * fully partitioning the member cpus from any other cpuset. Whenever a new
 * exclusive cpuset is created, we also create and attach a new root-domain
 * object.
 */

1) What are 'per-domain' variables?

2) The mention of 'exclusive cpuset' is no longer correct.

With the patch 'remove sched domain hooks from cpusets' cpusets
no longer defines sched domains using the cpu_exclusive flag.

With the subsequent sched_load_balance patch (see
http://lkml.org/lkml/2007/10/6/19) cpusets uses a new per-cpuset
flag 'sched_load_balance' to define sched domains.

The following revised comment might be more accurate:

/*
 * We add the notion of a root-domain which will be used to define per-domain
 * variables. Each non-overlapping sched domain defines an island domain by
 * fully partitioning the member cpus from any other cpuset. Whenever a new
 * such sched domain is created, we also create and attach a new root-domain
 * object. These non-overlapping sched domains are determined by the cpuset
 * configuration, via a call to partition_sched_domains().
 */

It sounds like you (Gregory, others) want your RT CPUs to be in a sched
domain, unlike the current way things are, where my cpuset code
carefully avoids setting up a sched domain for those CPUs. However I
still have need, in the batch scheduler case explained above, to have
some CPUs not in any sched domain.

If you require these RT sched domains to be setup differently somehow,
in some way that is visible to partition_sched_domains, then that
apparently means we need a per-cpuset flag to mark those RT cpusets.

If you just want an ordinary sched domain setup (just so long as it
contains only the intended RT CPUs, not others) then I guess we don't
technically need any more per-cpuset flags, but I'm worried, because
the API we're presenting to users for this has just gone from subtle to
bizarre. I suspect I'll want to add a flag anyway, if by doing so, I
can make the kernel-user API, via cpusets, easier to understand.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-01-29 16:33:52

by Paul Jackson

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

Gregory wrote:
> What about exclusive cpusets? Don't they create a
> new sched-domain or did I misunderstand there?

cpu_exclusive cpusets no longer determine sched domains.
I just said more on this in an earlier reply.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-01-29 16:48:44

by Gregory Haskins

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

>>> On Tue, Jan 29, 2008 at 11:28 AM, in message
<[email protected]>, Paul Jackson <[email protected]> wrote:
> Gregory wrote:
>> I am a bit confused as to why you disable load-balancing in the
>> RT cpuset? It shouldn't be strictly necessary in order for the
>> RT scheduler to do its job (unless I am misunderstanding what you
>> are trying to accomplish?). Do you do this because you *have*
> to in order to make real-time deadlines, or because it's just a
>> further optimization?
>
> My primary motivation for cpusets originally, and for the
> sched_load_balance flag now, was not realtime, but "soft partitioning"
> of big NUMA systems, especially for batch schedulers. They sometimes
> have large cpusets which are only being used to hold smaller, per-job,
> cpusets. It is a waste of time (CPU cycles in the kernel sched code)
> to load balance those large cpusets. Load balancing doesn't scale
> easily to high CPU counts, and it's nice to avoid doing that where
> not needed.

Understood, and that makes tons of sense.

>
> See the following lkml message for a fuller explanation:
>
> http://lkml.org/lkml/2008/1/29/85
>
> As a secondary motivation, I thought that disabling load balancing on
> the RT cpuset was the right thing to do for RT needs, but I make no
> claim to knowing much about RT.

Well, I make no claim to understand the large batch systems you work on either ;) Everything you said made a ton of sense other than the RT/load-balance thing, but I think we are on the same page now.

>
> I just now realized that you added a 'root_domain' in a patch in
> late Nov and early Dec. I was on the road then, moving from
> California to Texas, and not paying much attention to Linux.

np (though I was wondering why you had no comment before ;)

>
> A couple of questions on that patch, both involving a comment it adds
> to kernel/sched.c:
>
> /*
> * We add the notion of a root-domain which will be used to define per-domain
> * variables. Each exclusive cpuset essentially defines an island domain by
> * fully partitioning the member cpus from any other cpuset. Whenever a new
> * exclusive cpuset is created, we also create and attach a new root-domain
> * object.
> */
>
> 1) What are 'per-domain' variables?

s/per-domain/per-root-domain

>
> 2) The mention of 'exclusive cpuset' is no longer correct.
>
> With the patch 'remove sched domain hooks from cpusets' cpusets
> no longer defines sched domains using the cpu_exclusive flag.
>
> With the subsequent sched_load_balance patch (see
> http://lkml.org/lkml/2007/10/6/19) cpusets uses a new per-cpuset
> flag 'sched_load_balance' to define sched domains.

Doh! Thanks for the heads up.

>
> The following revised comment might be more accurate:
>
> /*
> * We add the notion of a root-domain which will be used to define per-domain
> * variables. Each non-overlapping sched domain defines an island domain by
> * fully partitioning the member cpus from any other cpuset. Whenever a new
> * such a sched domain is created, we also create and attach a new root-domain
> * object. These non-overlapping sched domains are determined by the cpuset
> * configuration, via a call to partition_sched_domains().
> */
>
> It sounds like you (Gregory, others) want your RT CPUs to be in a sched
> domain, unlike the current way things are, where my cpuset code
> carefully avoids setting up a sched domain for those CPUs. However I
> still have need, in the batch scheduler case explained above, to have
> some CPUs not in any sched domain.
>
> If you require these RT sched domains to be setup differently somehow,
> in some way that is visible to partition_sched_domains, then that
> apparently means we need a per-cpuset flag to mark those RT cpusets.

I think we only need a plain-vanilla partition, so no flags should be necessary.

-Greg

>
> If you just want an ordinary sched domain setup (just so long as it
> contains only the intended RT CPUs, not others) then I guess we don't
> technically need any more per-cpuset flags, but I'm worried, because
> the API we're presenting to users for this has just gone from subtle to
> bizarre. I suspect I'll want to add a flag anyway, if by doing so, I
> can make the kernel-user API, via cpusets, easier to understand.


2008-01-29 16:51:27

by Paul Jackson

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

Gregory wrote:
> This is correct. We have the balance policy polymorphically associated
> with each sched_class, and the CFS load-balancer and RT "load" (really,
> priority) balancer can coexist together at the same time and across
> arbitrary #s of cores

So ... we have the option of having all sched_classes coexist polymorphically.

That I didn't realize until this thread.

Now ... do we -want- to?

That is, what is the easiest kernel-user API to work with and understand?

Is it one where we essentially expose sched_class to user space, and let
them pick their sched_class, or pick none of the above (don't balance)?

Or is it one where, other than the special case my batch schedulers need
to not balance at all, we expose nothing more to user space, and provide
all sched_class load balancers to all sched_domains (other than those
not balanced at all)?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-01-29 17:28:18

by Gregory Haskins

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

>>> On Tue, Jan 29, 2008 at 11:51 AM, in message
<[email protected]>, Paul Jackson <[email protected]> wrote:
> Gregory wrote:
>> This is correct. We have the balance policy polymorphically associated
>> with each sched_class, and the CFS load-balancer and RT "load" (really,
>> priority) balancer can coexist together at the same time and across
>> arbitrary #s of cores
>
> So ... we have the option of having all sched_classes coexist
> polymorphically.
>
> That I didn't realize until this thread.

It's on a per-task basis when the task elects SCHED_FIFO/RR/BATCH/OTHER, etc. If the task is on a particular RQ, the RQ operates under the policy of that class. There are some cases where the RQ consults the policy of all classes, but they are still influenced by whether there are actual tasks running within the scope of the current cpuset (or root-domain).

>
> Now ... do we -want- to?

I think so, yes. But I will give the disclaimer that I don't fully understand your world ;)

You could certainly create a group of cpus with homogeneous policy by creating a cpuset with only tasks of a single class as members. But likewise, if you populate a cpuset with tasks from mixed classes, you have mixed balance policy affecting those cpus.


>
> That is, what is the easiest kernel-user API to work with and understand?
>
> Is it one where we essentially expose sched_class to user space, and let
> them pick their sched_class, or pick none of the above (don't balance)?

IMHO it works well the way it is: The user selects the class for a particular task using sched_setscheduler(), and they select the cpuset (or inherit it) that defines its execution scope. If that scope has balancing enabled, the policy for the member classes is in effect.
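
For example, electing the RT class from within the task itself (the
priority value here is arbitrary):

#include <sched.h>
#include <stdio.h>

int main(void)
{
	struct sched_param param = { .sched_priority = 50 };

	/* elect SCHED_FIFO for this task; the cpuset it inherited
	 * (or was placed in) still defines where it may run */
	if (sched_setscheduler(0, SCHED_FIFO, &param) != 0)
		perror("sched_setscheduler");
	return 0;
}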

(on this topic, note that I do not know if the RT-balancer will respect the cpuset concept of "balance-enabled" anyway. That might have to be fixed)

Again, the disclaimer that I do not have expertise in your area, so perhaps this is naive.

>
> Or is it one where, other than the special case my batch schedulers need
> to not balance at all, we expose nothing more to user space, and provide
> all sched_class load balancers to all sched_domains (other than those
> not balanced at all)?


2008-01-29 19:04:22

by Paul Jackson

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

Gregory wrote:
> IMHO it works well the way it is: The user selects the class for a
> particular task using sched_setscheduler(), and they select the cpuset
> (or inherit it) that defines its execution scope. If that scope has
> balancing enabled, the policy for the member classes is in effect.

Ok.

For the various classes of schedulers (sched_class's), it's fine by me
if sched domains are polymorphic, supporting all classes, and it is
left to each task to self-select the scheduling class of its preference.

For the batch scheduler case, this -must- be imposable from outside
the task, by the batch scheduler that is overseeing the job, and it
must support the batch scheduler being able to disable all the
balancers in selected cpusets (selected sched_domains).

We have that now. Each of us only knew of part of the solution,
but we managed to arrive at the desired answer even so ... amazing.

The batch scheduler just has to arrange to get 'sched_load_balance'
turned off in a cpuset and all overlapping cpusets, and then the
CPUs in that cpuset will not belong to -any- sched_domain, and hence
(could you verify I'm right in this detail?) won't be balanced by any
sched_class.

I should update the documentation for sched_load_balance, changing it
from saying that you get realtime by turning off sched_load_balance in
the RT cpuset, to saying that you get realtime by (1) turning off
sched_load_balance in any overlapping cpusets, including all
encompassing parent cpusets, (2) leaving sched_load_balance on in the
RT cpuset itself, and (3) having those realtime tasks each self-select
(elect) the desired SCHED_* using sched_setscheduler().

Condition (1) above is a tad difficult to understand, but serviceable,
I guess. The combination of (1) and (2) results in a separate
sched_domain just for the CPUs in the RT cpuset.

> (on this topic, note that I do not know if the RT-balancer will
> respect the cpuset concept of "balance-enabled" anyway. That might
> have to be fixed)

Er eh ... it has no choice. If the user space code has configured a
cpuset with 'sched_load_balance' turned off in that cpuset and all
overlapping cpusets, then there will not even be a sched_domain
covering those CPUs, and hence no balancer, RT or other class, will
even see those CPUs.

Unless I really don't understand the kernel/sched.c sched_domain code
(a distinct possibility), if some CPU is not in any sched_domain, then
it won't get balanced, RT or otherwise.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-01-29 19:37:27

by Paul Jackson

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

Gregory wrote:
> > 1) What are 'per-domain' variables?
>
> s/per-domain/per-root-domain

Oh dear - now I've got more questions, not fewer.

1) "variables" ... what variables?

2) Is a 'root-domain' just the RT specific portion
of a sched_domain, or is it something else?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-01-29 20:35:30

by Gregory Haskins

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

>>> On Tue, Jan 29, 2008 at 2:37 PM, in message
<[email protected]>, Paul Jackson <[email protected]> wrote:
> Gregory wrote:
>> > 1) What are 'per-domain' variables?
>>
>> s/per-domain/per-root-domain
>
> Oh dear - now I've got more questions, not fewer.
>
> 1) "variables" ... what variables?

Well, anything that is declared in "struct root_domain" in kernel/sched.c.

For instance, today in mainline we have:

struct root_domain {
	atomic_t refcount;	/* how many runqueues reference this domain */
	cpumask_t span;		/* all cpus covered by this root-domain */
	cpumask_t online;	/* cpus in span that are currently online */

	/*
	 * The "RT overload" flag: it gets set if a CPU has more than
	 * one runnable RT task.
	 */
	cpumask_t rto_mask;
	atomic_t rto_count;
};

The first three are just related to general root-domain infrastructure code. The last two in this case are related specifically to the rt-overload feature. In earlier versions of rt-balance, the rt-overload bitmap was a global variable. By moving it into the root_domain structure, there is now an instance per (um, for lack of a better, more up to date word) "exclusive" cpuset. That way, disparate cpusets will not bother each other with overload notifications, etc.
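
(Roughly how those last two fields get used, simplified from the
rt-balance code in that patch:)

/* a cpu that acquires a second runnable RT task flags itself in its
 * root_domain, so only cpus in the same disjoint set see the overload */
static void rt_set_overload(struct rq *rq)
{
	cpu_set(rq->cpu, rq->rd->rto_mask);
	wmb();			/* publish the mask before the count */
	atomic_inc(&rq->rd->rto_count);
}

static void rt_clear_overload(struct rq *rq)
{
	atomic_dec(&rq->rd->rto_count);
	cpu_clear(rq->cpu, rq->rd->rto_mask);
}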

Note that in -rt, we have more variables in this structure (RQ priority info) but that patch hasn't been pulled into sched-devel/linux-2.6 yet.

>
> 2) Is a 'root-domain' just the RT specific portion
> of a sched_domain, or is it something else?

It's meant to be general, but the only current client is the RT sched_class. Reading back through the links you guys have been sending, it's very similar in concept to the "rt-domain" stuff that you, Peter, and Steven were discussing a while back.

When I was originally putting this stuff together, I wanted to piggyback this data in the sched-domain code. But I soon realized that the sched-domain trees are per-cpu structures. What I needed was an "umbrella" structure that would allow cpus in a common cpuset to share arbitrary state data, but yet were associated with the sched-domains that the cpuset code set up. The first pass had the structures associated with the sched-domain hierarchy, but I soon realized that it was really a per-rq association so I could simplify the design. I.e., rather than have the code walk the sched-domain to find the common "root", I just hung the root directly on the rq itself.

But anyway, to answer the question: The concept is meant to be generic. For instance, if it made sense for Peter's cgroup work to sit here as well, we could just add new fields to the struct root_domain and Peter could access them via rq->rd.

I realize that it could possibly have been designed to abstract away the type of objects that the root-domain manages, but I want to keep the initial code as simple as possible. We can always complicate^h^h^h^h^hcleanup the code later ;)

Regards,
-Greg

2008-01-29 20:43:25

by Gregory Haskins

Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

>>> On Tue, Jan 29, 2008 at 2:04 PM, in message
<[email protected]>, Paul Jackson <[email protected]> wrote:
> Gregory wrote:
>> IMHO it works well the way it is: The user selects the class for a
>> particular task using sched_setscheduler(), and they select the cpuset
>> (or inherit it) that defines its execution scope. If that scope has
>> balancing enabled, the policy for the member classes is in effect.
>
> Ok.
>
> For the various classes of schedulers (sched_class's), it's fine by me
> if sched domains are polymorphic, supporting all classes, and it is
> left to each task to self-select the scheduling class of its preference.
>
> For the batch scheduler case, this -must- be imposable from outside
> the task, by the batch scheduler that is overseeing the job, and it
> must support the batch scheduler being able to disable all the
> balancers in selected cpusets (selected sched_domains).
>
> We have that now. Each of us only knew of part of the solution,
> but we managed to arrive at the desired answer even so ... amazing.
>
> The batch scheduler just has to arrange to get 'sched_load_balance'
> turned off in a cpuset and all overlapping cpusets, and then the
> CPUs in that cpuset will not belong to -any- sched_domain, and hence
> (could you verify I'm right in this detail?) won't be balanced by any
> sched_class.

I am a little fuzzy on how this would work, so I can't say for certain. :) But it seems like that is accurate.


>
> I should update the documentation for sched_load_balance, changing it
> from saying that you get realtime by turning off sched_load_balance in
> the RT cpuset, to saying that you get realtime by (1) turning off
> sched_load_balance in any overlapping cpusets, including all
> encompassing parent cpusets, (2) leaving sched_load_balance on in the
> RT cpuset itself, and (3) having those realtime tasks each self-select
> (elect) the desired SCHED_* using sched_setscheduler().
>
> Condition (1) above is a tad difficult to understand, but serviceable,
> I guess. The combination of (1) and (2) results in a separate
> sched_domain just for the CPUs in the RT cpuset.

Technically you only need (2). I normally run my 4-8 core development systems in the single default global cpuset. Customers typically do use multiple sets, but we only use the vanilla balanced variety.

>
>> (on this topic, note that I do not know if the RT-balancer will
>> respect the cpuset concept of "balance-enabled" anyway. That might
>> have to be fixed)
>
> Er eh ... it has no choice. If the user space code has configured a
> cpuset with 'sched_load_balance' turned off in that cpuset and all
> overlapping cpusets, then there will not even be a sched_domain
> covering those CPUs, and hence no balancer, RT or other class, will
> even see those CPUs.
>
> Unless I really don't understand the kernel/sched.c sched_domain code
> (a distinct possibility), if some CPU is not in any sched_domain, then
> it won't get balanced, RT or otherwise.

Heh... I can't quite wrap my head around that, but it sounds like you are correct. The only thing I was really pointing out is that the RT code doesn't necessarily look at sched-domain flags before making balancing decisions. So as long as that is not a requirement, I think we are all set.



2008-01-29 20:57:05

by Paul Jackson

[permalink] [raw]
Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

Gregory wrote:
> By moving it into the root_domain structure, there is now an instance
> per (um, for lack of a better, more up to date word) "exclusive"
> cpuset. That way, disparate cpusets will not bother each other with
> overload notifications, etc.

So the root_domain structure is meant to be the portions of the
sched_domains that are shared across all CPUs in that sched_domain ?

And the word 'cpuset', occurring in the above quote twice, should
be 'sched_domain', right ? Surely these aren't cpuset's ;).

And 'exclusive cpuset' really means 'non-overlapping sched_domain' ?

Or am I still confused ?

I would like to get our concepts clear, and terms consistent. That's
important for those others who would try to understand this.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-01-29 21:02:50

by Paul Jackson

[permalink] [raw]
Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

Gregory wrote:
> > ... (1) turning off
> > sched_load_balance in any overlapping cpusets, including all
> > encompassing parent cpusets, (2) leaving sched_load_balance on in the
> > RT cpuset itself, and ...
>
> Technically you only need (2). I run my 4-8 core development systems
> in the single default global cpuset, normally.

Well, if you're running in the default cpuset, then you automatically get (1),
because sched_load_balance is turned off in all overlapping cpusets (there
aren't any overlapping cpusets!)

So, yes, you -do- need both (1) and (2). In your normal system, you
just happen to get (1) effortlessly.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-01-29 21:09:27

by Gregory Haskins

[permalink] [raw]
Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

>>> On Tue, Jan 29, 2008 at 3:56 PM, in message
<[email protected]>, Paul Jackson <[email protected]> wrote:
> Gregory wrote:
>> By moving it into the root_domain structure, there is now an instance
>> per (um, for lack of a better, more up to date word) "exclusive"
>> cpuset. That way, disparate cpusets will not bother each other with
>> overload notifications, etc.
>
> So the root_domain structure is meant to be the portions of the
> sched_domains that are shared across all CPUs in that sched_domain ?

That's exactly right.

>
> And the word 'cpuset', occurring in the above quote twice, should
> be 'sched_domain', right ? Surely these aren't cpuset's ;).

Yeah, I think I am taking shortcuts in the language ;). I wanted the root_domain to be an object of shared data that sits at the "root sched_domain", or, in other terms, the terminating parent in the hierarchy. And there is one of these suckers created every time a non-overlapping cpuset is created (which was called "exclusive" at the time I wrote it, I believe, but I keep forgetting what you said they are called now ;). So because the non-overlapping cpuset configuration begat the sched_domain hierarchy, I started using the terms interchangeably. Sorry for the confusion :)
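
To make the lifecycle concrete: each time the cpuset code rebuilds the disjoint sched-domain trees, the runqueues in a given span all get attached to one shared root_domain. A simplified sketch of the attach step (approximate names; locking and the full detach path omitted):

static void attach_root_sketch(struct rq *rq, struct root_domain *rd)
{
	if (rq->rd)
		atomic_dec(&rq->rd->refcount);	/* drop the old root, if any */

	rq->rd = rd;
	atomic_inc(&rd->refcount);
	cpu_set(rq->cpu, rd->span);	/* this CPU now shares rd's state */
}

So "one root_domain per non-overlapping cpuset" falls out naturally: the cpuset partition defines the spans, and every rq in a span ends up holding the same rd.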

>
> And 'exclusive cpuset' really means 'non-overlapping sched_domain' ?
>
> Or am I still confused ?

No, I think you nailed it.

>
> I would like to get our concepts clear, and terms consistent. That's
> important for those others who would try to understand this.

Very good idea. Thanks for doing this!

-Greg


2008-01-29 21:14:29

by Gregory Haskins

[permalink] [raw]
Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing

>>> On Tue, Jan 29, 2008 at 4:02 PM, in message
<[email protected]>, Paul Jackson <[email protected]> wrote:
> Gregory wrote:
>> > ... (1) turning off
>> > sched_load_balance in any overlapping cpusets, including all
>> > encompassing parent cpusets, (2) leaving sched_load_balance on in the
>> > RT cpuset itself, and ...
>>
>> Technically you only need (2). I run my 4-8 core development systems
>> in the single default global cpuset, normally.
>
> Well, if you're running in the default cpuset, then you automatically get
> (1),
> because sched_load_balance is turned off in all overlapping cpusets (there
> aren't any overlapping cpusets!)
>
> So, yes, you -do- need both (1) and (2). In your normal system, you
> just happen to get (1) effortlessly.


Ah. Well see, I am just showing my ignorance of this area of the cpuset code then. I stand corrected, and sorry for the noise. :)

-Greg

2008-01-29 22:24:47

by Steven Rostedt

[permalink] [raw]
Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing


On Tue, 29 Jan 2008, Gregory Haskins wrote:
> >
> > I would like to get our concepts clear, and terms consistent. That's
> > important for those others who would try to understand this.
>
> Very good idea. Thanks for doing this!
>

Sorry for coming in so late, I've been banging my head on different bugs
all day.

Just to clear up what our goal for the RT balancer was, and how simple it
is ;-) Basically, any task that has an RT priority needs to run ASAP from
the moment it wakes up, provided there's a CPU available that it can run
on and it has a higher priority than whatever is currently running on
that CPU.

If an RT task wakes up, and there's a CPU available somewhere for it to
run on, we want the RT task to jump to that CPU and run. RT tasks should
not be waiting around for nice load balancing that optimizes the cache
usage. But we also have a problem: we don't want to kill the cache on
large NUMA architectures while looking for a place for an RT task to run.
With domains, we first look for a CPU within the local node that the RT
task can run on; if we find one, we place it there, otherwise we look at
other nodes.

Note that the RT balancing is aggressive and not passive. That means that
the balancing takes place at the time the RT task is awoken (perhaps done
by the task that is waking it) or at the time a task changes priority. It
does not passively wait for something else (i.e. the migration thread) to
migrate it.
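
In rough terms, the push-side decision looks something like the toy
sketch below. The helper state and names here are made up purely for
illustration; this is not the actual kernel code:

#define NR_CPUS 8

/* prio of whatever is running on each CPU; lower is higher prio */
static int curr_prio[NR_CPUS];
/* NUMA node each CPU belongs to */
static int cpu_node[NR_CPUS];

/*
 * Pick a CPU the waking RT task should preempt, preferring the
 * local node before spilling over to the rest of the domain.
 */
static int pick_push_target(int waking_prio, int pref_node)
{
	int best_cpu = -1;
	int best_prio = waking_prio;	/* must strictly beat this */
	int pass, cpu;

	for (pass = 0; pass < 2; pass++) {
		for (cpu = 0; cpu < NR_CPUS; cpu++) {
			int local = (cpu_node[cpu] == pref_node);

			if ((pass == 0) != local)
				continue;	/* pass 0: local node only */
			if (curr_prio[cpu] > best_prio) {
				best_prio = curr_prio[cpu];
				best_cpu = cpu;
			}
		}
		if (best_cpu >= 0)
			break;		/* found a node-local target */
	}
	return best_cpu;	/* -1 means nowhere better to run */
}

The real code obviously also has to deal with locking, cpus_allowed
masks, and racing wakeups, but the gist is: we only move to a CPU
running strictly lower-priority work, and we stay node-local when we
can.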

Paul, I think you now understand that we don't have some scheduler domain
that is specific for RT. The scheduling class is specific to the priority
of the process and not to what domain it is in. But if you keep a domain
invisible to an RT task, that domain never needs to worry about RT tasks
migrating to it.

The code that Gregory and I have been adding tries to migrate RT tasks
to CPUs they can run on as quickly as possible, without algorithms that
cause cacheline bouncing.
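
The flip side is the pull: when a CPU is about to drop to lower-priority
work, it checks the shared overload state Gregory mentioned and grabs a
waiting RT task if one beats what it was going to run. Again, a sketch
with made-up helper names, not the real functions:

static void pull_rt_tasks_sketch(struct rq *this_rq)
{
	int cpu;

	if (atomic_read(&this_rq->rd->rto_count) == 0)
		return;		/* fast path: nobody is overloaded */

	for_each_cpu_mask(cpu, this_rq->rd->rto_mask) {
		struct rq *src = cpu_rq(cpu);
		/* made-up helper: highest-prio RT task queued on src
		 * but not currently running there */
		struct task_struct *p = pick_next_highest_rt(src);

		if (p && p->prio < this_rq->curr->prio)
			migrate_task_here(this_rq, p);	/* made-up helper */
	}
}

Because the common case is a single atomic_read through rq->rd, disjoint
cpusets never so much as glance at each other's runqueues.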

-- Steve