by Peter Williams

[permalink] [raw]

Subject: Re: [RFC 0/5] sched: Add CPU rate caps

Sam Vilain wrote:
> Peter Williams wrote:
>
>>> I'd certainly be interested in having a look through the split out patch
>>> to see how namespaces and this advanced scheduling system might
>>> interoperate.
>>>
>>>
>> OK. I've tried very hard to make the scheduling code orthogonal to
>> everything else and it essentially separates out the scheduling within a
>> CPU from other issues e.g. load balancing. This separation is
>> sufficiently good for me to have merged PlugSched with an earlier
>> version of CKRM's CPU management module in a way that made each of
>> PlugSched's schedulers available within CKRM's infrastructure. (CKRM
>> have radically rewritten their CPU code since then and I haven't
>> bothered to keep up.)
>>
>> The key point that I'm trying to make is that I would expect PlugSched
>> and namespaces to coexist without any problems. How it integrates with
>> the "advanced" scheduling system would depend on how that system alters
>> things such as load balancing and/or whether it goes for scheduling
>> outcomes at a higher level than the task.
>>
>>
>
> Coexisting is the base line and I don't think they'll 'interfere' with
> each other, per se, but we specifically want to make it easy for
> userland to set up and configure scheduling policies to apply to groups
> of processes.

They shouldn't interfere as which scheduler to use is a boot time
selection and only one scheduler is in force. It's mainly a coding
matter and in particular whether the "scheduler driver" interface would
need to be modified or whether your scheduler can be implemented using
the current interface.

>
> For instance, the vserver scheduling extension I wrote changes
> scheduling policy so that the interactivity bonus is reduced to -5 ..
> -5, and -5 .. +15 is given as a bonus or penalty for an entire vserver
> that is currently below or above its allocated CPU quotient. In this
> case the scheduling algorithm hasn't changed, just more feedback is
> given into the effective priorities of the processes being scheduled.
> ie, there are now two "inputs" (task and vserver) to the existing scheduler.
>
> I guess the big question is - is there a corresponding concept in
> PlugSched? for instance, is there a reference in the task_struct to the
> current scheduling domain, or is it more CKRM-style with classification
> modules?

It uses the standard run queue structure with per scheduler
modifications (via a union) to handle the different ways that the
schedulers manage priority arrays (so yes). As I said it restricts
itself to scheduling matters within each run queue and leaves the wider
aspects to the normal code.

At first guess, it sounds like adding your scheduler could be as simple
as taking a copy of ingosched.c (which is the implementation of the
standard scheduler within PlugSched) and then making your modifications.
You could probably even share the same run queue components but
there's nothing to stop you adding new ones.

Each scheduler can also have its own per task data via a union in the
task struct.

>
> If there is a reference in the task_struct to some set of scheduling
> counters, maybe we could squint and say that looks like a namespace, and
> throw it into the nsproxy.

Depends on the scheduler.

>
>> I'm assuming that you're happy to wait for the next release? That will
>> improve the likelihood of descriptions in the patches :-).
>>
>>
>
> Let's keep it the technical dialogue going for now, then.

OK. I'm waiting for the next -mm kernel before I make the next release.

>
> Now, forgive me if I'm preaching to the vicar here, but have you tried
> using Stacked Git for the patch development?

No, I actually use the gquilt GUI wrapper for quilt
<http://freshmeat.net/projects/gquilt/> and, although I've modified it
to use a generic interface to the underlying patch management system
(a.k.a. back end), I haven't yet modified it to use Stacked GIT as a
back end. I have thought about it and it was the primary motivation for
adding the generic interface but I ran out of enthusiasm.

> I find the way that you
> build patch descriptions as you go along, can easily wind the commit
> stack to work on individual patches and import other people's work to be
> very simple and powerful.
>
> http://www.procode.org/stgit/
>
> I just mention this because you're not the first person I've talked to
> using Quilt to express some difficulty in producing incremental patchset
> snapshots. Not having used Quilt myself I'm unsure whether this is a
> deficiency or just "the way it is" once a patch set gets big.

Making quilt easier to use is why I wrote gquilt :-)

Peter
--
Peter Williams [email protected]

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce

2006-05-30 22:05:36

Kirill Korotaev wrote:
>>> I understand that, I was talking about fairness between capped tasks
>>> and what might be considered fair or intutive between capped tasks and
>>> regular tasks. Of course, the last point is debatable ;)
>>
>>
>> Well, the primary fairness mechanism in the scheduler is the time
>> slice allocation and the capping code doesn't fiddle with those so
>> there should be a reasonable degree of fairness (taking into account
>> "nice") between capped tasks. To improve that would require
>> allocating several new priority slots for use by tasks exceeding their
>> caps and fiddling with those. I don't think that it's worth the bother.

I think more needs to be said about the fairness issue.

1. If a task is getting its cap or more then it's getting its fair share
as specified by that cap. Yes?

2. If a task is getting less CPU usage then its cap then it will be
being scheduled just as if it had no cap and will be getting its fair
share just as much as any task is.

So there is no fairness problem.

> I suppose it should be handled still. a subjective feeling :)
>
> BTW, do you have any test results for your patch?
> It would be interesting to see how precise these limitations are and
> whether or not we should bother for the above...

I tend to test by observing the results of setting caps on running tasks
and this doesn't generate something that can be e-mailed.

Observations indicate that hard caps are enforced to less than 1% and
ditto for soft caps except for small soft caps where the fact (stated in
the patches) that enforcement is not fully strict in order to prevent
priority inversion or starvation means that the cap is generally
exceeded. I'm currently making modifications (based on suggestions by
Con Kolivas) that implement an alternative method for avoiding priority
inversion and starvation and allow the enforcement to be more strict.

Peter
--
Peter Williams [email protected]

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce