To those interested
I have been working on a CPU resource controller using the nice value as
a control signal. At the moment, the control is done on a
per-task-level, but I have plans to extend it to groups of tasks. The
control is based on a PI-controller (Proportional, Integral), using an
execution time measurement as input to the controller, and the output
from the controller as nice value.
Using the controller, it is possible to make CPU reservations that in a
soft way guarantee that tasks receive as many resources as the
corresponding reference indicates.
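For readers who want something concrete, the loop described above could
look roughly like the following Python sketch. It is not the code from
the thesis: the class name, the gains kp and ki, and the anti-windup rule
are assumptions made for illustration. The only fixed points are the ones
stated above: a measured share of execution time goes in, and a nice
value comes out.

```python
from dataclasses import dataclass

NICE_MIN, NICE_MAX = -20, 19  # nice range on Linux


@dataclass
class PINiceController:
    """PI controller whose output is a nice value (hypothetical sketch)."""

    kp: float = 40.0        # proportional gain, illustrative value only
    ki: float = 20.0        # integral gain, illustrative value only
    integral: float = 0.0   # accumulated error

    def update(self, reference: float, measured: float, dt: float) -> int:
        """reference and measured are CPU shares in [0, 1]; dt is the
        sample period in seconds. Returns the nice value to apply."""
        error = reference - measured
        candidate = self.integral + error * dt
        # A task that gets too little CPU (error > 0) needs a *lower*
        # nice value, hence the negative sign.
        raw = -(self.kp * error + self.ki * candidate)
        nice = max(NICE_MIN, min(NICE_MAX, round(raw)))
        # Simple anti-windup (an assumption, not from the thesis): only
        # keep the new integral while the output is not saturated.
        if NICE_MIN < nice < NICE_MAX:
            self.integral = candidate
        return nice
```

On Linux the returned value could then be applied from user space with
os.setpriority(os.PRIO_PROCESS, pid, nice); lowering a nice value
normally requires privileges.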
For those interested, the concept is described in more detail along with
experiments in the first part of my thesis available at:
http://www.control.lth.se/database/publications/article.pike?artkey=ohlin06lic
p.s.
I changed my last name during this summer, from Andersson to Ohlin, when
I got married. Therefore you may find references to both names in the
thesis and elsewhere.
d.s.
/Martin
On 8/30/06, Martin Ohlin <[email protected]> wrote:
> To those interested
>
> I have been working on a CPU resource controller using the nice value as
> a control signal. At the moment, the control is done on a
> per-task-level, but I have plans to extend it to groups of tasks. The
> control is based on a PI-controller (Proportional, Integral), using an
> execution time measurement as input to the controller, and the output
> from the controller as nice value.
>
The CKRM e-series is a PID based CPU Controller. It did a good job of
controlling and smoothing out the load (and variations) and even
worked with groups. But it achieved all this through some amount of
complexity. How do you plan to extend the idea to groups? Do you have
any code that we can look at?
I do not understand why you would control the nice value. Most CPU
controllers control bandwidth/time - are there any advantages to
controlling the nice value instead? How does this interplay with the
dynamic priorities that the scheduler currently maintains?
> Using the controller, it is possible to make CPU reservations that in a
> soft way guarantee that tasks receive as many resources as the
> corresponding reference indicates.
>
> For those interested, the concept is described in more detail along with
> experiments in the first part of my thesis available at:
> http://www.control.lth.se/database/publications/article.pike?artkey=ohlin06lic
Read one more paper - time
Balbir
Balbir Singh wrote:
> The CKRM e-series is a PID based CPU Controller. It did a good job of
> controlling and smoothing out the load (and variations) and even
> worked with groups. But it achieved all this through some amount of
> complexity. How do you plan to extend the idea to groups? Do you have
> any code that we can look at?
I would say that my controller so far is very simple, probably too
simple. I have no detailed plan yet about how to incorporate groups of
tasks, only small ideas that I would like to think about a little more
before I say something embarrassing. The important code-parts are in the
thesis, and I must say that the code is in no way finished, but most of
it can be found at:
http://www.control.lth.se/user/martin.ohlin/linux/sampler.c
> I do not understand controlling the nice value? Most cpu control the
> bandwidth/time - are there any advantages to controlling the nice
> value? How does this interplay with dynamic priorities that the
> scheduler currently maintains?
There is a relationship between the nice value and the achieved
bandwidth/time. It therefore seemed plausible that the nice value could
be used to control the bandwidth/time. I wanted to know if that was
possible, and it was. As to the dynamic priorities, I do not change
them, but as I do change the nice value and the dynamic priorities are
relative to that, then you may say that I do change them... Anyway, it
seems to work.
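One rough way to check that relationship from user space is to sample a
task's accumulated CPU time at a given nice value and turn it into a
share. The sketch below does that from the text of /proc/<pid>/stat. It
is an illustration only and not taken from sampler.c; the function names
are made up for this example.

```python
import os


def cpu_ticks_from_stat(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')'.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    utime, stime = int(rest[11]), int(rest[12])
    return utime + stime


def cpu_share(ticks_before, ticks_after, interval_s, ticks_per_s=None):
    """Fraction of one CPU used between two samples taken interval_s apart."""
    if ticks_per_s is None:
        ticks_per_s = os.sysconf("SC_CLK_TCK")
    return (ticks_after - ticks_before) / (ticks_per_s * interval_s)
```

Sampling the same task at several nice values, with a fixed competing
load, gives one point of the nice-to-share curve per run.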
/Martin
Balbir Singh wrote:
> On 8/30/06, Martin Ohlin <[email protected]> wrote:
>> To those interested
>>
>> I have been working on a CPU resource controller using the nice value as
>> a control signal. At the moment, the control is done on a
>> per-task-level, but I have plans to extend it to groups of tasks. The
>> control is based on a PI-controller (Proportional, Integral), using an
>> execution time measurement as input to the controller, and the output
>> from the controller as nice value.
>>
>
> The CKRM e-series is a PID based CPU Controller. It did a good job of
> controlling and smoothing out the load (and variations) and even
> worked with groups. But it achieved all this through some amount of
> complexity. How do you plan to extend the idea to groups? Do you have
> any code that we can look at?
>
> I do not understand controlling the nice value? Most cpu control the
> bandwidth/time - are there any advantages to controlling the nice
> value?
Trying to control CPU allocations purely using time allocations will
only work well for CPU bound processes. Furthermore, the faster CPUs
become the more this will be the case.
> How does this interplay with dynamic priorities that the
> scheduler currently maintains?
But your implication here is valid. It is better to fiddle with the
dynamic priorities than with nice as this leaves nice for its primary
purpose of enabling the sysadmin to effect the allocation of CPU
resources based on external considerations. Having said that, I would
also opine that the basic mechanism this author uses to fiddle with the
nice values could be applied to the dynamic priorities instead. The key
difference is that nice can be fiddled with from outside the scheduler,
but you really need to be inside the scheduler to fiddle with dynamic
priorities.
>
>> Using the controller, it is possible to make CPU reservations that in a
>> soft way guarantee that tasks receive as many resources as the
>> corresponding reference indicates.
>>
>> For those interested, the concept is described in more detail along with
>> experiments in the first part of my thesis available at:
>> http://www.control.lth.se/database/publications/article.pike?artkey=ohlin06lic
>>
>
> Read one more paper - time
>
> Balbir
--
Peter Williams [email protected]
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce
On Thu, 2006-08-31 at 11:07 +1000, Peter Williams wrote:
> But your implication here is valid. It is better to fiddle with the
> dynamic priorities than with nice as this leaves nice for its primary
> purpose of enabling the sysadmin to effect the allocation of CPU
> resources based on external considerations.
I don't understand. It _is_ the administrator fiddling with nice based
on external considerations. It just steadies the administrator's hand.
-Mike
Mike Galbraith wrote:
> On Thu, 2006-08-31 at 11:07 +1000, Peter Williams wrote:
>
>> But your implication here is valid. It is better to fiddle with the
>> dynamic priorities than with nice as this leaves nice for its primary
>> purpose of enabling the sysadmin to effect the allocation of CPU
>> resources based on external considerations.
>
> I don't understand. It _is_ the administrator fiddling with nice based
> on external considerations. It just steadies the administrator's hand.
Not exactly. If "nice" is being (automatically) fiddled with to meet some
measurable requirement, such as the amount of CPU that tasks get, it is
no longer available as a means of indicating the relative importance of
the tasks. I.e. it can't be both the means for saying which tasks should
be allocated the most CPU and the means by which that allocation is
controlled.
Peter
On Thu, 2006-08-31 at 06:53 +0000, Mike Galbraith wrote:
> On Thu, 2006-08-31 at 11:07 +1000, Peter Williams wrote:
>
> > But your implication here is valid. It is better to fiddle with the
> > dynamic priorities than with nice as this leaves nice for its primary
> > purpose of enabling the sysadmin to effect the allocation of CPU
> > resources based on external considerations.
>
> I don't understand. It _is_ the administrator fiddling with nice based
> on external considerations. It just steadies the administrator's hand.
When extended to groups, I see your point. The admin would lose his
ability to apportion bandwidth _within_ the group because he's already
turned his only knob. That is going to be just as much of a problem for
other methods though, and is just a question of how much complexity you
want to pay to achieve fine grained control.
-Mike
On Thu, 2006-08-31 at 15:21 +1000, Peter Williams wrote:
> Mike Galbraith wrote:
> > On Thu, 2006-08-31 at 11:07 +1000, Peter Williams wrote:
> >
> >> But your implication here is valid. It is better to fiddle with the
> >> dynamic priorities than with nice as this leaves nice for its primary
> >> purpose of enabling the sysadmin to effect the allocation of CPU
> >> resources based on external considerations.
> >
> > I don't understand. It _is_ the administrator fiddling with nice based
> > on external considerations. It just steadies the administrator's hand.
>
> Not exactly. If "nice" is being (automatically) fiddled to meet some
> measurable requirement such as the amount of CPU tasks get it is no
> longer available as a means for the indication of the relative
> importance of the tasks.
Yeah, I thought about that meanwhile, see other reply.
-Mike
Martin Ohlin wrote:
> http://www.control.lth.se/user/martin.ohlin/linux/sampler.c
>
Thanks for the link.
--
Balbir Singh,
Linux Technology Center,
IBM Software Labs
Peter Williams wrote:
>> I do not understand controlling the nice value? Most cpu control the
>> bandwidth/time - are there any advantages to controlling the nice
>> value?
>
> Trying to control CPU allocations purely using time allocations will
> only work well for CPU bound processes. Furthermore, the faster CPUs
> become the more this will be the case.
The resource we are controlling is CPU bandwidth; what other parameters can we
use to control it? Nice values indirectly control the time a task gets, but
also affect its priority. Even if a task is not CPU bound, we are only
interested in its CPU bandwidth utilization in the CPU resource controller.
>
>> How does this interplay with dynamic priorities that the
>> scheduler currently maintains?
>
> But your implication here is valid. It is better to fiddle with the
> dynamic priorities than with nice as this leaves nice for its primary
> purpose of enabling the sysadmin to effect the allocation of CPU
> resources based on external considerations. Having said that I would
> also opine that the basic mechanism this author uses to fiddle the nice
> values could be applied to the dynamic priorities instead with the key
> difference being that nice can be fiddled from outside the scheduler but
> you really need to be inside the scheduler to fiddle with dynamic
> priorities.
>
The problem with controlling nice values that I see is that nice values do not
necessarily map linearly to CPU time. Changing the nice value also changes the
priority, which impacts the order in which tasks are run.
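As a rough illustration of the non-linearity, the sketch below models the
timeslice that a 2.6-era O(1) scheduler hands out per nice level. The
constants and the scaling rule are written from memory of kernel/sched.c
and should be treated as assumptions to verify, not as a reference; the
function names are made up, and the point is only the shape of the curve.

```python
# Assumed constants, patterned on the 2.6 O(1) scheduler (verify before use).
MAX_PRIO = 140
MAX_USER_PRIO = 40
DEF_TIMESLICE_MS = 100
MIN_TIMESLICE_MS = 5


def timeslice_ms(nice: int) -> int:
    """Timeslice granted to a task with the given nice value (toy model)."""
    static_prio = 120 + nice
    # In this model, negative nice values scale from a base four times larger.
    base = DEF_TIMESLICE_MS * 4 if nice < 0 else DEF_TIMESLICE_MS
    scaled = base * (MAX_PRIO - static_prio) // (MAX_USER_PRIO // 2)
    return max(scaled, MIN_TIMESLICE_MS)


def share_against_nice0(nice: int) -> float:
    """CPU share of a CPU-bound task competing with one CPU-bound nice-0
    task, assuming each task simply consumes its timeslice in turn."""
    own = timeslice_ms(nice)
    return own / (own + timeslice_ms(0))


for n in (-20, -10, -1, 0, 10, 19):
    print(n, timeslice_ms(n), round(share_against_nice0(n), 3))
```

In this model, one step from nice 0 to nice -1 changes the share far more
than one step from nice 10 to nice 11, so a controller that writes nice
values sees a gain that depends on where it is operating.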
It's my belief that time and priorities are orthogonal. Nice does a good job
of trying to mix the two, but in the case of resource management it might not
be such a good idea.
--
Balbir Singh,
Linux Technology Center,
IBM Software Labs
On Thu, 2006-08-31 at 11:47 +0530, Balbir Singh wrote:
> It's my belief that time and priorities are orthogonal. Nice does a good job
> of trying to mix the two, but in the case of resource management it might not
> be such a good idea.
I don't think they're orthogonal. If two tasks of identical priority
are contending for cpu, and you choose the one with more time on its
group ticket, you have effectively modified priorities.
Regardless, nice sounded attractive to me at first, but it's flat wrong
to use a per task variable to store group scope information, so I have
to agree that nice isn't a good choice for group resource management.
-Mike
Balbir Singh wrote:
> Peter Williams wrote:
>
>>> I do not understand controlling the nice value? Most cpu control the
>>> bandwidth/time - are there any advantages to controlling the nice
>>> value?
>>
>> Trying to control CPU allocations purely using time allocations will
>> only work well for CPU bound processes. Furthermore, the faster CPUs
>> become the more this will be the case.
>
> The resource we are controlling is CPU bandwidth,
Unfortunately, most tasks' bursts of CPU are much shorter than the sizes
of the time slices you're allocating (and the faster CPUs get the more
this will be the case) so they don't have much effect.
> what other parameters
> can we
> use to control it?
Dynamic priority.
>. Nice values indirectly control the time a task gets,
> but
> also affects its priority. Even if a task is not CPU bound, we are only
> interested in its CPU bandwidth utilization in the CPU resource controller.
>
>>
>>> How does this interplay with dynamic priorities that the
>>> scheduler currently maintains?
>>
>> But your implication here is valid. It is better to fiddle with the
>> dynamic priorities than with nice as this leaves nice for its primary
>> purpose of enabling the sysadmin to effect the allocation of CPU
>> resources based on external considerations. Having said that I would
>> also opine that the basic mechanism this author uses to fiddle the
>> nice values could be applied to the dynamic priorities instead with
>> the key difference being that nice can be fiddled from outside the
>> scheduler but you really need to be inside the scheduler to fiddle
>> with dynamic priorities.
>>
>
> The problem with controlling nice values that I see is that nice values
> do not
> necessarily linearly map CPU time. Changing the nice value also changes the
> priority, which impacts the order in which tasks are run.
>
> It's my belief that time and priorities are orthogonal. Nice does a good
> job
> of trying to mix the two, but in the case of resource management it
> might not
> be such a good idea.
Think "dynamic priorities".
Peter
Balbir Singh wrote:
> The CKRM e-series is a PID based CPU Controller. It did a good job of
> controlling and smoothing out the load (and variations) and even
> worked with groups. But it achieved all this through some amount of
> complexity.
I have now downloaded and looked at the code you refer to. But as far as
I can see, the PID controller is only used for load balancing between
CPUs, not for controlling the bandwidth/time of individual tasks. Is
this correct or did I miss something?
/Martin
Mike Galbraith wrote:
> On Thu, 2006-08-31 at 06:53 +0000, Mike Galbraith wrote:
>> On Thu, 2006-08-31 at 11:07 +1000, Peter Williams wrote:
>>
>>> But your implication here is valid. It is better to fiddle with the
>>> dynamic priorities than with nice as this leaves nice for its primary
>>> purpose of enabling the sysadmin to effect the allocation of CPU
>>> resources based on external considerations.
>> I don't understand. It _is_ the administrator fiddling with nice based
>> on external considerations. It just steadies the administrator's hand.
>
> When extended to groups, I see your point. The admin would lose his
> ability to apportion bandwidth _within_ the group because he's already
> turned his only knob. That is going to be just as much of a problem for
> other methods though, and is just a question of how much complexity you
> want to pay to achieve fine grained control.
I do not see the problem. Let's say I create a group of three tasks and
give it 50% of the CPU bandwidth (perhaps by using the same nice value
for all the tasks in this group). If I then want to apportion the
bandwidth within the group as you say, then the same thing can be done
by treating them as individual tasks.
Maybe I am wrong, but as I see it, if one wants to control on a group
level, then the individual shares within the group are not that
important. If the individual share is important, then it should be
controlled on a per-task level. Please tell me if I am wrong.
/Martin
Martin Ohlin wrote:
> Balbir Singh wrote:
>
>> The CKRM e-series is a PID based CPU Controller. It did a good job of
>> controlling and smoothing out the load (and variations) and even
>> worked with groups. But it achieved all this through some amount of
>> complexity.
>
> I have now downloaded and looked at the code you refer to. But as far as
> I can see, the PID controller is only used for load balancing between
> CPUs, not for controlling the bandwidth/time of individual tasks. Is
> this correct or did I miss something?
>
> /Martin
Yes, the PID controller is used for load balancing.
--
Balbir Singh,
Linux Technology Center,
IBM Software Labs
On Thu, 2006-08-31 at 12:35 +0200, Martin Ohlin wrote:
> Mike Galbraith wrote:
> > On Thu, 2006-08-31 at 06:53 +0000, Mike Galbraith wrote:
> >> On Thu, 2006-08-31 at 11:07 +1000, Peter Williams wrote:
> >>
> >>> But your implication here is valid. It is better to fiddle with the
> >>> dynamic priorities than with nice as this leaves nice for its primary
> >>> purpose of enabling the sysadmin to effect the allocation of CPU
> >>> resources based on external considerations.
> >> I don't understand. It _is_ the administrator fiddling with nice based
> >> on external considerations. It just steadies the administrator's hand.
> >
> > When extended to groups, I see your point. The admin would lose his
> > ability to apportion bandwidth _within_ the group because he's already
> > turned his only knob. That is going to be just as much of a problem for
> > other methods though, and is just a question of how much complexity you
> > want to pay to achieve fine grained control.
>
> I do not see the problem. Let's say I create a group of three tasks and
> give it 50% of the CPU bandwidth (perhaps by using the same nice value
> for all the tasks in this group). If I then want to apportion the
> bandwidth within the group as you say, then the same thing can be done
> by treating them as individual tasks.
Multiplex nice? (oh my, dig foxhole)
> Maybe I am wrong, but as I see it, if one wants to control on a group
> level, then the individual shares within the group are not that
> important. If the individual share is important, then it should be
> controlled on a per-task level. Please tell me if I am wrong.
That's probably right 99% of the time.
-Mike
Martin Ohlin wrote:
> Maybe I am wrong, but as I see it, if one wants to control on a group
> level, then the individual shares within the group are not that
> important. If the individual share is important, then it should be
> controlled on a per-task level. Please tell me if I am wrong.
The individual share within the group may not be important, but the
relative priority might be.
We have instances where we would like to express something like:
--these tasks are all grouped together as "maintenance" tasks, and
should be guaranteed 3% of the system together
--within the maintenance tasks, my network heartbeat application is the
most latency sensitive, so I want it to be higher-priority than the
other maintenance tasks
From my point of view, task group cpu allocation and relative task
priority should be orthogonal.
First you pick a task group (based on cpu share, priority, etc.) then
within the group you pick the task with highest priority.
This was something that CKRM did right (IMHO).
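The two-step pick described above can be sketched as follows. This is a
toy model, not CKRM code: the Task and Group types, the used/share ratio
for choosing a group, and the use of nice as the in-group priority are
all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    nice: int                  # lower nice means higher priority


@dataclass
class Group:
    name: str
    share: float               # guaranteed fraction of the CPU
    used: float = 0.0          # CPU time consumed so far, in seconds
    tasks: list = field(default_factory=list)


def pick_next(groups):
    """Return (group, task) to run next, or None if nothing is runnable."""
    runnable = [g for g in groups if g.tasks]
    if not runnable:
        return None
    # Step 1: pick the group that is furthest behind its entitlement.
    group = min(runnable, key=lambda g: g.used / g.share)
    # Step 2: within that group, pick the highest-priority task.
    task = min(group.tasks, key=lambda t: t.nice)
    return group, task
```

With a "maintenance" group at share 0.03 holding a heartbeat task at a
lower nice value than its siblings, the heartbeat task always wins inside
the group, while the group as a whole is still held to its 3%.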
Chris
On Thu, 2006-08-31 at 10:01 -0600, Chris Friesen wrote:
> Martin Ohlin wrote:
>
> > Maybe I am wrong, but as I see it, if one wants to control on a group
> > level, then the individual shares within the group are not that
> > important. If the individual share is important, then it should be
> > controlled on a per-task level. Please tell me if I am wrong.
>
> The individual share within the group may not be important, but the
> relative priority might be.
>
>
> We have instances where we would like to express something like:
>
> --these tasks are all grouped together as "maintenance" tasks, and
> should be guaranteed 3% of the system together
> --within the maintenance tasks, my network heartbeat application is the
> most latency sensitive, so I want it to be higher-priority than the
> other maintenance tasks
The latency issue is hard.
> From my point of view, task group cpu allocation and relative task
> priority should be orthogonal.
>
> First you pick a task group (based on cpu share, priority, etc.) then
> within the group you pick the task with highest priority.
>
> This was something that CKRM did right (IMHO).
I'd really like to see what Kiril's suggestion looks like.
-Mike
>>> On Wed, 30 Aug 2006 17:14:13 +0200, Martin Ohlin
>>> <[email protected]> said:
martin.ohlin> To those interested I have been working on a CPU
martin.ohlin> resource controller using the nice value as a
martin.ohlin> control signal. At the moment, the control is done
martin.ohlin> on a per-task-level, but I have plans to extend it
martin.ohlin> to groups of tasks. [ ... ]
This reminds me of fair share schedulers, which were popular
some decades ago on mainframes and early UNIX systems.
* G. J. Henry, "The fair share scheduler", AT&T Bell Lab. Tech. J.
  63, 8 (Oct.), 1845-1857.
* Judy Kay, Piers Lauder, "A Fair Share Scheduler", CACM 31:1,
  44-55, January 1988. <http://WWW.CS.USyd.edu.AU/~piers/>
* J. Larmouth, "Scheduling for a share of the machine", SPE 5, 1
  (Jan. 1975), 29-49.
  <URL:http://WWW.CL.Cam.ac.UK/TechReports/UCAM-CL-TR-2.pdf>
* J. Larmouth, "Scheduling for immediate turnround", SPE 8 (1978),
  559-578.
Martin Ohlin wrote:
> Mike Galbraith wrote:
>> On Thu, 2006-08-31 at 06:53 +0000, Mike Galbraith wrote:
>>> On Thu, 2006-08-31 at 11:07 +1000, Peter Williams wrote:
>>>
>>>> But your implication here is valid. It is better to fiddle with the
>>>> dynamic priorities than with nice as this leaves nice for its
>>>> primary purpose of enabling the sysadmin to effect the allocation of
>>>> CPU resources based on external considerations.
>>> I don't understand. It _is_ the administrator fiddling with nice based
>>> on external considerations. It just steadies the administrator's hand.
>>
>> When extended to groups, I see your point. The admin would lose his
>> ability to apportion bandwidth _within_ the group because he's already
>> turned his only knob. That is going to be just as much of a problem for
>> other methods though, and is just a question of how much complexity you
>> want to pay to achieve fine grained control.
>
> I do not see the problem. Let's say I create a group of three tasks and
> give it 50% of the CPU bandwidth (perhaps by using the same nice value
> for all the tasks in this group). If I then want to apportion the
> bandwidth within the group as you say, then the same thing can be done
> by treating them as individual tasks.
>
> Maybe I am wrong, but as I see it, if one wants to control on a group
> level, then the individual shares within the group are not that
> important. If the individual share is important, then it should be
> controlled on a per-task level. Please tell me if I am wrong.
It's not that the control can't be done using nice. It's that using
nice to do the control stops nice being used for its original purpose.
Some may not see that as a problem BUT some will.
Peter