2009-06-04 05:38:00

by Bharata B Rao

[permalink] [raw]
Subject: [RFC] CPU hard limits

Hi,

This is an RFC about the CPU hard limits feature where I have explained
the need for the feature, the proposed plan and the issues around it.
Before I come up with an implementation for hard limits, I would like to
know community's thoughts on this scheduler enhancement and any feedback
and suggestions.

Regards,
Bharata.

1. CPU hard limit
2. Need for hard limiting CPU resource
3. Granularity of enforcing CPU hard limits
4. Existing solutions
5. Specifying hard limits
6. Per task group vs global bandwidth period
7. Configuring
8. Throttling of tasks
9. Group scheduler hierarchy considerations
10. SMP considerations
11. Starvation
12. Hard limit and fairness

1. CPU hard limit
-----------------
CFS is a proportional share scheduler which tries to divide the CPU time
proportionately between tasks or groups of tasks (task group/cgroup) depending
on the priority/weight of the task or shares assigned to groups of tasks.
In CFS, a task/task group can get more than its share of CPU if there are
enough idle CPU cycles available in the system, due to the work conserving
nature of the scheduler.

However there are scenarios (Sec 2) where giving more than the desired
CPU share to a task/task group is not acceptable. In those scenarios, the
scheduler needs to put a hard stop on the CPU resource consumption of
task/task group if it exceeds a preset limit. This is usually achieved by
throttling the task/task group when it fully consumes its allocated CPU time.

2. Need for hard limiting CPU resource
--------------------------------------
- Pay-per-use: In enterprise systems that cater to multiple clients/customers
where a customer demands a certain share of CPU resources and pays only
that, CPU hard limits will be useful to hard limit the customer's job
to consume only the specified amount of CPU resource.
- In container based virtualization environments running multiple containers,
hard limits will be useful to ensure a container doesn't exceed its
CPU entitlement.
- Hard limits can be used to provide guarantees.

3. Granularity of enforcing CPU hard limits
-------------------------------------------
Conceptually, hard limits can either be enforced for individual tasks or
groups of tasks. However enforcing limits per task would be too fine
grained and would be a lot of work on the part of the system administrator
in terms of setting limits for every task. Based on the current understanding
of the users of this feature, it is felt that hard limiting is more useful
at task group level than the individual tasks level. Hence in the subsequent
paragraphs, the concept of hard limit as applicable to task group/cgroup
is discussed.

4. Existing solutions
---------------------
- Both Linux-VServer and OpenVZ virtualization solutions support CPU hard
limiting.
- Per task limit can be enforced using rlimits, but it is not rate based.

5. Specifying hard limits
-------------------------
CPU time consumed by a task group is generally measured over a
time period (called bandwidth period) and the task group gets throttled
when its CPU time reaches a limit (hard limit) within a bandwidth period.
The task group remains throttled until the bandwidth period gets
renewed at which time additional CPU time becomes available
to the tasks in the system.

When a task group's hard limit is specified as a ratio X/Y, it means that
the group will get throttled if its CPU time consumption exceeds X seconds
in a bandwidth period of Y seconds.

Specifying the hard limit as X/Y requires us to specify the bandwidth
period also.

Is having a uniform/same bandwidth period for all the groups an option ?
If so, we could even specify the hard limit as a percentage, like
30% of a uniform bandwidth period.

6. Per task group vs global bandwidth period
--------------------------------------------
The bandwidth period can either be per task group or global. With global
bandwidth period, the runtimes of all the task groups need to be
replenished when the period ends. Though this appears conceptually simple,
the implementation might not scale. Instead if every task group maintains its
bandwidth period separately, the refresh cycles of each group will happen
independent of each other. Moreover different groups might prefer different
bandwidth periods. Hence the first implementation will have per task group
bandwidth period.

Timers can be used to trigger bandwidth refresh cycles. (similar to rt group
sched)

7. Configuring
--------------
- User could set the hard limit (X and/or Y) through the cgroup fs.
- When the scheduler supports hard limiting, should it be enabled
for all tasks groups of the system ? Or should user have an option
to enable hard limiting per group ?
- When hard limiting is enabled for a group, should the limit be
set to a default to start with ? Or should the user set the limit
and the bandwidth before enabling the hard limiting ?
- What should be a sane default value for the bandwidth period ?

8. Throttling of tasks
----------------------
Task group can be taken off the runqueue when it hits the limit and enqueued
back when the bandwidth period is refreshed. This method would require us to
maintain the throttled tasks list separately for every group.

Under heavy throttling, there could be tasks being dequeued and enqueued
back at bandwidth refresh times leading to frequent variations in the
runqueue load. This might unduly stress the load balancer.

Note: A group (entity) can't be dequeued unless all tasks under it are
dequeued. So there can be false/failed attempts to run tasks of a throttled
group until all the tasks from the throttled group are dequeued.

9. Group scheduler hierarchy considerations
-------------------------------------------
Since the group scheduler is hierarchical in nature, should there be any
relation between the hard limit values of the parent task group
and the values of its child groups ? Should the hard limit values set for
child groups be compatible with the parent's hard limit ? For eg, consider
a group A having hard limit as X/Y has two children A1 and A2. Should the
limits for A1 (X1/Y) and A2 (X2/Y) be set so that X1/Y+X2/Y <= X/Y ?

Or should child groups set their limits independently of parent ? In this
case, even if the child still has CPU time left before it hits the limit,
it could get throttled because its parent got throttled. I would think that
this method will lead to easier implementation.

AFAICS, rt group scheduler needs EDF to support different bandwidth periods
for different groups (Ref: Documentation/scheduler/sched-rt-group.txt). I
don't think the same requirement is applicable to non-rt groups. This is
because with hard limits we are not guaranteeing the CPU time for a group,
instead we are just specifying the max time which a group can run within a
bandwidth period.

10. SMP considerations
----------------------
Hard limits could be enforced for the system as a whole or for individual
CPUS.

When it is enforced per CPU, a task group on a CPU will be throttled if
it reaches its hard limit on that CPU. This can lead to unfairness if
the same task group on other CPUs has runtimes still left and it is not
being utilized.

If enforced system wide, then a task group will be throttled when sum of the
run times of its tasks running on different CPUs reach the limit.

Could we use a hybrid method where a task group that reaches its limit on a CPU
could draw the group bandwidth from another CPU where there are no runnable
tasks belonging to that group ?

RT group scheduling borrows runtime from other CPUs when runtimes are balanced.

11. Starvation
---------------
When a task group that holds a shared resource (like a lock) is throttled,
another group which needs the same shared resource will not be able to
make progress even when the CPU has idle cycles to spare. This will lead
to starvation and unfairness. This situation could be avoided by some of
the methods like

- Disabling throttling when a group is holding a lock.
- Inheriting runtime from the group which faces starvation.

The first implementation will not address this problem of starvation.

12. Hard limits and fairness
----------------------------
Hard limits are set independent of group shares. The hard limit setting
by the user may be such that it may not be possible for the scheduler to
meet fairness and also enforce hard limits. Hard limiting takes precedence.


2009-06-04 12:21:30

by Avi Kivity

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Bharata B Rao wrote:
> 2. Need for hard limiting CPU resource
> --------------------------------------
> - Pay-per-use: In enterprise systems that cater to multiple clients/customers
> where a customer demands a certain share of CPU resources and pays only
> that, CPU hard limits will be useful to hard limit the customer's job
> to consume only the specified amount of CPU resource.
> - In container based virtualization environments running multiple containers,
> hard limits will be useful to ensure a container doesn't exceed its
> CPU entitlement.
> - Hard limits can be used to provide guarantees.
>
How can hard limits provide guarantees?

Let's take an example where I have 1 group that I wish to guarantee a
20% share of the cpu, and anther 8 groups with no limits or guarantees.

One way to achieve the guarantee is to hard limit each of the 8 other
groups to 10%; the sum total of the limits is 80%, leaving 20% for the
guarantee group. The downside is the arbitrary limit imposed on the
other groups.

Another way is to place the 8 groups in a container group, and limit
that to 80%. But that doesn't work if I want to provide guarantees to
several groups.

--
error compiling committee.c: too many arguments to function

2009-06-04 21:32:53

by Mike Waychison

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Avi Kivity wrote:
> Bharata B Rao wrote:
>> 2. Need for hard limiting CPU resource
>> --------------------------------------
>> - Pay-per-use: In enterprise systems that cater to multiple clients/customers
>> where a customer demands a certain share of CPU resources and pays only
>> that, CPU hard limits will be useful to hard limit the customer's job
>> to consume only the specified amount of CPU resource.
>> - In container based virtualization environments running multiple containers,
>> hard limits will be useful to ensure a container doesn't exceed its
>> CPU entitlement.
>> - Hard limits can be used to provide guarantees.
>>
> How can hard limits provide guarantees?

Hard limits are useful and desirable in situations where we would like
to maintain deterministic behavior.

Placing a hard cap on the cpu usage of a given task group (and
configuring such that this cpu time is not overcommited) on a system
allows us to create a hard guarantee that throughput for that task group
will not fluctuate as other workloads are added and removed on the system.

Cache use and bus bandwidth in a multi-workload environment can still
cause a performance deviation, but these are second order compared to
the cpu scheduling guarantees themselves.

Mike Waychison

2009-06-05 03:04:18

by Bharata B Rao

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Thu, Jun 04, 2009 at 03:19:22PM +0300, Avi Kivity wrote:
> Bharata B Rao wrote:
>> 2. Need for hard limiting CPU resource
>> --------------------------------------
>> - Pay-per-use: In enterprise systems that cater to multiple clients/customers
>> where a customer demands a certain share of CPU resources and pays only
>> that, CPU hard limits will be useful to hard limit the customer's job
>> to consume only the specified amount of CPU resource.
>> - In container based virtualization environments running multiple containers,
>> hard limits will be useful to ensure a container doesn't exceed its
>> CPU entitlement.
>> - Hard limits can be used to provide guarantees.
>>
> How can hard limits provide guarantees?
>
> Let's take an example where I have 1 group that I wish to guarantee a
> 20% share of the cpu, and anther 8 groups with no limits or guarantees.
>
> One way to achieve the guarantee is to hard limit each of the 8 other
> groups to 10%; the sum total of the limits is 80%, leaving 20% for the
> guarantee group. The downside is the arbitrary limit imposed on the
> other groups.

This method sounds very similar to the openvz method:
http://wiki.openvz.org/Containers/Guarantees_for_resources

>
> Another way is to place the 8 groups in a container group, and limit
> that to 80%. But that doesn't work if I want to provide guarantees to
> several groups.

Hmm why not ? Reduce the guarantee of the container group and provide
the same to additional groups ?

Regards,
Bharata.

2009-06-05 03:07:43

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

* Avi Kivity <[email protected]> [2009-06-04 15:19:22]:

> Bharata B Rao wrote:
> > 2. Need for hard limiting CPU resource
> > --------------------------------------
> > - Pay-per-use: In enterprise systems that cater to multiple clients/customers
> > where a customer demands a certain share of CPU resources and pays only
> > that, CPU hard limits will be useful to hard limit the customer's job
> > to consume only the specified amount of CPU resource.
> > - In container based virtualization environments running multiple containers,
> > hard limits will be useful to ensure a container doesn't exceed its
> > CPU entitlement.
> > - Hard limits can be used to provide guarantees.
> >
> How can hard limits provide guarantees?
>
> Let's take an example where I have 1 group that I wish to guarantee a
> 20% share of the cpu, and anther 8 groups with no limits or guarantees.
>
> One way to achieve the guarantee is to hard limit each of the 8 other
> groups to 10%; the sum total of the limits is 80%, leaving 20% for the
> guarantee group. The downside is the arbitrary limit imposed on the
> other groups.
>
> Another way is to place the 8 groups in a container group, and limit
> that to 80%. But that doesn't work if I want to provide guarantees to
> several groups.
>

Hi, Avi,

Take a look at
http://wiki.openvz.org/Containers/Guarantees_for_resources
and the associated program in the wiki page.

--
Balbir

2009-06-05 03:35:45

by Avi Kivity

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Bharata B Rao wrote:
>> Another way is to place the 8 groups in a container group, and limit
>> that to 80%. But that doesn't work if I want to provide guarantees to
>> several groups.
>>
>
> Hmm why not ? Reduce the guarantee of the container group and provide
> the same to additional groups ?
>

This method produces suboptimal results:

$ cgroup-limits 10 10 0
[50.0, 50.0, 40.0]

I want to provide two 10% guaranteed groups and one best-effort group.
Using the limits method, no group can now use more than 50% of the
resources. However, having the first group use 90% of the resources
does not violate any guarantees, but it not allowed by the solution.

#!/usr/bin/python

def calculate_limits(g, R):
N = len(g)
if N == 1:
return [R]

s = sum([R - gi for gi in g])
return [(s - (R - gi) - (N - 2) * (R - gi)) / (N - 1)
for gi in g]

import sys
print calculate_limits([float(x) for x in sys.argv[1:]], 100)

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-06-05 04:37:29

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Fri, Jun 5, 2009 at 11:33 AM, Avi Kivity <[email protected]> wrote:
> Bharata B Rao wrote:
>>>
>>> Another way is to place the 8 groups in a container group, and limit
>>> ?that to 80%. But that doesn't work if I want to provide guarantees to
>>> ?several groups.
>>>
>>
>> Hmm why not ? Reduce the guarantee of the container group and provide
>> the same to additional groups ?
>>
>
> This method produces suboptimal results:
>
> $ cgroup-limits 10 10 0
> [50.0, 50.0, 40.0]
>
> I want to provide two 10% guaranteed groups and one best-effort group.
> ?Using the limits method, no group can now use more than 50% of the
> resources. ?However, having the first group use 90% of the resources does
> not violate any guarantees, but it not allowed by the solution.
>

How, it works out fine in my calculation

50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
limited to 90%
50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
limited to 90%
50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
limited to 100%

Now if we really have zeros, I would recommend using

cgroup-limits 10 10 and you'll see that you'll get 90, 90 as output.

Adding zeros to the calcuation is not recommended. Does that help?

Balbir

2009-06-05 04:46:23

by Avi Kivity

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Balbir Singh wrote:
> On Fri, Jun 5, 2009 at 11:33 AM, Avi Kivity <[email protected]> wrote:
>
>> Bharata B Rao wrote:
>>
>>>> Another way is to place the 8 groups in a container group, and limit
>>>> that to 80%. But that doesn't work if I want to provide guarantees to
>>>> several groups.
>>>>
>>>>
>>> Hmm why not ? Reduce the guarantee of the container group and provide
>>> the same to additional groups ?
>>>
>>>
>> This method produces suboptimal results:
>>
>> $ cgroup-limits 10 10 0
>> [50.0, 50.0, 40.0]
>>
>> I want to provide two 10% guaranteed groups and one best-effort group.
>> Using the limits method, no group can now use more than 50% of the
>> resources. However, having the first group use 90% of the resources does
>> not violate any guarantees, but it not allowed by the solution.
>>
>>
>
> How, it works out fine in my calculation
>
> 50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
> limited to 90%
> 50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
> limited to 90%
> 50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
> limited to 100%
>

It's fine in that it satisfies the guarantees, but it is deeply
suboptimal. If I ran a cpu hog in the first group, while the other two
were idle, it would be limited to 50% cpu. On the other hand, if it
consumed all 100% cpu it would still satisfy the guarantees (as the
other groups are idle).

The result is that in such a situation, wall clock time would double
even though cpu resources are available.
> Now if we really have zeros, I would recommend using
>
> cgroup-limits 10 10 and you'll see that you'll get 90, 90 as output.
>
> Adding zeros to the calcuation is not recommended. Does that help?

What do you mean, it is not recommended? I have two groups which need at
least 10% and one which does not need any guarantee, how do I express it?

In any case, changing the zero to 1% does not materially change the results.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-06-05 04:50:06

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

* Avi Kivity <[email protected]> [2009-06-05 07:44:27]:

> Balbir Singh wrote:
>> On Fri, Jun 5, 2009 at 11:33 AM, Avi Kivity <[email protected]> wrote:
>>
>>> Bharata B Rao wrote:
>>>
>>>>> Another way is to place the 8 groups in a container group, and limit
>>>>> that to 80%. But that doesn't work if I want to provide guarantees to
>>>>> several groups.
>>>>>
>>>>>
>>>> Hmm why not ? Reduce the guarantee of the container group and provide
>>>> the same to additional groups ?
>>>>
>>>>
>>> This method produces suboptimal results:
>>>
>>> $ cgroup-limits 10 10 0
>>> [50.0, 50.0, 40.0]
>>>
>>> I want to provide two 10% guaranteed groups and one best-effort group.
>>> Using the limits method, no group can now use more than 50% of the
>>> resources. However, having the first group use 90% of the resources does
>>> not violate any guarantees, but it not allowed by the solution.
>>>
>>>
>>
>> How, it works out fine in my calculation
>>
>> 50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
>> limited to 90%
>> 50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
>> limited to 90%
>> 50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
>> limited to 100%
>>
>
> It's fine in that it satisfies the guarantees, but it is deeply
> suboptimal. If I ran a cpu hog in the first group, while the other two
> were idle, it would be limited to 50% cpu. On the other hand, if it
> consumed all 100% cpu it would still satisfy the guarantees (as the
> other groups are idle).
>
> The result is that in such a situation, wall clock time would double
> even though cpu resources are available.

But then there is no other way to make a *guarantee*, guarantees come
at a cost of idling resources, no? Can you show me any other
combination that will provide the guarantee and without idling the
system for the specified guarantees?


>> Now if we really have zeros, I would recommend using
>>
>> cgroup-limits 10 10 and you'll see that you'll get 90, 90 as output.
>>
>> Adding zeros to the calcuation is not recommended. Does that help?
>
> What do you mean, it is not recommended? I have two groups which need at
> least 10% and one which does not need any guarantee, how do I express it?
>
Ignore this part of my comment

> In any case, changing the zero to 1% does not materially change the results.

True.

--
Balbir

2009-06-05 05:09:43

by Chris Friesen

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Balbir Singh wrote:

> But then there is no other way to make a *guarantee*, guarantees come
> at a cost of idling resources, no? Can you show me any other
> combination that will provide the guarantee and without idling the
> system for the specified guarantees?

The example given was two 10% guaranteed groups and one best-effort
group. Why would this require idling resources?

If I have a hog in each group, the requirements would be met if the
groups got 33, 33, and 33. (Or 10/10/80, for that matter.) If the
second and third groups go idle, why not let the first group use 100% of
the cpu?

The only hard restriction is that the sum of the guarantees must be less
than 100%.

Chris

2009-06-05 05:11:10

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

* Balbir Singh <[email protected]> [2009-06-05 12:49:46]:

> * Avi Kivity <[email protected]> [2009-06-05 07:44:27]:
>
> > Balbir Singh wrote:
> >> On Fri, Jun 5, 2009 at 11:33 AM, Avi Kivity <[email protected]> wrote:
> >>
> >>> Bharata B Rao wrote:
> >>>
> >>>>> Another way is to place the 8 groups in a container group, and limit
> >>>>> that to 80%. But that doesn't work if I want to provide guarantees to
> >>>>> several groups.
> >>>>>
> >>>>>
> >>>> Hmm why not ? Reduce the guarantee of the container group and provide
> >>>> the same to additional groups ?
> >>>>
> >>>>
> >>> This method produces suboptimal results:
> >>>
> >>> $ cgroup-limits 10 10 0
> >>> [50.0, 50.0, 40.0]
> >>>
> >>> I want to provide two 10% guaranteed groups and one best-effort group.
> >>> Using the limits method, no group can now use more than 50% of the
> >>> resources. However, having the first group use 90% of the resources does
> >>> not violate any guarantees, but it not allowed by the solution.
> >>>
> >>>
> >>
> >> How, it works out fine in my calculation
> >>
> >> 50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
> >> limited to 90%
> >> 50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
> >> limited to 90%
> >> 50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
> >> limited to 100%
> >>
> >
> > It's fine in that it satisfies the guarantees, but it is deeply
> > suboptimal. If I ran a cpu hog in the first group, while the other two
> > were idle, it would be limited to 50% cpu. On the other hand, if it
> > consumed all 100% cpu it would still satisfy the guarantees (as the
> > other groups are idle).
> >
> > The result is that in such a situation, wall clock time would double
> > even though cpu resources are available.
>
> But then there is no other way to make a *guarantee*, guarantees come
> at a cost of idling resources, no? Can you show me any other
> combination that will provide the guarantee and without idling the
> system for the specified guarantees?

OK, I see part of your concern, but I think we could do some
optimizations during design. For example if all groups have reached
their hard-limit and the system is idle, should we do start a new hard
limit interval and restart, so that idleness can be removed. Would
that be an acceptable design point?

--
Balbir

2009-06-05 05:13:33

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

* Chris Friesen <[email protected]> [2009-06-04 23:09:22]:

> Balbir Singh wrote:
>
> > But then there is no other way to make a *guarantee*, guarantees come
> > at a cost of idling resources, no? Can you show me any other
> > combination that will provide the guarantee and without idling the
> > system for the specified guarantees?
>
> The example given was two 10% guaranteed groups and one best-effort
> group. Why would this require idling resources?
>
> If I have a hog in each group, the requirements would be met if the
> groups got 33, 33, and 33. (Or 10/10/80, for that matter.) If the
> second and third groups go idle, why not let the first group use 100% of
> the cpu?
>
> The only hard restriction is that the sum of the guarantees must be less
> than 100%.
>

Chris,

I just responded to a variation of this, I think that some of this
could be handled during design. I just sent out the email a few
minutes ago. Could you look at that and respond.

--
Balbir

2009-06-05 05:18:19

by Avi Kivity

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Balbir Singh wrote:



>>> How, it works out fine in my calculation
>>>
>>> 50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
>>> limited to 90%
>>> 50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
>>> limited to 90%
>>> 50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
>>> limited to 100%
>>>
>>>
>> It's fine in that it satisfies the guarantees, but it is deeply
>> suboptimal. If I ran a cpu hog in the first group, while the other two
>> were idle, it would be limited to 50% cpu. On the other hand, if it
>> consumed all 100% cpu it would still satisfy the guarantees (as the
>> other groups are idle).
>>
>> The result is that in such a situation, wall clock time would double
>> even though cpu resources are available.
>>
>
> But then there is no other way to make a *guarantee*, guarantees come
> at a cost of idling resources, no? Can you show me any other
> combination that will provide the guarantee and without idling the
> system for the specified guarantees?
>

Suppose in my example cgroup 1 consumed 100% of the cpu resources and
cgroup 2 and 3 were completely idle. All of the guarantees are met (if
cgroup 2 is idle, there's no need to give it the 10% cpu time it is
guaranteed).

If your only tool to achieve the guarantees is a limit system, then
yes, the equation yields the correct results. But given that it yields
such inferior results, I think we need to look for a more involved solution.

I think the limits method fits cases where it is difficult to evict a
resource (say, disk quotas -- if you want to guarantee 10% of space to
cgroups 1, you must limit all others to 90%). But for processor usage,
you can evict a cgroup instantly, so nothing prevents a cgroup from
consuming all available resources as long as others do not contend for them.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-06-05 05:20:32

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

* Avi Kivity <[email protected]> [2009-06-05 08:16:21]:

> Balbir Singh wrote:
>
>
>
>>>> How, it works out fine in my calculation
>>>>
>>>> 50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
>>>> limited to 90%
>>>> 50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
>>>> limited to 90%
>>>> 50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
>>>> limited to 100%
>>>>
>>> It's fine in that it satisfies the guarantees, but it is deeply
>>> suboptimal. If I ran a cpu hog in the first group, while the other
>>> two were idle, it would be limited to 50% cpu. On the other hand,
>>> if it consumed all 100% cpu it would still satisfy the guarantees
>>> (as the other groups are idle).
>>>
>>> The result is that in such a situation, wall clock time would double
>>> even though cpu resources are available.
>>>
>>
>> But then there is no other way to make a *guarantee*, guarantees come
>> at a cost of idling resources, no? Can you show me any other
>> combination that will provide the guarantee and without idling the
>> system for the specified guarantees?
>>
>
> Suppose in my example cgroup 1 consumed 100% of the cpu resources and
> cgroup 2 and 3 were completely idle. All of the guarantees are met (if
> cgroup 2 is idle, there's no need to give it the 10% cpu time it is
> guaranteed).
>
> If your only tool to achieve the guarantees is a limit system, then
> yes, the equation yields the correct results. But given that it yields
> such inferior results, I think we need to look for a more involved
> solution.
>
> I think the limits method fits cases where it is difficult to evict a
> resource (say, disk quotas -- if you want to guarantee 10% of space to
> cgroups 1, you must limit all others to 90%). But for processor usage,
> you can evict a cgroup instantly, so nothing prevents a cgroup from
> consuming all available resources as long as others do not contend for
> them.

Avi,

Could you look at my newer email and comment, where I've mentioned
that I see your concern and discussed a design point. We could
probably take this discussion forward from there?

--
Balbir

2009-06-05 05:23:28

by Avi Kivity

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Balbir Singh wrote:
>> But then there is no other way to make a *guarantee*, guarantees come
>> at a cost of idling resources, no? Can you show me any other
>> combination that will provide the guarantee and without idling the
>> system for the specified guarantees?
>>
>
> OK, I see part of your concern, but I think we could do some
> optimizations during design. For example if all groups have reached
> their hard-limit and the system is idle, should we do start a new hard
> limit interval and restart, so that idleness can be removed. Would
> that be an acceptable design point?

I think so. Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1), and a
cpu hog running in each group, how would the algorithm divide resources?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-06-05 05:28:26

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

* Avi Kivity <[email protected]> [2009-06-05 08:21:43]:

> Balbir Singh wrote:
>>> But then there is no other way to make a *guarantee*, guarantees come
>>> at a cost of idling resources, no? Can you show me any other
>>> combination that will provide the guarantee and without idling the
>>> system for the specified guarantees?
>>>
>>
>> OK, I see part of your concern, but I think we could do some
>> optimizations during design. For example if all groups have reached
>> their hard-limit and the system is idle, should we do start a new hard
>> limit interval and restart, so that idleness can be removed. Would
>> that be an acceptable design point?
>
> I think so. Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1), and a
> cpu hog running in each group, how would the algorithm divide resources?
>

As per the matrix calculation, but as soon as we reach an idle point,
we redistribute the b/w and start a new quantum so to speak, where all
groups are charged up to their hard limits.

For your question, if there is a CPU hog running, it would be as per
the matrix calculation, since the system has no idle point during the
bandwidth period.

--
Balbir

2009-06-05 05:33:08

by Bharata B Rao

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Fri, Jun 05, 2009 at 01:27:55PM +0800, Balbir Singh wrote:
> * Avi Kivity <[email protected]> [2009-06-05 08:21:43]:
>
> > Balbir Singh wrote:
> >>> But then there is no other way to make a *guarantee*, guarantees come
> >>> at a cost of idling resources, no? Can you show me any other
> >>> combination that will provide the guarantee and without idling the
> >>> system for the specified guarantees?
> >>>
> >>
> >> OK, I see part of your concern, but I think we could do some
> >> optimizations during design. For example if all groups have reached
> >> their hard-limit and the system is idle, should we do start a new hard
> >> limit interval and restart, so that idleness can be removed. Would
> >> that be an acceptable design point?
> >
> > I think so. Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1), and a
> > cpu hog running in each group, how would the algorithm divide resources?
> >
>
> As per the matrix calculation, but as soon as we reach an idle point,
> we redistribute the b/w and start a new quantum so to speak, where all
> groups are charged up to their hard limits.

But could there be client models where you are required to strictly
adhere to the limit within the bandwidth and not provide more (by advancing
the bandwidth period) in the presence of idle cycles ?

Regards,
Bharata.

2009-06-05 06:03:48

by Avi Kivity

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Bharata B Rao wrote:
> On Fri, Jun 05, 2009 at 01:27:55PM +0800, Balbir Singh wrote:
>
>> * Avi Kivity <[email protected]> [2009-06-05 08:21:43]:
>>
>>
>>> Balbir Singh wrote:
>>>
>>>>> But then there is no other way to make a *guarantee*, guarantees come
>>>>> at a cost of idling resources, no? Can you show me any other
>>>>> combination that will provide the guarantee and without idling the
>>>>> system for the specified guarantees?
>>>>>
>>>>>
>>>> OK, I see part of your concern, but I think we could do some
>>>> optimizations during design. For example if all groups have reached
>>>> their hard-limit and the system is idle, should we do start a new hard
>>>> limit interval and restart, so that idleness can be removed. Would
>>>> that be an acceptable design point?
>>>>
>>> I think so. Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1), and a
>>> cpu hog running in each group, how would the algorithm divide resources?
>>>
>>>
>> As per the matrix calculation, but as soon as we reach an idle point,
>> we redistribute the b/w and start a new quantum so to speak, where all
>> groups are charged up to their hard limits.
>>
>
> But could there be client models where you are required to strictly
> adhere to the limit within the bandwidth and not provide more (by advancing
> the bandwidth period) in the presence of idle cycles ?
>

That's the limit part. I'd like to be able to specify limits and
guarantees on the same host and for the same groups; I don't think that
works when you advance the bandwidth period.

I think we need to treat guarantees as first-class goals, not something
derived from limits (in fact I think guarantees are more useful as they
can be used to provide SLAs).

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-06-05 06:05:52

by Avi Kivity

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Balbir Singh wrote:
>> I think so. Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1), and a
>> cpu hog running in each group, how would the algorithm divide resources?
>>
>>
>
> As per the matrix calculation, but as soon as we reach an idle point,
> we redistribute the b/w and start a new quantum so to speak, where all
> groups are charged up to their hard limits.
>
> For your question, if there is a CPU hog running, it would be as per
> the matrix calculation, since the system has no idle point during the
> bandwidth period.
>

So the groups with guarantees get a priority boost. That's not a good
side effect.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-06-05 06:33:44

by Bharata B Rao

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Fri, Jun 05, 2009 at 09:03:37AM +0300, Avi Kivity wrote:
> Balbir Singh wrote:
>>> I think so. Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1),
>>> and a cpu hog running in each group, how would the algorithm divide
>>> resources?
>>>
>>>
>>
>> As per the matrix calculation, but as soon as we reach an idle point,
>> we redistribute the b/w and start a new quantum so to speak, where all
>> groups are charged up to their hard limits.
>>
>> For your question, if there is a CPU hog running, it would be as per
>> the matrix calculation, since the system has no idle point during the
>> bandwidth period.
>>
>
> So the groups with guarantees get a priority boost. That's not a good
> side effect.

That happens only in the presence of idle cycles when other groups [with or
without guarantees] have nothing useful to do. So how would that matter
since there is nothing else to run anyway ?

Regards,
Bharata.

2009-06-05 08:28:03

by Bharata B Rao

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Fri, Jun 05, 2009 at 09:01:50AM +0300, Avi Kivity wrote:
> Bharata B Rao wrote:
>>
>> But could there be client models where you are required to strictly
>> adhere to the limit within the bandwidth and not provide more (by advancing
>> the bandwidth period) in the presence of idle cycles ?
>>
>
> That's the limit part. I'd like to be able to specify limits and
> guarantees on the same host and for the same groups; I don't think that
> works when you advance the bandwidth period.
>
> I think we need to treat guarantees as first-class goals, not something
> derived from limits (in fact I think guarantees are more useful as they
> can be used to provide SLAs).

I agree that guarantees are important, but I am not sure about

1. specifying both limits and guarantees for groups and
2. not deriving guarantees from limits.

Guarantees are met by some form of throttling or limiting and hence I think
limiting should drive the guarantees.

Regards,
Bharata.

2009-06-05 08:53:31

by Paul Menage

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Wed, Jun 3, 2009 at 10:36 PM, Bharata B
Rao<[email protected]> wrote:
> - Hard limits can be used to provide guarantees.
>

This claim (and the subsequent long thread it generated on how limits
can provide guarantees) confused me a bit.

Why do we need limits to provide guarantees when we can already
provide guarantees via shares?

Suppose 10 cgroups each want 10% of the machine's CPU. We can just
give each cgroup an equal share, and they're guaranteed 10% if they
try to use it; if they don't use it, other cgroups can get access to
the idle cycles.

Suppose cgroup A wants a guarantee of 50% and two others, B and C,
want guarantees of 15% each; give A 50 shares and B and C 15 shares
each. In this case, if they all run flat out they'll get 62%/19%/19%,
which is within their SLA.

That's not to say that hard limits can't be useful in their own right
- e.g. for providing reproducible loadtesting conditions by
controlling how much CPU a service can use during the load test. But I
don't see why using them to implement guarantees is either necessary
or desirable.

(Unless I'm missing some crucial point ...)

Paul

2009-06-05 09:02:47

by Reinhard Tartler

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Bharata B Rao <[email protected]> writes:

> 4. Existing solutions
> ---------------------
> - Both Linux-VServer and OpenVZ virtualization solutions support CPU hard
> limiting.
> - Per task limit can be enforced using rlimits, but it is not rate
> based.

Also related work:

http://www.usenix.org/events/osdi99/full_papers/banga/banga.pdf

it has even had a preliminary linux implementation from 2003, which has
been proposed at that time, but wasn't considered.

http://admingilde.org/~martin/rc/

maybe someone wants to pick up that work?


--
Gruesse/greetings,
Reinhard Tartler, KeyID 945348A4

2009-06-05 09:28:41

by Bharata B Rao

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Fri, Jun 05, 2009 at 01:53:15AM -0700, Paul Menage wrote:
> On Wed, Jun 3, 2009 at 10:36 PM, Bharata B
> Rao<[email protected]> wrote:
> > - Hard limits can be used to provide guarantees.
> >
>
> This claim (and the subsequent long thread it generated on how limits
> can provide guarantees) confused me a bit.
>
> Why do we need limits to provide guarantees when we can already
> provide guarantees via shares?

shares design is proportional and hence it can't by itself provide
guarantees.

>
> Suppose 10 cgroups each want 10% of the machine's CPU. We can just
> give each cgroup an equal share, and they're guaranteed 10% if they
> try to use it; if they don't use it, other cgroups can get access to
> the idle cycles.

Now if 11th group with same shares comes in, then each group will now
get 9% of CPU and that 10% guarantee breaks.

Regards,
Bharata.

2009-06-05 09:33:16

by Paul Menage

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Fri, Jun 5, 2009 at 2:27 AM, Bharata B Rao<[email protected]> wrote:
>>
>> Suppose 10 cgroups each want 10% of the machine's CPU. We can just
>> give each cgroup an equal share, and they're guaranteed 10% if they
>> try to use it; if they don't use it, other cgroups can get access to
>> the idle cycles.
>
> Now if 11th group with same shares comes in, then each group will now
> get 9% of CPU and that 10% guarantee breaks.

So you're trying to guarantee 11 cgroups that they can each get 10% of
the CPU? That's called over-committing, and while there's nothing
wrong with doing that if you're confident that they'll not al need
their 10% at the same time, there's no way to *guarantee* them all
10%. You can guarantee them all 9% and hope the extra 1% is spare for
those that need it (over-committing), or you can guarantee 10 of them
10% and give the last one 0 shares.

How would you propose to guarantee 11 cgroups each 10% of the CPU
using hard limits?

Paul

2009-06-05 09:41:06

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

* Bharata B Rao <[email protected]> [2009-06-05 11:01:59]:

> On Fri, Jun 05, 2009 at 01:27:55PM +0800, Balbir Singh wrote:
> > * Avi Kivity <[email protected]> [2009-06-05 08:21:43]:
> >
> > > Balbir Singh wrote:
> > >>> But then there is no other way to make a *guarantee*, guarantees come
> > >>> at a cost of idling resources, no? Can you show me any other
> > >>> combination that will provide the guarantee and without idling the
> > >>> system for the specified guarantees?
> > >>>
> > >>
> > >> OK, I see part of your concern, but I think we could do some
> > >> optimizations during design. For example if all groups have reached
> > >> their hard-limit and the system is idle, should we do start a new hard
> > >> limit interval and restart, so that idleness can be removed. Would
> > >> that be an acceptable design point?
> > >
> > > I think so. Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1), and a
> > > cpu hog running in each group, how would the algorithm divide resources?
> > >
> >
> > As per the matrix calculation, but as soon as we reach an idle point,
> > we redistribute the b/w and start a new quantum so to speak, where all
> > groups are charged up to their hard limits.
>
> But could there be client models where you are required to strictly
> adhere to the limit within the bandwidth and not provide more (by advancing
> the bandwidth period) in the presence of idle cycles ?
>

Good point, I think so, so I think there is should be a good default
and configurable for the other case.

--
Balbir

2009-06-05 09:41:24

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

* Avi Kivity <[email protected]> [2009-06-05 09:01:50]:

> Bharata B Rao wrote:
>> On Fri, Jun 05, 2009 at 01:27:55PM +0800, Balbir Singh wrote:
>>
>>> * Avi Kivity <[email protected]> [2009-06-05 08:21:43]:
>>>
>>>
>>>> Balbir Singh wrote:
>>>>
>>>>>> But then there is no other way to make a *guarantee*, guarantees come
>>>>>> at a cost of idling resources, no? Can you show me any other
>>>>>> combination that will provide the guarantee and without idling the
>>>>>> system for the specified guarantees?
>>>>>>
>>>>> OK, I see part of your concern, but I think we could do some
>>>>> optimizations during design. For example if all groups have reached
>>>>> their hard-limit and the system is idle, should we do start a new hard
>>>>> limit interval and restart, so that idleness can be removed. Would
>>>>> that be an acceptable design point?
>>>>>
>>>> I think so. Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1),
>>>> and a cpu hog running in each group, how would the algorithm
>>>> divide resources?
>>>>
>>>>
>>> As per the matrix calculation, but as soon as we reach an idle point,
>>> we redistribute the b/w and start a new quantum so to speak, where all
>>> groups are charged up to their hard limits.
>>>
>>
>> But could there be client models where you are required to strictly
>> adhere to the limit within the bandwidth and not provide more (by advancing
>> the bandwidth period) in the presence of idle cycles ?
>>
>
> That's the limit part. I'd like to be able to specify limits and
> guarantees on the same host and for the same groups; I don't think that
> works when you advance the bandwidth period.

Yes, this feature needs to be configurable. But your use case for both
limits and guarantees is interesting. We spoke to Peter and he was
convinced only of the guarantee use case. Could you please help
elaborate your use case, so that we can incorporate it into RFC v2 we
send out. Peter is opposed to having hard limits and is convinced that
they are not generally useful, so far I seen you and Paul say it is
useful, any arguments you have or any +1 from you will help us. Peter
I am not back stabbing you :)


>
> I think we need to treat guarantees as first-class goals, not something
> derived from limits (in fact I think guarantees are more useful as they
> can be used to provide SLAs).

Even limits are useful for SLA's since your b/w available changes
quite drastically as we add or remove groups. There are other use
cases for limits as well.


--
Balbir

2009-06-05 09:41:34

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

* [email protected] <[email protected]> [2009-06-05 01:53:15]:

> On Wed, Jun 3, 2009 at 10:36 PM, Bharata B
> Rao<[email protected]> wrote:
> > - Hard limits can be used to provide guarantees.
> >
>
> This claim (and the subsequent long thread it generated on how limits
> can provide guarantees) confused me a bit.
>
> Why do we need limits to provide guarantees when we can already
> provide guarantees via shares?
>
> Suppose 10 cgroups each want 10% of the machine's CPU. We can just
> give each cgroup an equal share, and they're guaranteed 10% if they
> try to use it; if they don't use it, other cgroups can get access to
> the idle cycles.
>
> Suppose cgroup A wants a guarantee of 50% and two others, B and C,
> want guarantees of 15% each; give A 50 shares and B and C 15 shares
> each. In this case, if they all run flat out they'll get 62%/19%/19%,
> which is within their SLA.
>
> That's not to say that hard limits can't be useful in their own right
> - e.g. for providing reproducible loadtesting conditions by
> controlling how much CPU a service can use during the load test. But I
> don't see why using them to implement guarantees is either necessary
> or desirable.
>
> (Unless I'm missing some crucial point ...)

The important scenario I have is adding and removing groups.

Consider 10 cgroups with shares of 10 each, what if 5 new are created
with the same shares? We now start getting 100/15, even though we did
not change our shares.

--
Balbir

2009-06-05 09:48:32

by Dhaval Giani

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Fri, Jun 05, 2009 at 02:32:51AM -0700, Paul Menage wrote:
> On Fri, Jun 5, 2009 at 2:27 AM, Bharata B Rao<[email protected]> wrote:
> >>
> >> Suppose 10 cgroups each want 10% of the machine's CPU. We can just
> >> give each cgroup an equal share, and they're guaranteed 10% if they
> >> try to use it; if they don't use it, other cgroups can get access to
> >> the idle cycles.
> >
> > Now if 11th group with same shares comes in, then each group will now
> > get 9% of CPU and that 10% guarantee breaks.
>
> So you're trying to guarantee 11 cgroups that they can each get 10% of
> the CPU? That's called over-committing, and while there's nothing
> wrong with doing that if you're confident that they'll not al need
> their 10% at the same time, there's no way to *guarantee* them all
> 10%. You can guarantee them all 9% and hope the extra 1% is spare for
> those that need it (over-committing), or you can guarantee 10 of them
> 10% and give the last one 0 shares.
>
> How would you propose to guarantee 11 cgroups each 10% of the CPU
> using hard limits?
>

You cannot guarantee 10% to 11 groups on any system (unless I am missing
something). The sum of guarantees cannot exceed 100%.

How would you be able to do that with any other mechanism?

Thanks,
--
regards,
Dhaval

2009-06-05 09:48:48

by Paul Menage

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Fri, Jun 5, 2009 at 2:36 AM, Balbir Singh<[email protected]> wrote:
>
> The important scenario I have is adding and removing groups.
>
> Consider 10 cgroups with shares of 10 each, what if 5 new are created
> with the same shares? We now start getting 100/15, even though we did
> not change our shares.

Are you assuming that arbitrary users can create new cgroups whenever
they like, with whatever shares they like? In that situation, how
would you use hard limits to provide guarantees? Presumably if the
user could create a cgroup with an arbitrary share, they could create
one with an arbitrary hard limit too.

Can you explain a bit more about how you're envisaging cgroups being
created, and how their shares and hard limits would get set? I was
working on the assumption that (for any sub-tree of the CFS hierarchy)
there's a single managing entity that gets to decide the shares given
to the cgroups within that tree. That managing entity would be
responsible for ensuring that the shares given out allowed guarantees
to be met (or alternatively, that the probability of violating those
guarantees based on the shares given out was within some tolerance
threshold).

Paul

2009-06-05 09:51:27

by Paul Menage

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Fri, Jun 5, 2009 at 2:48 AM, Dhaval Giani<[email protected]> wrote:
>> > Now if 11th group with same shares comes in, then each group will now
>> > get 9% of CPU and that 10% guarantee breaks.
>>
>> So you're trying to guarantee 11 cgroups that they can each get 10% of
>> the CPU? That's called over-committing, and while there's nothing
>> wrong with doing that if you're confident that they'll not all need
>> their 10% at the same time, there's no way to *guarantee* them all
>> 10%. You can guarantee them all 9% and hope the extra 1% is spare for
>> those that need it (over-committing), or you can guarantee 10 of them
>> 10% and give the last one 0 shares.
>>
>> How would you propose to guarantee 11 cgroups each 10% of the CPU
>> using hard limits?
>>
>
> You cannot guarantee 10% to 11 groups on any system (unless I am missing
> something). The sum of guarantees cannot exceed 100%.

That's exactly my point. I was trying to counter Bharata's statement, which was:

> > Now if 11th group with same shares comes in, then each group will now
> > get 9% of CPU and that 10% guarantee breaks.

which seemed to be implying that this was a drawback of using shares
to implement guarantees.

Paul

2009-06-05 09:56:59

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

* [email protected] <[email protected]> [2009-06-05 02:48:36]:

> On Fri, Jun 5, 2009 at 2:36 AM, Balbir Singh<[email protected]> wrote:
> >
> > The important scenario I have is adding and removing groups.
> >
> > Consider 10 cgroups with shares of 10 each, what if 5 new are created
> > with the same shares? We now start getting 100/15, even though we did
> > not change our shares.
>
> Are you assuming that arbitrary users can create new cgroups whenever
> they like, with whatever shares they like? In that situation, how
> would you use hard limits to provide guarantees? Presumably if the
> user could create a cgroup with an arbitrary share, they could create
> one with an arbitrary hard limit too.
>

What about applications running as root, that can create their own
groups? How about multiple instances of the same application started?
Do applications need to know that creating a group will hurt
guarantees provided to others?

> Can you explain a bit more about how you're envisaging cgroups being
> created, and how their shares and hard limits would get set? I was
> working on the assumption that (for any sub-tree of the CFS hierarchy)
> there's a single managing entity that gets to decide the shares given
> to the cgroups within that tree. That managing entity would be
> responsible for ensuring that the shares given out allowed guarantees
> to be met (or alternatively, that the probability of violating those
> guarantees based on the shares given out was within some tolerance
> threshold).
>

The point is that there is no single control entity for creating
groups. if run a solution, it might create groups without telling the
user. No one is arbitrating, not even libcgroup. What if someone
changes the cpuset assignment and moves CPUS x to y in an exclusive
cpuset all of a sudden. How do we arbitrate?


--
Balbir

2009-06-05 09:58:00

by Paul Menage

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Fri, Jun 5, 2009 at 2:55 AM, Balbir Singh<[email protected]> wrote:
> The point is that there is no single control entity for creating
> groups. if run a solution, it might create groups without telling the
> user. No one is arbitrating, not even libcgroup. What if someone
> changes the cpuset assignment and moves CPUS x to y in an exclusive
> cpuset all of a sudden. How do we arbitrate?

But in that situation, how do hard limits help? If you can't control
when cgroups are being created, and you can't control their shares,
how are you going to control their hard limits?

Paul

2009-06-05 10:00:03

by Dhaval Giani

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Fri, Jun 05, 2009 at 02:51:18AM -0700, Paul Menage wrote:
> On Fri, Jun 5, 2009 at 2:48 AM, Dhaval Giani<[email protected]> wrote:
> >> > Now if 11th group with same shares comes in, then each group will now
> >> > get 9% of CPU and that 10% guarantee breaks.
> >>
> >> So you're trying to guarantee 11 cgroups that they can each get 10% of
> >> the CPU? That's called over-committing, and while there's nothing
> >> wrong with doing that if you're confident that they'll not all need
> >> their 10% at the same time, there's no way to *guarantee* them all
> >> 10%. You can guarantee them all 9% and hope the extra 1% is spare for
> >> those that need it (over-committing), or you can guarantee 10 of them
> >> 10% and give the last one 0 shares.
> >>
> >> How would you propose to guarantee 11 cgroups each 10% of the CPU
> >> using hard limits?
> >>
> >
> > You cannot guarantee 10% to 11 groups on any system (unless I am missing
> > something). The sum of guarantees cannot exceed 100%.
>
> That's exactly my point. I was trying to counter Bharata's statement, which was:
>
> > > Now if 11th group with same shares comes in, then each group will now
> > > get 9% of CPU and that 10% guarantee breaks.
>
> which seemed to be implying that this was a drawback of using shares
> to implement guarantees.
>

OK :). Glad to see I did not get it wrong.

I think we are focusing on the wrong use case here. Guarantees is just a
useful side-effect we get by using hard limits. I think the more
important use case is where the provider wants to limit the amount of
time a user gets (such as in a cloud).

Maybe we should direct our attention in solving that problem? :)

thanks,
--
regards,
Dhaval

2009-06-05 10:02:59

by Paul Menage

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Fri, Jun 5, 2009 at 2:55 AM, Balbir Singh<[email protected]> wrote:
>
> What about applications running as root, that can create their own
> groups? How about multiple instances of the same application started?
> Do applications need to know that creating a group will hurt
> guarantees provided to others?

Yes, of course. If you're handing out guarantees, but other users
can/will create cgroups with whatever parameters they like and won't
respect the guarantees that you've made, then those guarantees are
worthless. How do hard limits help in that situation?

Paul

2009-06-05 10:03:31

by Paul Menage

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Fri, Jun 5, 2009 at 2:59 AM, Dhaval Giani<[email protected]> wrote:
>
> I think we are focusing on the wrong use case here. Guarantees is just a
> useful side-effect we get by using hard limits. I think the more
> important use case is where the provider wants to limit the amount of
> time a user gets (such as in a cloud).
>
> Maybe we should direct our attention in solving that problem? :)
>

Yes, that case and the "predictable load test behaviour" case are both
good reasons for hard limits.

Paul

2009-06-05 11:32:34

by Srivatsa Vaddagiri

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Fri, Jun 05, 2009 at 01:53:15AM -0700, Paul Menage wrote:
> This claim (and the subsequent long thread it generated on how limits
> can provide guarantees) confused me a bit.
>
> Why do we need limits to provide guarantees when we can already
> provide guarantees via shares?

I think the interval over which we need guarantee matters here. Shares
can generally provide guaranteed share of resource over longer (sometimes
minutes) intervals. For high-priority bursty workloads, the latency in
achieving guaranteed resource usage matters. By having hard-limits, we are
"reserving" (potentially idle) slots where the high-priority group can run and
claim its guaranteed share almost immediately.

- vatsa

2009-06-05 12:18:27

by Paul Menage

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Fri, Jun 5, 2009 at 4:32 AM, Srivatsa Vaddagiri<[email protected]> wrote:
>
> I think the interval over which we need guarantee matters here. Shares
> can generally provide guaranteed share of resource over longer (sometimes
> minutes) intervals. For high-priority bursty workloads, the latency in
> achieving guaranteed resource usage matters.

Well yes, it's true that you *could* just enforce shares over a
granularity of minutes, and limits over a granularity of milliseconds.
But why would you? It could well make sense that you can adjust the
granularity over which shares are enforced - e.g. for batch jobs, only
enforcing over minutes or tens of seconds might be fine. But if you're
doing the fine-grained accounting and scheduling required for the
tight hard limit enforcement, it doesn't seem as though it should be
much harder to enforce shares at the same granularity for those
cgroups that matter. In fact I thought that's what CFS already did -
updated the virtual time accounting at each context switch, and picked
the runnable child with the oldest virtual time. (Maybe someone like
Ingo or Peter who's more familiar than I with the CFS implementation
could comment here?)

> By having hard-limits, we are
> "reserving" (potentially idle) slots where the high-priority group can run and
> claim its guaranteed share almost immediately.

But you can always create an "idle" slot by forcibly preempting
whatever's running currently when you need to - you don't need to keep
the CPU deliberately idle just in case a cgroup with a guarantee wakes
up.

Paul

2009-06-05 12:59:44

by Avi Kivity

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Bharata B Rao wrote:
>> So the groups with guarantees get a priority boost. That's not a good
>> side effect.
>>
>
> That happens only in the presence of idle cycles when other groups [with or
> without guarantees] have nothing useful to do. So how would that matter
> since there is nothing else to run anyway ?
>

If there are three groups, each running a cpu hog, and they have (say)
guarantees of 10%, 10%, and 0%, then they should each get 33% of the
cpu, not biased towards the groups with the guarantee.

If I want to change the weights, I'll alter their priority.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-06-05 13:04:21

by Avi Kivity

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Paul Menage wrote:
> On Wed, Jun 3, 2009 at 10:36 PM, Bharata B
> Rao<[email protected]> wrote:
>
>> - Hard limits can be used to provide guarantees.
>>
>>
>
> This claim (and the subsequent long thread it generated on how limits
> can provide guarantees) confused me a bit.
>
> Why do we need limits to provide guarantees when we can already
> provide guarantees via shares?
>
> Suppose 10 cgroups each want 10% of the machine's CPU. We can just
> give each cgroup an equal share, and they're guaranteed 10% if they
> try to use it; if they don't use it, other cgroups can get access to
> the idle cycles.
>
> Suppose cgroup A wants a guarantee of 50% and two others, B and C,
> want guarantees of 15% each; give A 50 shares and B and C 15 shares
> each. In this case, if they all run flat out they'll get 62%/19%/19%,
> which is within their SLA.
>
> That's not to say that hard limits can't be useful in their own right
> - e.g. for providing reproducible loadtesting conditions by
> controlling how much CPU a service can use during the load test. But I
> don't see why using them to implement guarantees is either necessary
> or desirable.
>
> (Unless I'm missing some crucial point ...)
>

How many shares does a cgroup with a 0% guarantee get?

Ideally, the scheduler would hand out cpu time according to weight and
demand, then clamp over-demand by a cgroup's limit and boost the share
to meet guarantees.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-06-05 13:17:08

by Avi Kivity

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Balbir Singh wrote:
>> That's the limit part. I'd like to be able to specify limits and
>> guarantees on the same host and for the same groups; I don't think that
>> works when you advance the bandwidth period.
>>
>
> Yes, this feature needs to be configurable. But your use case for both
> limits and guarantees is interesting. We spoke to Peter and he was
> convinced only of the guarantee use case. Could you please help
> elaborate your use case, so that we can incorporate it into RFC v2 we
> send out. Peter is opposed to having hard limits and is convinced that
> they are not generally useful, so far I seen you and Paul say it is
> useful, any arguments you have or any +1 from you will help us. Peter
> I am not back stabbing you :)
>

I am selling virtual private servers. A 10% cpu share costs $x/month,
and I guarantee you'll get that 10%, or your money back. On the other
hand, I want to limit cpu usage to that 10% (maybe a little more) so
people don't buy 10% shares and use 100% on my underutilized servers.
If they want 100%, let them pay for 100%.

>> I think we need to treat guarantees as first-class goals, not something
>> derived from limits (in fact I think guarantees are more useful as they
>> can be used to provide SLAs).
>>
>
> Even limits are useful for SLA's since your b/w available changes
> quite drastically as we add or remove groups. There are other use
> cases for limits as well

SLAs are specified in terms of guarantees on a service, not on limits on
others. If we could use limits to provide guarantees, that would be
fine, but it doesn't quite work out.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-06-05 13:43:44

by Dhaval Giani

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Fri, Jun 05, 2009 at 04:02:11PM +0300, Avi Kivity wrote:
> Paul Menage wrote:
>> On Wed, Jun 3, 2009 at 10:36 PM, Bharata B
>> Rao<[email protected]> wrote:
>>
>>> - Hard limits can be used to provide guarantees.
>>>
>>>
>>
>> This claim (and the subsequent long thread it generated on how limits
>> can provide guarantees) confused me a bit.
>>
>> Why do we need limits to provide guarantees when we can already
>> provide guarantees via shares?
>>
>> Suppose 10 cgroups each want 10% of the machine's CPU. We can just
>> give each cgroup an equal share, and they're guaranteed 10% if they
>> try to use it; if they don't use it, other cgroups can get access to
>> the idle cycles.
>>
>> Suppose cgroup A wants a guarantee of 50% and two others, B and C,
>> want guarantees of 15% each; give A 50 shares and B and C 15 shares
>> each. In this case, if they all run flat out they'll get 62%/19%/19%,
>> which is within their SLA.
>>
>> That's not to say that hard limits can't be useful in their own right
>> - e.g. for providing reproducible loadtesting conditions by
>> controlling how much CPU a service can use during the load test. But I
>> don't see why using them to implement guarantees is either necessary
>> or desirable.
>>
>> (Unless I'm missing some crucial point ...)
>>
>
> How many shares does a cgroup with a 0% guarantee get?
>

Shares cannot be used to provide guarantees. All they decide is what
propotion groups can get CPU time. (yes, shares is a bad name, weight
shows the intent better).

thanks,
--
regards,
Dhaval

2009-06-05 13:49:59

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Fri, Jun 5, 2009 at 6:44 PM, Avi Kivity<[email protected]> wrote:
> Balbir Singh wrote:
>>>
>>> That's the limit part. ?I'd like to be able to specify limits and
>>> ?guarantees on the same host and for the same groups; I don't think that
>>> ?works when you advance the bandwidth period.
>>>
>>
>> Yes, this feature needs to be configurable. But your use case for both
>> limits and guarantees is interesting. We spoke to Peter and he was
>> convinced only of the guarantee use case. Could you please help
>> elaborate your use case, so that we can incorporate it into RFC v2 we
>> send out. Peter is opposed to having hard limits and is convinced that
>> they are not generally useful, so far I seen you and Paul say it is
>> useful, any arguments you have or any +1 from you will help us. Peter
>> I am not back stabbing you :)
>>
>
> I am selling virtual private servers. ?A 10% cpu share costs $x/month, and I
> guarantee you'll get that 10%, or your money back. ?On the other hand, I
> want to limit cpu usage to that 10% (maybe a little more) so people don't
> buy 10% shares and use 100% on my underutilized servers. ?If they want 100%,
> let them pay for 100%.

Excellent examples, we've covered them in the RFC, could you see if we
missed anything in terms of use cases? The real question is do we care
enough to build hard limits control into the CFS group scheduler. I
believe we should.

>
>>> I think we need to treat guarantees as first-class goals, not something
>>> ?derived from limits (in fact I think guarantees are more useful as they
>>> ?can be used to provide SLAs).
>>>
>>
>> Even limits are useful for SLA's since your b/w available changes
>> quite drastically as we add or remove groups. There are other use
>> cases for limits as well
>
> SLAs are specified in terms of guarantees on a service, not on limits on
> others. ?If we could use limits to provide guarantees, that would be fine,
> but it doesn't quite work out.

To be honest, I would disagree here, specifically if you start
comparing how you would build guarantees in the kernel and compare it
with the proposed approach. I don't want to harp on the technicality,
but point out the feasibility for people who care for lower end of the
guarantee without requiring density. I think the real technical
discussion should be on here are the use cases, lets agree on the need
for the feature and go ahead and start prototyping the feature.

Thanks,
Balbir

2009-06-05 14:44:31

by Chris Friesen

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Srivatsa Vaddagiri wrote:
> On Fri, Jun 05, 2009 at 01:53:15AM -0700, Paul Menage wrote:
>> This claim (and the subsequent long thread it generated on how limits
>> can provide guarantees) confused me a bit.
>>
>> Why do we need limits to provide guarantees when we can already
>> provide guarantees via shares?
>
> I think the interval over which we need guarantee matters here. Shares
> can generally provide guaranteed share of resource over longer (sometimes
> minutes) intervals. For high-priority bursty workloads, the latency in
> achieving guaranteed resource usage matters. By having hard-limits, we are
> "reserving" (potentially idle) slots where the high-priority group can run and
> claim its guaranteed share almost immediately.

Why do you need to "reserve" it though? By definition, if it's
high-priority then it should be able to interrupt the currently running
task.

Chris

2009-06-05 14:46:08

by Chris Friesen

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Dhaval Giani wrote:

> Shares cannot be used to provide guarantees. All they decide is what
> propotion groups can get CPU time. (yes, shares is a bad name, weight
> shows the intent better).

If I (as the administrator of the system) arbitrarily decide that all
the shares/weights must add up to 100, they magically become percentage
guarantees.

Chris

2009-06-05 14:54:29

by Chris Friesen

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Avi Kivity wrote:

> I am selling virtual private servers. A 10% cpu share costs $x/month,
> and I guarantee you'll get that 10%, or your money back. On the other
> hand, I want to limit cpu usage to that 10% (maybe a little more) so
> people don't buy 10% shares and use 100% on my underutilized servers.
> If they want 100%, let them pay for 100%.

What about taking a page from the networking folks and specifying cpu
like a networking SLA?

Something like "group A is guaranteed X percent (or share) of the cpu,
but it is allowed to burst up to Y percent for Z milliseconds"

If a rule of this form was the first-class citizen, it would provide
both guarantees, limits, and flexible behaviour.

Chris

2009-06-07 06:06:57

by Avi Kivity

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Bharata B Rao wrote:
> On Fri, Jun 05, 2009 at 09:01:50AM +0300, Avi Kivity wrote:
>
>> Bharata B Rao wrote:
>>
>>> But could there be client models where you are required to strictly
>>> adhere to the limit within the bandwidth and not provide more (by advancing
>>> the bandwidth period) in the presence of idle cycles ?
>>>
>>>
>> That's the limit part. I'd like to be able to specify limits and
>> guarantees on the same host and for the same groups; I don't think that
>> works when you advance the bandwidth period.
>>
>> I think we need to treat guarantees as first-class goals, not something
>> derived from limits (in fact I think guarantees are more useful as they
>> can be used to provide SLAs).
>>
>
> I agree that guarantees are important, but I am not sure about
>
> 1. specifying both limits and guarantees for groups and
>

Why would you allow specifying a lower bound for cpu usage (a
guarantee), and upper bound (a limit), but not both?

> 2. not deriving guarantees from limits.
>
> Guarantees are met by some form of throttling or limiting and hence I think
> limiting should drive the guarantees

That would be fine if it didn't idle the cpu despite there being demand
and available cpu power.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2009-06-07 06:11:01

by Avi Kivity

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Balbir Singh wrote:
>> I am selling virtual private servers. A 10% cpu share costs $x/month, and I
>> guarantee you'll get that 10%, or your money back. On the other hand, I
>> want to limit cpu usage to that 10% (maybe a little more) so people don't
>> buy 10% shares and use 100% on my underutilized servers. If they want 100%,
>> let them pay for 100%.
>>
>
> Excellent examples, we've covered them in the RFC, could you see if we
> missed anything in terms of use cases? The real question is do we care
> enough to build hard limits control into the CFS group scheduler. I
> believe we should.
>

You only cover the limit part. Guarantees are left as an exercise to
the reader.

I don't think implementing guarantees via limits is workable as it
causes the cpu to be idled unnecessarily.

>>> Even limits are useful for SLA's since your b/w available changes
>>> quite drastically as we add or remove groups. There are other use
>>> cases for limits as well
>>>
>> SLAs are specified in terms of guarantees on a service, not on limits on
>> others. If we could use limits to provide guarantees, that would be fine,
>> but it doesn't quite work out.
>>
>
> To be honest, I would disagree here, specifically if you start
> comparing how you would build guarantees in the kernel and compare it
> with the proposed approach. I don't want to harp on the technicality,
> but point out the feasibility for people who care for lower end of the
> guarantee without requiring density. I think the real technical
> discussion should be on here are the use cases, lets agree on the need
> for the feature and go ahead and start prototyping the feature.
>

I don't understand. Are you saying implementing guarantees is too complex?

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2009-06-07 06:12:15

by Avi Kivity

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Chris Friesen wrote:
> Avi Kivity wrote:
>
>
>> I am selling virtual private servers. A 10% cpu share costs $x/month,
>> and I guarantee you'll get that 10%, or your money back. On the other
>> hand, I want to limit cpu usage to that 10% (maybe a little more) so
>> people don't buy 10% shares and use 100% on my underutilized servers.
>> If they want 100%, let them pay for 100%.
>>
>
> What about taking a page from the networking folks and specifying cpu
> like a networking SLA?
>
> Something like "group A is guaranteed X percent (or share) of the cpu,
> but it is allowed to burst up to Y percent for Z milliseconds"
>
> If a rule of this form was the first-class citizen, it would provide
> both guarantees, limits, and flexible behaviour.
>

I think you're introducing a new control (guarantees, limits, burst
limit), but I like it.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2009-06-07 10:13:19

by Srivatsa Vaddagiri

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Fri, Jun 05, 2009 at 05:18:13AM -0700, Paul Menage wrote:
> Well yes, it's true that you *could* just enforce shares over a
> granularity of minutes, and limits over a granularity of milliseconds.
> But why would you? It could well make sense that you can adjust the
> granularity over which shares are enforced - e.g. for batch jobs, only
> enforcing over minutes or tens of seconds might be fine. But if you're
> doing the fine-grained accounting and scheduling required for the
> tight hard limit enforcement, it doesn't seem as though it should be
> much harder to enforce shares at the same granularity for those
> cgroups that matter. In fact I thought that's what CFS already did -
> updated the virtual time accounting at each context switch, and picked
> the runnable child with the oldest virtual time. (Maybe someone like
> Ingo or Peter who's more familiar than I with the CFS implementation
> could comment here?)

Using shares to guarantee resources over short period (<2-3 seconds) works
just well on a single CPU. The complexity is with multi-cpu case, where CFS can
take a long time to converge to a fair point. This is because fairness is based
on rebalancing tasks equally across all CPUs.

For something like 4 tasks on 4 CPUs, it will converge pretty quickly
(2-3 seconds):

[top o/p refreshed every 2sec on 2.6.30-rc5-tip]

14753 vatsa 20 0 63812 1072 924 R 99.9 0.0 0:39.54 hog
14754 vatsa 20 0 63812 1072 924 R 99.9 0.0 0:38.69 hog
14756 vatsa 20 0 63812 1076 924 R 99.9 0.0 0:38.27 hog
14755 vatsa 20 0 63812 1072 924 R 99.6 0.0 0:38.27 hog

whereas for something like 5 tasks on 4 CPUs, it will take a sufficiently
longer time (>30 seconds)

[top o/p refreshed every 2sec]:

14754 vatsa 20 0 63812 1072 924 R 86.0 0.0 2:06.45 hog
14766 vatsa 20 0 63812 1072 924 R 83.0 0.0 0:07.95 hog
14756 vatsa 20 0 63812 1076 924 R 81.7 0.0 2:06.48 hog
14753 vatsa 20 0 63812 1072 924 R 78.7 0.0 2:07.10 hog
14755 vatsa 20 0 63812 1072 924 R 69.4 0.0 2:05.62 hog

[top o/p refreshed every 120sec]:

14766 vatsa 20 0 63812 1072 924 R 90.1 0.0 5:57.22 hog
14755 vatsa 20 0 63812 1072 924 R 84.8 0.0 8:01.61 hog
14754 vatsa 20 0 63812 1072 924 R 77.3 0.0 7:52.04 hog
14753 vatsa 20 0 63812 1072 924 R 74.1 0.0 7:29.01 hog
14756 vatsa 20 0 63812 1076 924 R 73.5 0.0 7:34.69 hog

[Note that even over 2min, we haven't achieved perfect fairness]

> > By having hard-limits, we are
> > "reserving" (potentially idle) slots where the high-priority group can run and
> > claim its guaranteed share almost immediately.

On further thinking, this is not as simple as that. In above example of
5 tasks on 4 CPUs, we could cap each task at a hard limit of 80%
(4 CPUs/5 tasks), which is still not sufficient to ensure that each
task gets the perfect fairness of 80%! Not just that, hard-limit
for a group (on each CPU) will have to be adjusted based on its task
distribution. For ex: a group that has a hard-limit of 25% on a 4-cpu
system and that has a single task, is entitled to claim a whole CPU. So
the per-cpu hard-limit for the group should be 100% on whatever CPU the
task is running. This adjustment of per-cpu hard-limit should happen
whenever the task distribution of the group across CPUs change - which
in theory would require you to monitor every task exit/migration
event and readjust limits, making it very complex and high-overhead.

Balbir,
I dont think guarantee can be met easily thr' hard-limits in
case of CPU resource. Atleast its not as straightforward as in case of
memory!

- vatsa

2009-06-07 15:35:34

by Balbir Singh

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Sun, Jun 7, 2009 at 3:41 PM, Srivatsa Vaddagiri<[email protected]> wrote:
> On Fri, Jun 05, 2009 at 05:18:13AM -0700, Paul Menage wrote:
>> Well yes, it's true that you *could* just enforce shares over a
>> granularity of minutes, and limits over a granularity of milliseconds.
>> But why would you? It could well make sense that you can adjust the
>> granularity over which shares are enforced - e.g. for batch jobs, only
>> enforcing over minutes or tens of seconds might be fine. But if you're
>> doing the fine-grained accounting and scheduling required for the
>> tight hard limit enforcement, it doesn't seem as though it should be
>> much harder to enforce shares at the same granularity for those
>> cgroups that matter. In fact I thought that's what CFS already did -
>> updated the virtual time accounting at each context switch, and picked
>> the runnable child with the oldest virtual time. (Maybe someone like
>> Ingo or Peter who's more familiar than I with the CFS implementation
>> could comment here?)
>
> Using shares to guarantee resources over short period (<2-3 seconds) works
> just well on a single CPU. The complexity is with multi-cpu case, where CFS can
> take a long time to converge to a fair point. This is because fairness is based
> on rebalancing tasks equally across all CPUs.
>
> For something like 4 tasks on 4 CPUs, it will converge pretty quickly
> (2-3 seconds):
>
> [top o/p refreshed every 2sec on 2.6.30-rc5-tip]
>
> 14753 vatsa ? ? 20 ? 0 63812 1072 ?924 R 99.9 ?0.0 ? 0:39.54 hog
> 14754 vatsa ? ? 20 ? 0 63812 1072 ?924 R 99.9 ?0.0 ? 0:38.69 hog
> 14756 vatsa ? ? 20 ? 0 63812 1076 ?924 R 99.9 ?0.0 ? 0:38.27 hog
> 14755 vatsa ? ? 20 ? 0 63812 1072 ?924 R 99.6 ?0.0 ? 0:38.27 hog
>
> whereas for something like 5 tasks on 4 CPUs, it will take a sufficiently
> longer time (>30 seconds)
>
> [top o/p refreshed every 2sec]:
>
> 14754 vatsa ? ? 20 ? 0 63812 1072 ?924 R 86.0 ?0.0 ? 2:06.45 hog
> 14766 vatsa ? ? 20 ? 0 63812 1072 ?924 R 83.0 ?0.0 ? 0:07.95 hog
> 14756 vatsa ? ? 20 ? 0 63812 1076 ?924 R 81.7 ?0.0 ? 2:06.48 hog
> 14753 vatsa ? ? 20 ? 0 63812 1072 ?924 R 78.7 ?0.0 ? 2:07.10 hog
> 14755 vatsa ? ? 20 ? 0 63812 1072 ?924 R 69.4 ?0.0 ? 2:05.62 hog
>
> [top o/p refreshed every 120sec]:
>
> 14766 vatsa ? ? 20 ? 0 63812 1072 ?924 R 90.1 ?0.0 ? 5:57.22 hog
> 14755 vatsa ? ? 20 ? 0 63812 1072 ?924 R 84.8 ?0.0 ? 8:01.61 hog
> 14754 vatsa ? ? 20 ? 0 63812 1072 ?924 R 77.3 ?0.0 ? 7:52.04 hog
> 14753 vatsa ? ? 20 ? 0 63812 1072 ?924 R 74.1 ?0.0 ? 7:29.01 hog
> 14756 vatsa ? ? 20 ? 0 63812 1076 ?924 R 73.5 ?0.0 ? 7:34.69 hog
>
> [Note that even over 2min, we haven't achieved perfect fairness]
>

Good observation, Thanks!

>> > By having hard-limits, we are
>> > "reserving" (potentially idle) slots where the high-priority group can run and
>> > claim its guaranteed share almost immediately.
>
> On further thinking, this is not as simple as that. In above example of
> 5 tasks on 4 CPUs, we could cap each task at a hard limit of 80%
> (4 CPUs/5 tasks), which is still not sufficient to ensure that each
> task gets the perfect fairness of 80%! Not just that, hard-limit
> for a group (on each CPU) will have to be adjusted based on its task
> distribution. For ex: a group that has a hard-limit of 25% on a 4-cpu
> system and that has a single task, is entitled to claim a whole CPU. So
> the per-cpu hard-limit for the group should be 100% on whatever CPU the
> task is running. This adjustment of per-cpu hard-limit should happen
> whenever the task distribution of the group across CPUs change - which
> in theory would require you to monitor every task exit/migration
> event and readjust limits, making it very complex and high-overhead.
>

We already do that for shares right? I mean instead of 25% hard limit,
if the group had 25% of the shares the same thing would apply - no?

> Balbir,
> ? ? ? ?I dont think guarantee can be met easily thr' hard-limits in
> case of CPU resource. Atleast its not as straightforward as in case of
> memory!

OK, based on the discussion - leaving implementation issues out,
speaking of whether it is possible to implement guarantees using
shares? My answer would be

1. Yes - but then the hard limits will prevent you and can cause idle
times, some of those can be handled in the implementation. There might
be other fairness and SMP concerns about the accuracy of the fairness,
thank you for that data.
2. We'll update the RFC (second version) with the findings and send it
out, so that the expectations are clearer
3. From what I've read and seen there seems to be no strong objection
to hard limits, but some reservations (based on 1) about using them
for guarantees and our RFC will reflect that.

Do you agree?
Balbir

Balbir

2009-06-07 16:15:59

by Bharata B Rao

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Sun, Jun 07, 2009 at 09:04:49AM +0300, Avi Kivity wrote:
> Bharata B Rao wrote:
>> On Fri, Jun 05, 2009 at 09:01:50AM +0300, Avi Kivity wrote:
>>
>>> Bharata B Rao wrote:
>>>
>>>> But could there be client models where you are required to strictly
>>>> adhere to the limit within the bandwidth and not provide more (by advancing
>>>> the bandwidth period) in the presence of idle cycles ?
>>>>
>>> That's the limit part. I'd like to be able to specify limits and
>>> guarantees on the same host and for the same groups; I don't think
>>> that works when you advance the bandwidth period.
>>>
>>> I think we need to treat guarantees as first-class goals, not
>>> something derived from limits (in fact I think guarantees are more
>>> useful as they can be used to provide SLAs).
>>>
>>
>> I agree that guarantees are important, but I am not sure about
>>
>> 1. specifying both limits and guarantees for groups and
>>
>
> Why would you allow specifying a lower bound for cpu usage (a
> guarantee), and upper bound (a limit), but not both?

I was saying that we specify only limits and not guarantees since it
can be worked out from limits. Initial thinking was that the kernel will
be made aware of only limits and users could set the limits appropriately
to obtain the desired guarantees. I understand your concerns/objections
on this and we will address this in our next version of RFC as Balbir said.

Regards,
Bharata.

2009-06-08 04:37:29

by Srivatsa Vaddagiri

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

On Sun, Jun 07, 2009 at 09:05:23PM +0530, Balbir Singh wrote:
> > On further thinking, this is not as simple as that. In above example of
> > 5 tasks on 4 CPUs, we could cap each task at a hard limit of 80%
> > (4 CPUs/5 tasks), which is still not sufficient to ensure that each
> > task gets the perfect fairness of 80%! Not just that, hard-limit
> > for a group (on each CPU) will have to be adjusted based on its task
> > distribution. For ex: a group that has a hard-limit of 25% on a 4-cpu
> > system and that has a single task, is entitled to claim a whole CPU. So
> > the per-cpu hard-limit for the group should be 100% on whatever CPU the
> > task is running. This adjustment of per-cpu hard-limit should happen
> > whenever the task distribution of the group across CPUs change - which
> > in theory would require you to monitor every task exit/migration
> > event and readjust limits, making it very complex and high-overhead.
> >
>
> We already do that for shares right? I mean instead of 25% hard limit,
> if the group had 25% of the shares the same thing would apply - no?

yes and no. we do rebalance shares based on task distribution, but not
upon every task fork/exit/wakeup/migration event. Its done once in a while,
frequent enough to give "decent" fairness!

> > Balbir,
> > ? ? ? ?I dont think guarantee can be met easily thr' hard-limits in
> > case of CPU resource. Atleast its not as straightforward as in case of
> > memory!
>
> OK, based on the discussion - leaving implementation issues out,
> speaking of whether it is possible to implement guarantees using
> shares? My answer would be
>
> 1. Yes - but then the hard limits will prevent you and can cause idle
> times, some of those can be handled in the implementation. There might
> be other fairness and SMP concerns about the accuracy of the fairness,
> thank you for that data.
> 2. We'll update the RFC (second version) with the findings and send it
> out, so that the expectations are clearer
> 3. From what I've read and seen there seems to be no strong objection
> to hard limits, but some reservations (based on 1) about using them
> for guarantees and our RFC will reflect that.
>
> Do you agree?

Well yes, guarantee is not a good argument for providing hard limits.
Pay-per-use kind of usage would be a better argument IMHO.

- vatsa

2009-06-08 08:51:59

by Pavel Emelyanov

[permalink] [raw]
Subject: Re: [RFC] CPU hard limits

Paul Menage wrote:
> On Fri, Jun 5, 2009 at 2:59 AM, Dhaval Giani<[email protected]> wrote:
>> I think we are focusing on the wrong use case here. Guarantees is just a
>> useful side-effect we get by using hard limits. I think the more
>> important use case is where the provider wants to limit the amount of
>> time a user gets (such as in a cloud).
>>
>> Maybe we should direct our attention in solving that problem? :)
>>
>
> Yes, that case and the "predictable load test behaviour" case are both
> good reasons for hard limits.

ACK.

I'd like to add two things.

First, the article @openvz.org about guarantees you were discussing was
not supposed to be a "best practices" paper. This was just a theoretical
thoughts on how to get guarantees out of the limit for those resources
you cannot reclaim from the user and thus cannot provide the guarantee
any other way. E.g. locked memory - once a user has it you cannot take
it back, and if you want to guarantee some mount of it for group X you
have to keep all the other groups away from this amount.

And the second thing is an addition for Dhaval's case about limiting the
amount of time a user gets. This is exactly what hosting providers do -
they _sell_ the CPU power to their customers and thus need to limit the
CPU time dedicated for containers.

> Paul
>