From: Bharata B Rao <[email protected]>
Basic description of usage and effect for CFS Bandwidth Control.
Signed-off-by: Bharata B Rao <[email protected]>
Signed-off-by: Paul Turner <[email protected]>
---
Documentation/scheduler/sched-bwc.txt | 104 ++++++++++++++++++++++++++++++++++
1 file changed, 104 insertions(+)
Index: tip/Documentation/scheduler/sched-bwc.txt
===================================================================
--- /dev/null
+++ tip/Documentation/scheduler/sched-bwc.txt
@@ -0,0 +1,104 @@
+CFS Bandwidth Control (aka CPU hard limits)
+===========================================
+
+[ This document talks about CPU bandwidth control of CFS groups only.
+ The bandwidth control of RT groups is explained in
+ Documentation/scheduler/sched-rt-group.txt ]
+
+CFS bandwidth control is a group scheduler extension that can be used to
+control the maximum CPU bandwidth obtained by a CPU cgroup.
+
+Bandwidth allowed for a group is specified using quota and period. Within
+a given "period" (microseconds), a group is allowed to consume up to "quota"
+microseconds of CPU time, which is the upper limit or the hard limit. When the
+CPU bandwidth consumption of a group exceeds the hard limit, the tasks in the
+group are throttled and are not allowed to run until the end of the period at
+which time the group's quota is replenished.
+
+Runtime available to the group is tracked globally. At the beginning of
+every period, the group's global runtime pool is replenished with "quota"
+microseconds worth of runtime. Runtime is then consumed locally on each CPU
+by fetching it in "slices" from the global pool.
+
+Interface
+---------
+Quota and period can be set via cgroup files.
+
+cpu.cfs_quota_us: the maximum allowed bandwidth (microseconds)
+cpu.cfs_period_us: the enforcement interval (microseconds)
+
+Within a period of cpu.cfs_period_us, the group as a whole will not be allowed
+to consume more than cpu.cfs_quota_us worth of runtime.
+
+The default value of cpu.cfs_period_us is 500ms and the default value
+for cpu.cfs_quota_us is -1.
+
+A cpu.cfs_quota_us value of -1 indicates that the group has infinite
+bandwidth, i.e. it is not bandwidth controlled.
+
+Writing any negative value to cpu.cfs_quota_us will turn the group into
+an infinite bandwidth group. Reading cpu.cfs_quota_us for an infinite
+bandwidth group will always return -1.
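+
+As an illustrative shell session (run from the group's cgroup directory; the
+values are arbitrary), a group can be limited and then returned to infinite
+bandwidth as follows:
+
+ # echo 250000 > cpu.cfs_quota_us  /* quota = 250ms */
+ # echo 500000 > cpu.cfs_period_us /* period = 500ms */
+ # echo -1 > cpu.cfs_quota_us      /* back to infinite bandwidth */
+ # cat cpu.cfs_quota_us            /* returns -1 */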
+
+System wide settings
+--------------------
+The amount of runtime transferred from the global pool each time a CPU
+requires group quota locally is controlled by the sysctl parameter
+sched_cfs_bandwidth_slice_us. The current default is 5ms. This can be changed
+by writing to /proc/sys/kernel/sched_cfs_bandwidth_slice_us.
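+
+For example, the slice could be raised to 10ms (an arbitrary illustrative
+value, not a recommendation) with:
+
+ # echo 10000 > /proc/sys/kernel/sched_cfs_bandwidth_slice_us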
+
+A quota hierarchy is defined to be consistent if the sum of child reservations
+does not exceed the bandwidth allocated to its parent. An entity with no
+explicit bandwidth reservation (i.e. no limit) is considered to inherit its
+parent's limits. This behavior may be managed using
+/proc/sys/kernel/sched_cfs_bandwidth_consistent.
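+
+Assuming this is a boolean sysctl (1 = require a consistent hierarchy,
+0 = allow inconsistent reservations), consistency enforcement could be
+enabled with something like:
+
+ # echo 1 > /proc/sys/kernel/sched_cfs_bandwidth_consistent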
+
+Statistics
+----------
+The cpu.stat file lists three statistics related to CPU bandwidth control.
+
+nr_periods: Number of enforcement intervals that have elapsed.
+nr_throttled: Number of times the group has been throttled/limited.
+throttled_time: The total time duration (in nanoseconds) for which the group
+remained throttled.
+
+This file is read-only.
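+
+A read of the statistics might look like the following (the values shown are
+made up for illustration and the exact formatting may differ):
+
+ # cat cpu.stat
+ nr_periods 200
+ nr_throttled 40
+ throttled_time 12345678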
+
+Hierarchy considerations
+------------------------
+Each group's bandwidth (quota and period) can be set independently of its
+parent or child groups. There are two ways in which a group can get
+throttled:
+
+- it consumed its quota within the period
+- it has quota left but the parent's quota is exhausted.
+
+In the second case, even though the child has quota left, it will not be
+able to run since the parent itself is throttled. Similarly, groups that are
+not bandwidth constrained might end up being throttled if any parent
+in their hierarchy is throttled.
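+
+As a sketch (assuming the cpu cgroup hierarchy is mounted at /cgroup/cpu,
+an arbitrary mount point chosen for illustration), a child group left at the
+default -1 quota can still be throttled through its parent:
+
+ # cd /cgroup/cpu
+ # mkdir parent parent/child
+ # echo 250000 > parent/cpu.cfs_quota_us  /* parent: quota = 250ms */
+ # echo 500000 > parent/cpu.cfs_period_us /* parent: period = 500ms */
+ # cat parent/child/cpu.cfs_quota_us      /* child: -1, no local limit */
+
+Once tasks in parent/child have consumed the parent's 250ms within a period,
+they are throttled until the parent's quota is replenished, even though the
+child itself has no local limit.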
+
+Examples
+--------
+1. Limit a group to 1 CPU worth of runtime.
+
+ If period is 500ms and quota is also 500ms, the group will get
+ 1 CPU worth of runtime every 500ms.
+
+ # echo 500000 > cpu.cfs_quota_us /* quota = 500ms */
+ # echo 500000 > cpu.cfs_period_us /* period = 500ms */
+
+2. Limit a group to 2 CPUs worth of runtime on a multi-CPU machine.
+
+ With 500ms period and 1000ms quota, the group can get 2 CPUs worth of
+ runtime every 500ms.
+
+ # echo 1000000 > cpu.cfs_quota_us /* quota = 1000ms */
+ # echo 500000 > cpu.cfs_period_us /* period = 500ms */
+
+3. Limit a group to 20% of 1 CPU.
+
+ With 500ms period, 100ms quota will be equivalent to 20% of 1 CPU.
+
+ # echo 100000 > cpu.cfs_quota_us /* quota = 100ms */
+ # echo 500000 > cpu.cfs_period_us /* period = 500ms */
(2011/05/03 18:29), Paul Turner wrote:
> From: Bharata B Rao <[email protected]>
>
> Basic description of usage and effect for CFS Bandwidth Control.
>
> Signed-off-by: Bharata B Rao <[email protected]>
> Signed-off-by: Paul Turner <[email protected]>
> ---
Reviewed-by: Hidetoshi Seto <[email protected]>
Thank you very much for your great work, Paul!
I've run some tests on this version and found no problems so far
(other than the minor bug pointed out in 04/15).
Things are definitely getting better.
I'll continue testing and let you know if anything comes up.
Thanks,
H.Seto
On Tue, May 10, 2011 at 12:29 AM, Hidetoshi Seto
<[email protected]> wrote:
> (2011/05/03 18:29), Paul Turner wrote:
>> From: Bharata B Rao <[email protected]>
>>
>> Basic description of usage and effect for CFS Bandwidth Control.
>>
>> Signed-off-by: Bharata B Rao <[email protected]>
>> Signed-off-by: Paul Turner <[email protected]>
>> ---
>
> Reviewed-by: Hidetoshi Seto <[email protected]>
>
> Thank you very much for your great work, Paul!
>
> I've run some tests on this version and found no problems so far
> (other than the minor bug pointed out in 04/15).
> Things are definitely getting better.
>
> I'll continue testing and let you know if anything comes up.
>
Thank you for taking the time to review and test!
Very much appreciated!
>
> Thanks,
> H.Seto
>
>