2014-01-27 11:21:06

by Juri Lelli

[permalink] [raw]
Subject: [PATCH] sched/deadline: Add sched_dl documentation

From: Dario Faggioli <[email protected]>

Add in Documentation/scheduler/ some hints about the design
choices, the usage and the future possible developments of the
sched_dl scheduling class and of the SCHED_DEADLINE policy.

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Dario Faggioli <[email protected]>
Signed-off-by: Juri Lelli <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
[ Re-wrote sections 2 and 3. ]
Signed-off-by: Luca Abeni <[email protected]>
---
Documentation/scheduler/00-INDEX | 2 +
Documentation/scheduler/sched-deadline.txt | 281 ++++++++++++++++++++++++++++
kernel/sched/deadline.c | 3 +-
3 files changed, 285 insertions(+), 1 deletion(-)
create mode 100644 Documentation/scheduler/sched-deadline.txt

diff --git a/Documentation/scheduler/00-INDEX b/Documentation/scheduler/00-INDEX
index d2651c4..46702e4 100644
--- a/Documentation/scheduler/00-INDEX
+++ b/Documentation/scheduler/00-INDEX
@@ -10,5 +10,7 @@ sched-nice-design.txt
- How and why the scheduler's nice levels are implemented.
sched-rt-group.txt
- real-time group scheduling.
+sched-deadline.txt
+ - deadline scheduling.
sched-stats.txt
- information on schedstats (Linux Scheduler Statistics).
diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt
new file mode 100644
index 0000000..18adc92
--- /dev/null
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -0,0 +1,281 @@
+ Deadline Task Scheduling
+ ------------------------
+
+CONTENTS
+========
+
+ 0. WARNING
+ 1. Overview
+ 2. Scheduling algorithm
+ 3. Scheduling Real-Time Tasks
+ 4. Bandwidth management
+ 4.1 System-wide settings
+ 4.2 Task interface
+ 4.3 Default behavior
+ 5. Tasks CPU affinity
+ 5.1 SCHED_DEADLINE and cpusets HOWTO
+ 6. Future plans
+
+
+0. WARNING
+==========
+
+ Fiddling with these settings can result in an unpredictable or even unstable
+ system behavior. As for -rt (group) scheduling, it is assumed that root users
+ know what they're doing.
+
+
+1. Overview
+===========
+
+ The SCHED_DEADLINE policy contained inside the sched_dl scheduling class is
+ basically an implementation of the Earliest Deadline First (EDF) scheduling
+ algorithm, augmented with a mechanism (called Constant Bandwidth Server, CBS)
+ that makes it possible to isolate the behavior of tasks between each other.
+
+
+2. Scheduling algorithm
+==================
+
+ SCHED_DEADLINE uses three parameters, named "runtime", "period", and
+ "deadline" to schedule tasks. A SCHED_DEADLINE task is guaranteed to receive
+ "runtime" microseconds of execution time every "period" microseconds, and
+ these "runtime" microseconds are available within "deadline" microseconds
+ from the beginning of the period. In order to implement this behaviour,
+ every time the task wakes up, the scheduler computes a "scheduling deadline"
+ consistent with the guarantee (using the CBS[2,3] algorithm). Tasks are then
+ scheduled using EDF[1] on these scheduling deadlines (the task with the
+ smallest scheduling deadline is selected for execution). Notice that this
+ guaranteed is respected if a proper "admission control" strategy (see Section
+ "4. Bandwidth management") is used.
+
+ Summing up, the CBS[2,3] algorithms assigns scheduling deadlines to tasks so
+ that each task runs for at most its runtime every period, avoiding any
+ interference between different tasks (bandwidth isolation), while the EDF[1]
+ algorithm selects the task with the smallest scheduling deadline as the one
+ to be executed first. Thanks to this feature, also tasks that do not
+ strictly comply with the "traditional" real-time task model (see Section 3)
+ can effectively use the new policy.
+
+ In more details, the CBS algorithm assigns scheduling deadlines to
+ tasks in the following way:
+
+ - Each SCHED_DEADLINE task is characterised by the "runtime",
+ "deadline", and "period" parameters;
+
+ - The state of the task is described by a "scheduling deadline", and
+ a "current runtime". These two parameters are initially set to 0;
+
+ - When a SCHED_DEADLINE task wakes up (becomes ready for execution),
+ the scheduler checks if
+
+ current runtime runtime
+ ---------------------------------- > ----------------
+ scheduling deadline - current time period
+
+ then, if the scheduling deadline is smaller than the current time, or
+ this condition is verified, the scheduling deadline and the
+ current budget are re-initialised as
+
+ scheduling deadline = current time + deadline
+ current runtime = runtime
+
+ otherwise, the scheduling deadline and the current runtime are
+ left unchanged;
+
+ - When a SCHED_DEADLINE task executes for an amount of time t, its
+ current runtime is decreased as
+
+ current runtime = current runtime - t
+
+ (technically, the runtime is decreased at every tick, or when the
+ task is descheduled / preempted);
+
+ - When the current runtime becomes less or equal than 0, the task is
+ said to be "throttled" (also known as "depleted" in real-time literature)
+ and cannot be scheduled until its scheduling deadline. The "replenishment
+ time" for this task (see next item) is set to be equal to the current
+ value of the scheduling deadline;
+
+ - When the current time is equal to the replenishment time of a
+ throttled task, the scheduling deadline and the current runtime are
+ updated as
+
+ scheduling deadline = scheduling deadline + period
+ current runtime = current runtime + runtime
+
+
+3. Scheduling Real-Time Tasks
+=============================
+
+ * BIG FAT WARNING ******************************************************
+ *
+ * This section contains a (not-thorough) summary on classical deadline
+ * scheduling theory, and how it applies to SCHED_DEADLINE.
+ * The reader can "safely" skip to Section 4 if only interested in seeing
+ * how the scheduling policy can be used. Anyway, we strongly recommend
+ * to come back here and continue reading (once the urge for testing is
+ * satisfied :P) to be sure of fully understanding all technical details.
+ ************************************************************************
+
+ There are no limitations on what kind of task can exploit this new
+ scheduling discipline, even if it must be said that it is particularly
+ suited for periodic or sporadic real-time tasks that need guarantees on their
+ timing behavior, e.g., multimedia, streaming, control applications, etc.
+
+ A typical real-time task is composed of a repetition of computation phases
+ (task instances, or jobs) which are activated on a periodic or sporadic
+ fashion.
+ Each job J_j (where J_j is the j^th job of the task) is characterised by an
+ arrival time r_j (the time when the job starts), an amount of computation
+ time c_j needed to finish the job, and a job absolute deadline d_j, which
+ is the time within which the job should be finished. The maximum execution
+ time max_j{c_j} is called "Worst Case Execution Time" (WCET) for the task.
+ A real-time task can be periodic with period P if r_{j+1} = r_j + P, or
+ sporadic with minimum inter-arrival time P is r_{j+1} >= r_j + P. Finally,
+ d_j = r_j + D, where D is the task's relative deadline.
+
+ SCHED_DEADLINE can be used to schedule real-time tasks guaranteeing that
+ the jobs' deadlines of a task are respected. In order to do this, a task
+ must be scheduled by setting:
+
+ - runtime >= WCET
+ - deadline = D
+ - period <= P
+
+ IOW, if runtime >= WCET and if period is >= P, then the scheduling deadlines
+ and the absolute deadlines (d_j) coincide, so a proper admission control
+ allows to respect the jobs' absolute deadlines for this task (this is what is
+ called "hard schedulability property" and is an extension of Lemma 1 of [2]).
+
+ References:
+ 1 - C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogram-
+ ming in a hard-real-time environment. Journal of the Association for
+ Computing Machinery, 20(1), 1973.
+ 2 - L. Abeni , G. Buttazzo. Integrating Multimedia Applications in Hard
+ Real-Time Systems. Proceedings of the 19th IEEE Real-time Systems
+ Symposium, 1998. http://retis.sssup.it/~giorgio/paps/1998/rtss98-cbs.pdf
+ 3 - L. Abeni. Server Mechanisms for Multimedia Applications. ReTiS Lab
+ Technical Report. http://xoomer.virgilio.it/lucabe72/pubs/tr-98-01.ps
+
+4. Bandwidth management
+=======================
+
+ In order for the -deadline scheduling to be effective and useful, it is
+ important to have some method to keep the allocation of the available CPU
+ bandwidth to the tasks under control.
+ This is usually called "admission control" and if it is not performed at all,
+ no guarantee can be given on the actual scheduling of the -deadline tasks.
+
+ Since when RT-throttling has been introduced each task group has a bandwidth
+ associated, calculated as a certain amount of runtime over a period.
+ Moreover, to make it possible to manipulate such bandwidth, readable/writable
+ controls have been added to both procfs (for system wide settings) and cgroupfs
+ (for per-group settings).
+ Therefore, the same interface is being used for controlling the bandwidth
+ distrubution to -deadline tasks.
+
+ However, more discussion is needed in order to figure out how we want to manage
+ SCHED_DEADLINE bandwidth at the task group level. Therefore, SCHED_DEADLINE
+ uses (for now) a less sophisticated, but actually very sensible, mechanism to
+ ensure that a certain utilization cap is not overcome per each root_domain.
+
+ Another main difference between deadline bandwidth management and RT-throttling
+ is that -deadline tasks have bandwidth on their own (while -rt ones don't!),
+ and thus we don't need an higher level throttling mechanism to enforce the
+ desired bandwidth.
+
+4.1 System wide settings
+------------------------
+
+ The system wide settings are configured under the /proc virtual file system.
+
+ For now the -rt knobs are used for dl admission control and the -deadline
+ runtime is accounted against the -rt runtime. We realise that this isn't
+ entirely desirable; however, it is better to have a small interface for now,
+ and be able to change it easily later. The ideal situation (see 5.) is to run
+ -rt tasks from a -deadline server; in which case the -rt bandwidth is a direct
+ subset of dl_bw.
+
+ This means that, for a root_domain comprising M CPUs, -deadline tasks
+ can be created while the sum of their bandwidths stays below:
+
+ M * (sched_rt_runtime_us / sched_rt_period_us)
+
+ It is also possible to disable this bandwidth management logic, and
+ be thus free of oversubscribing the system up to any arbitrary level.
+ This is done by writing -1 in /proc/sys/kernel/sched_rt_runtime_us.
+
+
+4.2 Task interface
+------------------
+
+ Specifying a periodic/sporadic task that executes for a given amount of
+ runtime at each instance, and that is scheduled according to the urgency of
+ its own timing constraints needs, in general, a way of declaring:
+ - a (maximum/typical) instance execution time,
+ - a minimum interval between consecutive instances,
+ - a time constraint by which each instance must be completed.
+
+ Therefore:
+ * a new struct sched_attr, containing all the necessary fields is
+ provided;
+ * the new scheduling related syscalls that manipulate it, i.e.,
+ sched_setattr() and sched_getattr() are implemented.
+
+
+4.3 Default behavior
+---------------------
+
+ The default value for SCHED_DEADLINE bandwidth is to have rt_runtime equal to
+ 950000. With rt_period equal to 1000000, by default, it means that -deadline
+ tasks can use at most 95%, multiplied by the number of CPUs that compose the
+ root_domain, for each root_domain.
+
+ A -deadline task cannot fork.
+
+5. Tasks CPU affinity
+=====================
+
+ -deadline tasks cannot have an affinity mask smaller that the entire
+ root_domain they are created on. However, affinities can be specified
+ through the cpuset facility (Documentation/cgroups/cpusets.txt).
+
+5.1 SCHED_DEADLINE and cpusets HOWTO
+------------------------------------
+
+ An example of a simple configuration (pin a -deadline task to CPU0)
+ follows (rt-app is used to create a -deadline task).
+
+ mkdir /dev/cpuset
+ mount -t cgroup -o cpuset cpuset /dev/cpuset
+ cd /dev/cpuset
+ mkdir cpu0
+ echo 0 > cpu0/cpuset.cpus
+ echo 0 > cpu0/cpuset.mems
+ echo 1 > cpuset.cpu_exclusive
+ echo 0 > cpuset.sched_load_balance
+ echo 1 > cpu0/cpuset.cpu_exclusive
+ echo 1 > cpu0/cpuset.mem_exclusive
+ echo $$ > cpu0/tasks
+ rt-app -t 100000:10000:d:0 -D5 (it is now actually superfluous to specify
+ task affinity)
+
+6. Future plans
+===============
+
+ Still missing:
+
+ - refinements to deadline inheritance, especially regarding the possibility
+ of retaining bandwidth isolation among non-interacting tasks. This is
+ being studied from both theoretical and practical points of view, and
+ hopefully we should be able to produce some demonstrative code soon;
+ - (c)group based bandwidth management, and maybe scheduling;
+ - access control for non-root users (and related security concerns to
+ address), which is the best way to allow unprivileged use of the mechanisms
+ and how to prevent non-root users "cheat" the system?
+
+ As already discussed, we are planning also to merge this work with the EDF
+ throttling patches [https://lkml.org/lkml/2010/2/23/239] but we still are in
+ the preliminary phases of the merge and we really seek feedback that would
+ help us decide on the direction it should take.
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 0de2482..0dd5e09 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -351,7 +351,8 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se,
* disrupting the schedulability of the system. Otherwise, we should
* refill the runtime and set the deadline a period in the future,
* because keeping the current (absolute) deadline of the task would
- * result in breaking guarantees promised to other tasks.
+ * result in breaking guarantees promised to other tasks (refer to
+ * Documentation/scheduler/sched-deadline.txt for more informations).
*
* This function returns true if:
*
--
1.7.9.5


2014-01-27 11:23:46

by Juri Lelli

[permalink] [raw]
Subject: Re: [PATCH] sched/deadline: Add sched_dl documentation

On 01/27/2014 12:20 PM, Juri Lelli wrote:
> From: Dario Faggioli <[email protected]>
>
> Add in Documentation/scheduler/ some hints about the design
> choices, the usage and the future possible developments of the
> sched_dl scheduling class and of the SCHED_DEADLINE policy.
>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Dario Faggioli <[email protected]>
> Signed-off-by: Juri Lelli <[email protected]>
> Signed-off-by: Peter Zijlstra <[email protected]>
> [ Re-wrote sections 2 and 3. ]
> Signed-off-by: Luca Abeni <[email protected]>
> ---
> Documentation/scheduler/00-INDEX | 2 +
> Documentation/scheduler/sched-deadline.txt | 281 ++++++++++++++++++++++++++++
> kernel/sched/deadline.c | 3 +-
> 3 files changed, 285 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/scheduler/sched-deadline.txt
>
> diff --git a/Documentation/scheduler/00-INDEX b/Documentation/scheduler/00-INDEX
> index d2651c4..46702e4 100644
> --- a/Documentation/scheduler/00-INDEX
> +++ b/Documentation/scheduler/00-INDEX
> @@ -10,5 +10,7 @@ sched-nice-design.txt
> - How and why the scheduler's nice levels are implemented.
> sched-rt-group.txt
> - real-time group scheduling.
> +sched-deadline.txt
> + - deadline scheduling.
> sched-stats.txt
> - information on schedstats (Linux Scheduler Statistics).
> diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt
> new file mode 100644
> index 0000000..18adc92
> --- /dev/null
> +++ b/Documentation/scheduler/sched-deadline.txt
> @@ -0,0 +1,281 @@
> + Deadline Task Scheduling
> + ------------------------
> +
> +CONTENTS
> +========
> +
> + 0. WARNING
> + 1. Overview
> + 2. Scheduling algorithm
> + 3. Scheduling Real-Time Tasks

We also plan to add here something more about admission control in the next future.


Best,

- Juri

> + 4. Bandwidth management
> + 4.1 System-wide settings
> + 4.2 Task interface
> + 4.3 Default behavior
> + 5. Tasks CPU affinity
> + 5.1 SCHED_DEADLINE and cpusets HOWTO
> + 6. Future plans
> +
> +
> +0. WARNING
> +==========
> +
> + Fiddling with these settings can result in an unpredictable or even unstable
> + system behavior. As for -rt (group) scheduling, it is assumed that root users
> + know what they're doing.
> +
> +
> +1. Overview
> +===========
> +
> + The SCHED_DEADLINE policy contained inside the sched_dl scheduling class is
> + basically an implementation of the Earliest Deadline First (EDF) scheduling
> + algorithm, augmented with a mechanism (called Constant Bandwidth Server, CBS)
> + that makes it possible to isolate the behavior of tasks between each other.
> +
> +
> +2. Scheduling algorithm
> +==================
> +
> + SCHED_DEADLINE uses three parameters, named "runtime", "period", and
> + "deadline" to schedule tasks. A SCHED_DEADLINE task is guaranteed to receive
> + "runtime" microseconds of execution time every "period" microseconds, and
> + these "runtime" microseconds are available within "deadline" microseconds
> + from the beginning of the period. In order to implement this behaviour,
> + every time the task wakes up, the scheduler computes a "scheduling deadline"
> + consistent with the guarantee (using the CBS[2,3] algorithm). Tasks are then
> + scheduled using EDF[1] on these scheduling deadlines (the task with the
> + smallest scheduling deadline is selected for execution). Notice that this
> + guaranteed is respected if a proper "admission control" strategy (see Section
> + "4. Bandwidth management") is used.
> +
> + Summing up, the CBS[2,3] algorithms assigns scheduling deadlines to tasks so
> + that each task runs for at most its runtime every period, avoiding any
> + interference between different tasks (bandwidth isolation), while the EDF[1]
> + algorithm selects the task with the smallest scheduling deadline as the one
> + to be executed first. Thanks to this feature, also tasks that do not
> + strictly comply with the "traditional" real-time task model (see Section 3)
> + can effectively use the new policy.
> +
> + In more details, the CBS algorithm assigns scheduling deadlines to
> + tasks in the following way:
> +
> + - Each SCHED_DEADLINE task is characterised by the "runtime",
> + "deadline", and "period" parameters;
> +
> + - The state of the task is described by a "scheduling deadline", and
> + a "current runtime". These two parameters are initially set to 0;
> +
> + - When a SCHED_DEADLINE task wakes up (becomes ready for execution),
> + the scheduler checks if
> +
> + current runtime runtime
> + ---------------------------------- > ----------------
> + scheduling deadline - current time period
> +
> + then, if the scheduling deadline is smaller than the current time, or
> + this condition is verified, the scheduling deadline and the
> + current budget are re-initialised as
> +
> + scheduling deadline = current time + deadline
> + current runtime = runtime
> +
> + otherwise, the scheduling deadline and the current runtime are
> + left unchanged;
> +
> + - When a SCHED_DEADLINE task executes for an amount of time t, its
> + current runtime is decreased as
> +
> + current runtime = current runtime - t
> +
> + (technically, the runtime is decreased at every tick, or when the
> + task is descheduled / preempted);
> +
> + - When the current runtime becomes less or equal than 0, the task is
> + said to be "throttled" (also known as "depleted" in real-time literature)
> + and cannot be scheduled until its scheduling deadline. The "replenishment
> + time" for this task (see next item) is set to be equal to the current
> + value of the scheduling deadline;
> +
> + - When the current time is equal to the replenishment time of a
> + throttled task, the scheduling deadline and the current runtime are
> + updated as
> +
> + scheduling deadline = scheduling deadline + period
> + current runtime = current runtime + runtime
> +
> +
> +3. Scheduling Real-Time Tasks
> +=============================
> +
> + * BIG FAT WARNING ******************************************************
> + *
> + * This section contains a (not-thorough) summary on classical deadline
> + * scheduling theory, and how it applies to SCHED_DEADLINE.
> + * The reader can "safely" skip to Section 4 if only interested in seeing
> + * how the scheduling policy can be used. Anyway, we strongly recommend
> + * to come back here and continue reading (once the urge for testing is
> + * satisfied :P) to be sure of fully understanding all technical details.
> + ************************************************************************
> +
> + There are no limitations on what kind of task can exploit this new
> + scheduling discipline, even if it must be said that it is particularly
> + suited for periodic or sporadic real-time tasks that need guarantees on their
> + timing behavior, e.g., multimedia, streaming, control applications, etc.
> +
> + A typical real-time task is composed of a repetition of computation phases
> + (task instances, or jobs) which are activated on a periodic or sporadic
> + fashion.
> + Each job J_j (where J_j is the j^th job of the task) is characterised by an
> + arrival time r_j (the time when the job starts), an amount of computation
> + time c_j needed to finish the job, and a job absolute deadline d_j, which
> + is the time within which the job should be finished. The maximum execution
> + time max_j{c_j} is called "Worst Case Execution Time" (WCET) for the task.
> + A real-time task can be periodic with period P if r_{j+1} = r_j + P, or
> + sporadic with minimum inter-arrival time P is r_{j+1} >= r_j + P. Finally,
> + d_j = r_j + D, where D is the task's relative deadline.
> +
> + SCHED_DEADLINE can be used to schedule real-time tasks guaranteeing that
> + the jobs' deadlines of a task are respected. In order to do this, a task
> + must be scheduled by setting:
> +
> + - runtime >= WCET
> + - deadline = D
> + - period <= P
> +
> + IOW, if runtime >= WCET and if period is >= P, then the scheduling deadlines
> + and the absolute deadlines (d_j) coincide, so a proper admission control
> + allows to respect the jobs' absolute deadlines for this task (this is what is
> + called "hard schedulability property" and is an extension of Lemma 1 of [2]).
> +
> + References:
> + 1 - C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogram-
> + ming in a hard-real-time environment. Journal of the Association for
> + Computing Machinery, 20(1), 1973.
> + 2 - L. Abeni , G. Buttazzo. Integrating Multimedia Applications in Hard
> + Real-Time Systems. Proceedings of the 19th IEEE Real-time Systems
> + Symposium, 1998. http://retis.sssup.it/~giorgio/paps/1998/rtss98-cbs.pdf
> + 3 - L. Abeni. Server Mechanisms for Multimedia Applications. ReTiS Lab
> + Technical Report. http://xoomer.virgilio.it/lucabe72/pubs/tr-98-01.ps
> +
> +4. Bandwidth management
> +=======================
> +
> + In order for the -deadline scheduling to be effective and useful, it is
> + important to have some method to keep the allocation of the available CPU
> + bandwidth to the tasks under control.
> + This is usually called "admission control" and if it is not performed at all,
> + no guarantee can be given on the actual scheduling of the -deadline tasks.
> +
> + Since when RT-throttling has been introduced each task group has a bandwidth
> + associated, calculated as a certain amount of runtime over a period.
> + Moreover, to make it possible to manipulate such bandwidth, readable/writable
> + controls have been added to both procfs (for system wide settings) and cgroupfs
> + (for per-group settings).
> + Therefore, the same interface is being used for controlling the bandwidth
> + distrubution to -deadline tasks.
> +
> + However, more discussion is needed in order to figure out how we want to manage
> + SCHED_DEADLINE bandwidth at the task group level. Therefore, SCHED_DEADLINE
> + uses (for now) a less sophisticated, but actually very sensible, mechanism to
> + ensure that a certain utilization cap is not overcome per each root_domain.
> +
> + Another main difference between deadline bandwidth management and RT-throttling
> + is that -deadline tasks have bandwidth on their own (while -rt ones don't!),
> + and thus we don't need an higher level throttling mechanism to enforce the
> + desired bandwidth.
> +
> +4.1 System wide settings
> +------------------------
> +
> + The system wide settings are configured under the /proc virtual file system.
> +
> + For now the -rt knobs are used for dl admission control and the -deadline
> + runtime is accounted against the -rt runtime. We realise that this isn't
> + entirely desirable; however, it is better to have a small interface for now,
> + and be able to change it easily later. The ideal situation (see 5.) is to run
> + -rt tasks from a -deadline server; in which case the -rt bandwidth is a direct
> + subset of dl_bw.
> +
> + This means that, for a root_domain comprising M CPUs, -deadline tasks
> + can be created while the sum of their bandwidths stays below:
> +
> + M * (sched_rt_runtime_us / sched_rt_period_us)
> +
> + It is also possible to disable this bandwidth management logic, and
> + be thus free of oversubscribing the system up to any arbitrary level.
> + This is done by writing -1 in /proc/sys/kernel/sched_rt_runtime_us.
> +
> +
> +4.2 Task interface
> +------------------
> +
> + Specifying a periodic/sporadic task that executes for a given amount of
> + runtime at each instance, and that is scheduled according to the urgency of
> + its own timing constraints needs, in general, a way of declaring:
> + - a (maximum/typical) instance execution time,
> + - a minimum interval between consecutive instances,
> + - a time constraint by which each instance must be completed.
> +
> + Therefore:
> + * a new struct sched_attr, containing all the necessary fields is
> + provided;
> + * the new scheduling related syscalls that manipulate it, i.e.,
> + sched_setattr() and sched_getattr() are implemented.
> +
> +
> +4.3 Default behavior
> +---------------------
> +
> + The default value for SCHED_DEADLINE bandwidth is to have rt_runtime equal to
> + 950000. With rt_period equal to 1000000, by default, it means that -deadline
> + tasks can use at most 95%, multiplied by the number of CPUs that compose the
> + root_domain, for each root_domain.
> +
> + A -deadline task cannot fork.
> +
> +5. Tasks CPU affinity
> +=====================
> +
> + -deadline tasks cannot have an affinity mask smaller that the entire
> + root_domain they are created on. However, affinities can be specified
> + through the cpuset facility (Documentation/cgroups/cpusets.txt).
> +
> +5.1 SCHED_DEADLINE and cpusets HOWTO
> +------------------------------------
> +
> + An example of a simple configuration (pin a -deadline task to CPU0)
> + follows (rt-app is used to create a -deadline task).
> +
> + mkdir /dev/cpuset
> + mount -t cgroup -o cpuset cpuset /dev/cpuset
> + cd /dev/cpuset
> + mkdir cpu0
> + echo 0 > cpu0/cpuset.cpus
> + echo 0 > cpu0/cpuset.mems
> + echo 1 > cpuset.cpu_exclusive
> + echo 0 > cpuset.sched_load_balance
> + echo 1 > cpu0/cpuset.cpu_exclusive
> + echo 1 > cpu0/cpuset.mem_exclusive
> + echo $$ > cpu0/tasks
> + rt-app -t 100000:10000:d:0 -D5 (it is now actually superfluous to specify
> + task affinity)
> +
> +6. Future plans
> +===============
> +
> + Still missing:
> +
> + - refinements to deadline inheritance, especially regarding the possibility
> + of retaining bandwidth isolation among non-interacting tasks. This is
> + being studied from both theoretical and practical points of view, and
> + hopefully we should be able to produce some demonstrative code soon;
> + - (c)group based bandwidth management, and maybe scheduling;
> + - access control for non-root users (and related security concerns to
> + address), which is the best way to allow unprivileged use of the mechanisms
> + and how to prevent non-root users "cheat" the system?
> +
> + As already discussed, we are planning also to merge this work with the EDF
> + throttling patches [https://lkml.org/lkml/2010/2/23/239] but we still are in
> + the preliminary phases of the merge and we really seek feedback that would
> + help us decide on the direction it should take.
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 0de2482..0dd5e09 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -351,7 +351,8 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se,
> * disrupting the schedulability of the system. Otherwise, we should
> * refill the runtime and set the deadline a period in the future,
> * because keeping the current (absolute) deadline of the task would
> - * result in breaking guarantees promised to other tasks.
> + * result in breaking guarantees promised to other tasks (refer to
> + * Documentation/scheduler/sched-deadline.txt for more informations).
> *
> * This function returns true if:
> *
>

2014-01-27 11:55:20

by Henrik Austad

[permalink] [raw]
Subject: Re: [PATCH] sched/deadline: Add sched_dl documentation

On Mon, Jan 27, 2014 at 12:20:15PM +0100, Juri Lelli wrote:
> From: Dario Faggioli <[email protected]>
>
> Add in Documentation/scheduler/ some hints about the design
> choices, the usage and the future possible developments of the
> sched_dl scheduling class and of the SCHED_DEADLINE policy.
>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Dario Faggioli <[email protected]>
> Signed-off-by: Juri Lelli <[email protected]>
> Signed-off-by: Peter Zijlstra <[email protected]>
> [ Re-wrote sections 2 and 3. ]
> Signed-off-by: Luca Abeni <[email protected]>
> ---
> Documentation/scheduler/00-INDEX | 2 +
> Documentation/scheduler/sched-deadline.txt | 281 ++++++++++++++++++++++++++++
> kernel/sched/deadline.c | 3 +-
> 3 files changed, 285 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/scheduler/sched-deadline.txt
>
> diff --git a/Documentation/scheduler/00-INDEX b/Documentation/scheduler/00-INDEX
> index d2651c4..46702e4 100644
> --- a/Documentation/scheduler/00-INDEX
> +++ b/Documentation/scheduler/00-INDEX
> @@ -10,5 +10,7 @@ sched-nice-design.txt
> - How and why the scheduler's nice levels are implemented.
> sched-rt-group.txt
> - real-time group scheduling.
> +sched-deadline.txt
> + - deadline scheduling.
> sched-stats.txt
> - information on schedstats (Linux Scheduler Statistics).
> diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt
> new file mode 100644
> index 0000000..18adc92
> --- /dev/null
> +++ b/Documentation/scheduler/sched-deadline.txt
> @@ -0,0 +1,281 @@
> + Deadline Task Scheduling
> + ------------------------
> +
> +CONTENTS
> +========
> +
> + 0. WARNING
> + 1. Overview
> + 2. Scheduling algorithm
> + 3. Scheduling Real-Time Tasks
> + 4. Bandwidth management
> + 4.1 System-wide settings
> + 4.2 Task interface
> + 4.3 Default behavior
> + 5. Tasks CPU affinity
> + 5.1 SCHED_DEADLINE and cpusets HOWTO
> + 6. Future plans
> +
> +
> +0. WARNING
> +==========
> +
> + Fiddling with these settings can result in an unpredictable or even unstable
> + system behavior. As for -rt (group) scheduling, it is assumed that root users
> + know what they're doing.
> +
> +
> +1. Overview
> +===========
> +
> + The SCHED_DEADLINE policy contained inside the sched_dl scheduling class is
> + basically an implementation of the Earliest Deadline First (EDF) scheduling
> + algorithm, augmented with a mechanism (called Constant Bandwidth Server, CBS)
> + that makes it possible to isolate the behavior of tasks between each other.
> +
> +
> +2. Scheduling algorithm
> +==================
> +
> + SCHED_DEADLINE uses three parameters, named "runtime", "period", and
> + "deadline" to schedule tasks. A SCHED_DEADLINE task is guaranteed to receive
> + "runtime" microseconds of execution time every "period" microseconds, and
> + these "runtime" microseconds are available within "deadline" microseconds
> + from the beginning of the period. In order to implement this behaviour,
> + every time the task wakes up, the scheduler computes a "scheduling deadline"
> + consistent with the guarantee (using the CBS[2,3] algorithm). Tasks are then
> + scheduled using EDF[1] on these scheduling deadlines (the task with the
> + smallest scheduling deadline is selected for execution). Notice that this
> + guaranteed is respected if a proper "admission control" strategy (see Section
> + "4. Bandwidth management") is used.
> +
> + Summing up, the CBS[2,3] algorithms assigns scheduling deadlines to tasks so
> + that each task runs for at most its runtime every period, avoiding any
> + interference between different tasks (bandwidth isolation), while the EDF[1]
> + algorithm selects the task with the smallest scheduling deadline as the one
> + to be executed first. Thanks to this feature, also tasks that do not
> + strictly comply with the "traditional" real-time task model (see Section 3)
> + can effectively use the new policy.
> +
> + In more details, the CBS algorithm assigns scheduling deadlines to
> + tasks in the following way:
> +
> + - Each SCHED_DEADLINE task is characterised by the "runtime",
> + "deadline", and "period" parameters;
> +
> + - The state of the task is described by a "scheduling deadline", and
> + a "current runtime". These two parameters are initially set to 0;
> +
> + - When a SCHED_DEADLINE task wakes up (becomes ready for execution),
> + the scheduler checks if
> +
> + current runtime runtime
> + ---------------------------------- > ----------------
> + scheduling deadline - current time period
> +
> + then, if the scheduling deadline is smaller than the current time, or
> + this condition is verified, the scheduling deadline and the
> + current budget are re-initialised as

Current runtime: time spent running _this_ period? or is _remaining_
runtime this period? I get the feeling it's the latter.

So, roughly, it is the ration

remaining_runtime / relative_time_to_deadline

which needs to be greater than the assigned CPU bandwidth, and if so, the
budget should be replensihed?

Shouldn't there be something about not refilling the budget before a new
period has started?

> + scheduling deadline = current time + deadline
> + current runtime = runtime
> +
> + otherwise, the scheduling deadline and the current runtime are
> + left unchanged;
> +
> + - When a SCHED_DEADLINE task executes for an amount of time t, its
> + current runtime is decreased as
> +
> + current runtime = current runtime - t
> +
> + (technically, the runtime is decreased at every tick, or when the
> + task is descheduled / preempted);

Aha, there it is. Having it here makes sense, but it does wrapping ones
head around this a bit harder than strictly necessary perhaps.

> + - When the current runtime becomes less or equal than 0, the task is
> + said to be "throttled" (also known as "depleted" in real-time literature)
> + and cannot be scheduled until its scheduling deadline. The "replenishment
> + time" for this task (see next item) is set to be equal to the current
> + value of the scheduling deadline;
> +
> + - When the current time is equal to the replenishment time of a
> + throttled task, the scheduling deadline and the current runtime are
> + updated as
> +
> + scheduling deadline = scheduling deadline + period
> + current runtime = current runtime + runtime

ok, this section makes sense now

> +3. Scheduling Real-Time Tasks
> +=============================
> +
> + * BIG FAT WARNING ******************************************************
> + *
> + * This section contains a (not-thorough) summary on classical deadline
> + * scheduling theory, and how it applies to SCHED_DEADLINE.

"This section should not be considered a complete summary of classical
deadline scheduling theroy in any way AT ALL."

(not-thorough sounds a bit strange)

> + * The reader can "safely" skip to Section 4 if only interested in seeing
> + * how the scheduling policy can be used. Anyway, we strongly recommend
> + * to come back here and continue reading (once the urge for testing is
> + * satisfied :P) to be sure of fully understanding all technical details.
> + ************************************************************************
> +
> + There are no limitations on what kind of task can exploit this new
> + scheduling discipline, even if it must be said that it is particularly
> + suited for periodic or sporadic real-time tasks that need guarantees on their
> + timing behavior, e.g., multimedia, streaming, control applications, etc.
> +
> + A typical real-time task is composed of a repetition of computation phases
> + (task instances, or jobs) which are activated on a periodic or sporadic
> + fashion.
> + Each job J_j (where J_j is the j^th job of the task) is characterised by an
> + arrival time r_j (the time when the job starts), an amount of computation
> + time c_j needed to finish the job, and a job absolute deadline d_j, which
> + is the time within which the job should be finished. The maximum execution
> + time max_j{c_j} is called "Worst Case Execution Time" (WCET) for the task.
> + A real-time task can be periodic with period P if r_{j+1} = r_j + P, or
> + sporadic with minimum inter-arrival time P is r_{j+1} >= r_j + P. Finally,
> + d_j = r_j + D, where D is the task's relative deadline.

\o/

Great, thanks!

> + SCHED_DEADLINE can be used to schedule real-time tasks guaranteeing that
> + the jobs' deadlines of a task are respected. In order to do this, a task
> + must be scheduled by setting:
> +
> + - runtime >= WCET
> + - deadline = D
> + - period <= P
> +
> + IOW, if runtime >= WCET and if period is >= P, then the scheduling deadlines
> + and the absolute deadlines (d_j) coincide, so a proper admission control
> + allows to respect the jobs' absolute deadlines for this task (this is what is
> + called "hard schedulability property" and is an extension of Lemma 1 of [2]).
> +
> + References:
> + 1 - C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogram-
> + ming in a hard-real-time environment. Journal of the Association for
> + Computing Machinery, 20(1), 1973.
> + 2 - L. Abeni , G. Buttazzo. Integrating Multimedia Applications in Hard
> + Real-Time Systems. Proceedings of the 19th IEEE Real-time Systems
> + Symposium, 1998. http://retis.sssup.it/~giorgio/paps/1998/rtss98-cbs.pdf
> + 3 - L. Abeni. Server Mechanisms for Multimedia Applications. ReTiS Lab
> + Technical Report. http://xoomer.virgilio.it/lucabe72/pubs/tr-98-01.ps
> +
> +4. Bandwidth management
> +=======================
> +
> + In order for the -deadline scheduling to be effective and useful, it is
> + important to have some method to keep the allocation of the available CPU
> + bandwidth to the tasks under control.
> + This is usually called "admission control" and if it is not performed at all,
> + no guarantee can be given on the actual scheduling of the -deadline tasks.
> +
> + Since when RT-throttling has been introduced each task group has a bandwidth
> + associated, calculated as a certain amount of runtime over a period.
> + Moreover, to make it possible to manipulate such bandwidth, readable/writable
> + controls have been added to both procfs (for system wide settings) and cgroupfs
> + (for per-group settings).
> + Therefore, the same interface is being used for controlling the bandwidth
> + distrubution to -deadline tasks.
> +
> + However, more discussion is needed in order to figure out how we want to manage
> + SCHED_DEADLINE bandwidth at the task group level. Therefore, SCHED_DEADLINE
> + uses (for now) a less sophisticated, but actually very sensible, mechanism to
> + ensure that a certain utilization cap is not overcome per each root_domain.
> +
> + Another main difference between deadline bandwidth management and RT-throttling
> + is that -deadline tasks have bandwidth on their own (while -rt ones don't!),
> + and thus we don't need an higher level throttling mechanism to enforce the
> + desired bandwidth.
> +
> +4.1 System wide settings
> +------------------------
> +
> + The system wide settings are configured under the /proc virtual file system.
> +
> + For now the -rt knobs are used for dl admission control and the -deadline
> + runtime is accounted against the -rt runtime. We realise that this isn't
> + entirely desirable; however, it is better to have a small interface for now,
> + and be able to change it easily later. The ideal situation (see 5.) is to run
> + -rt tasks from a -deadline server; in which case the -rt bandwidth is a direct
> + subset of dl_bw.
> +
> + This means that, for a root_domain comprising M CPUs, -deadline tasks
> + can be created while the sum of their bandwidths stays below:
> +
> + M * (sched_rt_runtime_us / sched_rt_period_us)
> +
> + It is also possible to disable this bandwidth management logic, and
> + be thus free of oversubscribing the system up to any arbitrary level.
> + This is done by writing -1 in /proc/sys/kernel/sched_rt_runtime_us.
> +
> +
> +4.2 Task interface
> +------------------
> +
> + Specifying a periodic/sporadic task that executes for a given amount of
> + runtime at each instance, and that is scheduled according to the urgency of
> + its own timing constraints needs, in general, a way of declaring:
> + - a (maximum/typical) instance execution time,
> + - a minimum interval between consecutive instances,
> + - a time constraint by which each instance must be completed.
> +
> + Therefore:
> + * a new struct sched_attr, containing all the necessary fields is
> + provided;
> + * the new scheduling related syscalls that manipulate it, i.e.,
> + sched_setattr() and sched_getattr() are implemented.
> +
> +
> +4.3 Default behavior
> +---------------------
> +
> + The default value for SCHED_DEADLINE bandwidth is to have rt_runtime equal to
> + 950000. With rt_period equal to 1000000, by default, it means that -deadline
> + tasks can use at most 95%, multiplied by the number of CPUs that compose the
> + root_domain, for each root_domain.
> +
> + A -deadline task cannot fork.
> +
> +5. Tasks CPU affinity
> +=====================
> +
> + -deadline tasks cannot have an affinity mask smaller that the entire
> + root_domain they are created on. However, affinities can be specified
> + through the cpuset facility (Documentation/cgroups/cpusets.txt).
> +
> +5.1 SCHED_DEADLINE and cpusets HOWTO
> +------------------------------------
> +
> + An example of a simple configuration (pin a -deadline task to CPU0)
> + follows (rt-app is used to create a -deadline task).
> +
> + mkdir /dev/cpuset
> + mount -t cgroup -o cpuset cpuset /dev/cpuset
> + cd /dev/cpuset
> + mkdir cpu0
> + echo 0 > cpu0/cpuset.cpus
> + echo 0 > cpu0/cpuset.mems
> + echo 1 > cpuset.cpu_exclusive
> + echo 0 > cpuset.sched_load_balance
> + echo 1 > cpu0/cpuset.cpu_exclusive
> + echo 1 > cpu0/cpuset.mem_exclusive
> + echo $$ > cpu0/tasks
> + rt-app -t 100000:10000:d:0 -D5 (it is now actually superfluous to specify
> + task affinity)
> +
> +6. Future plans
> +===============
> +
> + Still missing:
> +
> + - refinements to deadline inheritance, especially regarding the possibility
> + of retaining bandwidth isolation among non-interacting tasks. This is
> + being studied from both theoretical and practical points of view, and
> + hopefully we should be able to produce some demonstrative code soon;
> + - (c)group based bandwidth management, and maybe scheduling;
> + - access control for non-root users (and related security concerns to
> + address), which is the best way to allow unprivileged use of the mechanisms
> + and how to prevent non-root users "cheat" the system?
> +
> + As already discussed, we are planning also to merge this work with the EDF
> + throttling patches [https://lkml.org/lkml/2010/2/23/239] but we still are in
> + the preliminary phases of the merge and we really seek feedback that would
> + help us decide on the direction it should take.
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 0de2482..0dd5e09 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -351,7 +351,8 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se,
> * disrupting the schedulability of the system. Otherwise, we should
> * refill the runtime and set the deadline a period in the future,
> * because keeping the current (absolute) deadline of the task would
> - * result in breaking guarantees promised to other tasks.
> + * result in breaking guarantees promised to other tasks (refer to
> + * Documentation/scheduler/sched-deadline.txt for more informations).
> *
> * This function returns true if:
> *
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-doc" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Nice! /me is very happy

Reviewed-by: Henrik Austad <[email protected]>

--
Henrik Austad


Attachments:
(No filename) (16.42 kB)
signature.asc (198.00 B)
Digital signature
Download all attachments

2014-01-27 12:30:51

by Luca Abeni

[permalink] [raw]
Subject: Re: [PATCH] sched/deadline: Add sched_dl documentation

Hi Henrik,

On 01/27/2014 12:53 PM, Henrik Austad wrote:
[...]
>> + In more details, the CBS algorithm assigns scheduling deadlines to
>> + tasks in the following way:
>> +
>> + - Each SCHED_DEADLINE task is characterised by the "runtime",
>> + "deadline", and "period" parameters;
>> +
>> + - The state of the task is described by a "scheduling deadline", and
>> + a "current runtime". These two parameters are initially set to 0;
>> +
>> + - When a SCHED_DEADLINE task wakes up (becomes ready for execution),
>> + the scheduler checks if
>> +
>> + current runtime runtime
>> + ---------------------------------- > ----------------
>> + scheduling deadline - current time period
>> +
>> + then, if the scheduling deadline is smaller than the current time, or
>> + this condition is verified, the scheduling deadline and the
>> + current budget are re-initialised as
>
> Current runtime: time spent running _this_ period? or is _remaining_
> runtime this period? I get the feeling it's the latter.
>
> So, roughly, it is the ration
>
> remaining_runtime / relative_time_to_deadline
>
> which needs to be greater than the assigned CPU bandwidth, and if so, the
> budget should be replensihed?
>
> Shouldn't there be something about not refilling the budget before a new
> period has started?
Uh... Maybe the description above can be improved :)
Do you think that using "remaining runtime" instead of "current runtime"
would help? If yes, I'll make this change.

Also, I see that some of your questions are answered by some items below...
Do you think that changing the order of the items in the presentation would
improve the readability? If you suggest a better ordering, I'll try to
rewrite the algorithm's description according to it.


Thanks,
Luca

2014-01-27 12:42:14

by Henrik Austad

[permalink] [raw]
Subject: Re: [PATCH] sched/deadline: Add sched_dl documentation

On Mon, Jan 27, 2014 at 01:30:42PM +0100, Luca Abeni wrote:
> Hi Henrik,
>
> On 01/27/2014 12:53 PM, Henrik Austad wrote:
> [...]
> >>+ In more details, the CBS algorithm assigns scheduling deadlines to
> >>+ tasks in the following way:
> >>+
> >>+ - Each SCHED_DEADLINE task is characterised by the "runtime",
> >>+ "deadline", and "period" parameters;

> >>+
> >>+ - The state of the task is described by a "scheduling deadline", and
> >>+ a "current runtime". These two parameters are initially set to 0;
> >>+
> >>+ - When a SCHED_DEADLINE task wakes up (becomes ready for execution),
> >>+ the scheduler checks if
> >>+
> >>+ current runtime runtime
> >>+ ---------------------------------- > ----------------
> >>+ scheduling deadline - current time period
> >>+
> >>+ then, if the scheduling deadline is smaller than the current time, or
> >>+ this condition is verified, the scheduling deadline and the
> >>+ current budget are re-initialised as
> >
> >Current runtime: time spent running _this_ period? or is _remaining_
> >runtime this period? I get the feeling it's the latter.
> >
> >So, roughly, it is the ration
> >
> > remaining_runtime / relative_time_to_deadline
> >
> >which needs to be greater than the assigned CPU bandwidth, and if so, the
> >budget should be replensihed?
> >
> >Shouldn't there be something about not refilling the budget before a new
> >period has started?
> Uh... Maybe the description above can be improved :)
> Do you think that using "remaining runtime" instead of "current runtime"
> would help? If yes, I'll make this change.

Yes, in my vocabularly, "remaining" != "current" :), so changing to
'remaining runtime' would be nice.

> Also, I see that some of your questions are answered by some items below...

Yes, but I left the comments to indicate that the order was a bit
confusing.

> Do you think that changing the order of the items in the presentation would
> improve the readability? If you suggest a better ordering, I'll try to
> rewrite the algorithm's description according to it.

Could be, at least the ratio-caclulation was a bit confusing until I'd read
the entire section.

My preferred order (I've just cut'n'pased from the original email here)

+ - The state of the task is described by a "scheduling deadline", and
+ a "current runtime". These two parameters are initially set to 0;
+
+ - Each SCHED_DEADLINE task is characterised by the "runtime",
+ "deadline", and "period" parameters;
+



+ - When a SCHED_DEADLINE task executes for an amount of time t, its
+ current runtime is decreased as
+
+ current runtime = current runtime - t
+
+ (technically, the runtime is decreased at every tick, or when the
+ task is descheduled / preempted);
+
+ - When the current runtime becomes less or equal than 0, the task is
+ said to be "throttled" (also known as "depleted" in real-time literature)
+ and cannot be scheduled until its scheduling deadline. The "replenishment
+ time" for this task (see next item) is set to be equal to the current
+ value of the scheduling deadline;
+
+ - When the current time is equal to the replenishment time of a
+ throttled task, the scheduling deadline and the current runtime are
+ updated as
+
+ scheduling deadline = scheduling deadline + period
+ current runtime = current runtime + runtime




+ - When a SCHED_DEADLINE task wakes up (becomes ready for execution),
+ the scheduler checks if
+
+ current runtime runtime
+ ---------------------------------- > ----------------
+ scheduling deadline - current time period
+
+ then, if the scheduling deadline is smaller than the current time, or
+ this condition is verified, the scheduling deadline and the
+ current budget are re-initialised as
+
+ scheduling deadline = current time + deadline
+ current runtime = runtime
+
+ otherwise, the scheduling deadline and the current runtime are
+ left unchanged;
+

Emphasis on -my- preferred order. :)

Either way, I'm quite happy with this documentation-update, this is just
nitpick :)

--
Henrik Austad


Attachments:
(No filename) (4.15 kB)
signature.asc (198.00 B)
Digital signature
Download all attachments

2014-01-27 12:52:37

by Luca Abeni

[permalink] [raw]
Subject: Re: [PATCH] sched/deadline: Add sched_dl documentation

On 01/27/2014 01:40 PM, Henrik Austad wrote:
[...]
>>> Current runtime: time spent running _this_ period? or is _remaining_
>>> runtime this period? I get the feeling it's the latter.
>>>
>>> So, roughly, it is the ration
>>>
>>> remaining_runtime / relative_time_to_deadline
>>>
>>> which needs to be greater than the assigned CPU bandwidth, and if so, the
>>> budget should be replensihed?
>>>
>>> Shouldn't there be something about not refilling the budget before a new
>>> period has started?
>> Uh... Maybe the description above can be improved :)
>> Do you think that using "remaining runtime" instead of "current runtime"
>> would help? If yes, I'll make this change.
>
> Yes, in my vocabularly, "remaining" != "current" :), so changing to
> 'remaining runtime' would be nice.
Ok; I'll make this change.


>> Also, I see that some of your questions are answered by some items below...
>
> Yes, but I left the comments to indicate that the order was a bit
> confusing.
>
>> Do you think that changing the order of the items in the presentation would
>> improve the readability? If you suggest a better ordering, I'll try to
>> rewrite the algorithm's description according to it.
>
> Could be, at least the ratio-caclulation was a bit confusing until I'd read
> the entire section.
>
> My preferred order (I've just cut'n'pased from the original email here)
>
> + - The state of the task is described by a "scheduling deadline", and
> + a "current runtime". These two parameters are initially set to 0;
BTW, maybe some of the confusion can be avoided by explaining what the
"remaining runtime" is here... Something like:
- The state of the task is described by a "scheduling deadline", and
a "current runtime" (representing the amount of execution time that
the task can use before the scheduling deadline). These two parameters
are initially set to 0;

Can something like this help? If yes, I'll update the document.


[...]
> + - When a SCHED_DEADLINE task wakes up (becomes ready for execution),
> + the scheduler checks if
> +
> + current runtime runtime
> + ---------------------------------- > ----------------
> + scheduling deadline - current time period
> +
> + then, if the scheduling deadline is smaller than the current time, or
> + this condition is verified, the scheduling deadline and the
> + current budget are re-initialised as
> +
> + scheduling deadline = current time + deadline
> + current runtime = runtime
> +
> + otherwise, the scheduling deadline and the current runtime are
> + left unchanged;
> +
>
> Emphasis on -my- preferred order. :)
Ok; let's see what other people think, and then we decide the items'
order.


> Either way, I'm quite happy with this documentation-update, this is just
> nitpick :)
Ok, good.


Thanks,
Luca

2014-01-27 15:36:09

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] sched/deadline: Add sched_dl documentation

On Mon, 27 Jan 2014 12:20:15 +0100
Juri Lelli <[email protected]> wrote:


> +
> +2. Scheduling algorithm
> +==================
> +
> + SCHED_DEADLINE uses three parameters, named "runtime", "period", and
> + "deadline" to schedule tasks. A SCHED_DEADLINE task is guaranteed to receive
> + "runtime" microseconds of execution time every "period" microseconds, and
> + these "runtime" microseconds are available within "deadline" microseconds
> + from the beginning of the period. In order to implement this behaviour,
> + every time the task wakes up, the scheduler computes a "scheduling deadline"
> + consistent with the guarantee (using the CBS[2,3] algorithm). Tasks are then
> + scheduled using EDF[1] on these scheduling deadlines (the task with the
> + smallest scheduling deadline is selected for execution). Notice that this

s/smallest/closest/

You can have a large deadline, but it could be the next one to trigger.

> + guaranteed is respected if a proper "admission control" strategy (see Section
> + "4. Bandwidth management") is used.
> +
> + Summing up, the CBS[2,3] algorithms assigns scheduling deadlines to tasks so
> + that each task runs for at most its runtime every period, avoiding any
> + interference between different tasks (bandwidth isolation), while the EDF[1]
> + algorithm selects the task with the smallest scheduling deadline as the one

Probably should say closest here too. Smallest doesn't mean that it is
the next deadline.

> + to be executed first. Thanks to this feature, also tasks that do not
> + strictly comply with the "traditional" real-time task model (see Section 3)
> + can effectively use the new policy.
> +
> + In more details, the CBS algorithm assigns scheduling deadlines to
> + tasks in the following way:
> +
> + - Each SCHED_DEADLINE task is characterised by the "runtime",
> + "deadline", and "period" parameters;
> +
> + - The state of the task is described by a "scheduling deadline", and
> + a "current runtime". These two parameters are initially set to 0;
> +
> + - When a SCHED_DEADLINE task wakes up (becomes ready for execution),
> + the scheduler checks if
> +
> + current runtime runtime
> + ---------------------------------- > ----------------
> + scheduling deadline - current time period

Just a nit on formatting above. I was confused at first because the
above looks more like a movement in time (the > being an arrow), then
after reading the below, I realized that the above is actually an
equation. This can be fixed by adding more spaces between the fractions
and the greater than sign:

current runtime runtime
---------------------------------- > ----------------
scheduling deadline - current time period


See, it no longer looks like a timeline:

---------------------------------- > ----------------


> +
> + then, if the scheduling deadline is smaller than the current time, or
> + this condition is verified, the scheduling deadline and the
> + current budget are re-initialised as
> +
> + scheduling deadline = current time + deadline
> + current runtime = runtime
> +
> + otherwise, the scheduling deadline and the current runtime are
> + left unchanged;

I've been trying to wrap my head around this a bit. I started writing
an email about this when I first examined the patches and never sent it
out :-p

Lets take a case where deadline == period. It seems that the above
would be true any time there was any delay to starting the task or the
task was interrupted by another SCHED_DEADLINE task.

For example, lets say we have two SD tasks. One that has 50ms runtime
and a 100ms period. The other has a 1ms runtime and a 10ms period.

The above two should work perfectly fine together. The 10ms period task
will constantly schedule in on the 100ms task.

When the 100ms task runs, it could easily be delayed by 1ms due to the
10ms task. Then lets look at the above equation


50/99 > 50/100

That statement is true. Then I guess you make scheduling deadline =
current time + deadline, and leave the period alone? Then we run until
the next 10ms period. Get scheduled out by the 10ms task, and then
rescheduled back in. Looking at that we should have something like this:


(50 - 10) / (100 - 10 - 1) = 40/89 which is less than 50/100

I haven't analyzed this down to see how this actually works, I have to
read the CBS papers. But is this what is expected?


> +
> + - When a SCHED_DEADLINE task executes for an amount of time t, its
> + current runtime is decreased as
> +
> + current runtime = current runtime - t
> +
> + (technically, the runtime is decreased at every tick, or when the
> + task is descheduled / preempted);
> +
> + - When the current runtime becomes less or equal than 0, the task is
> + said to be "throttled" (also known as "depleted" in real-time literature)
> + and cannot be scheduled until its scheduling deadline. The "replenishment
> + time" for this task (see next item) is set to be equal to the current
> + value of the scheduling deadline;
> +
> + - When the current time is equal to the replenishment time of a
> + throttled task, the scheduling deadline and the current runtime are
> + updated as
> +
> + scheduling deadline = scheduling deadline + period
> + current runtime = current runtime + runtime
> +
> +
> +3. Scheduling Real-Time Tasks
> +=============================
> +
> + * BIG FAT WARNING ******************************************************
> + *
> + * This section contains a (not-thorough) summary on classical deadline
> + * scheduling theory, and how it applies to SCHED_DEADLINE.
> + * The reader can "safely" skip to Section 4 if only interested in seeing
> + * how the scheduling policy can be used. Anyway, we strongly recommend
> + * to come back here and continue reading (once the urge for testing is
> + * satisfied :P) to be sure of fully understanding all technical details.
> + ************************************************************************
> +
> + There are no limitations on what kind of task can exploit this new
> + scheduling discipline, even if it must be said that it is particularly
> + suited for periodic or sporadic real-time tasks that need guarantees on their
> + timing behavior, e.g., multimedia, streaming, control applications, etc.
> +
> + A typical real-time task is composed of a repetition of computation phases
> + (task instances, or jobs) which are activated on a periodic or sporadic
> + fashion.
> + Each job J_j (where J_j is the j^th job of the task) is characterised by an
> + arrival time r_j (the time when the job starts), an amount of computation
> + time c_j needed to finish the job, and a job absolute deadline d_j, which
> + is the time within which the job should be finished. The maximum execution
> + time max_j{c_j} is called "Worst Case Execution Time" (WCET) for the task.
> + A real-time task can be periodic with period P if r_{j+1} = r_j + P, or
> + sporadic with minimum inter-arrival time P is r_{j+1} >= r_j + P. Finally,
> + d_j = r_j + D, where D is the task's relative deadline.
> +
> + SCHED_DEADLINE can be used to schedule real-time tasks guaranteeing that
> + the jobs' deadlines of a task are respected. In order to do this, a task
> + must be scheduled by setting:
> +
> + - runtime >= WCET
> + - deadline = D
> + - period <= P
> +
> + IOW, if runtime >= WCET and if period is >= P, then the scheduling deadlines
> + and the absolute deadlines (d_j) coincide, so a proper admission control
> + allows to respect the jobs' absolute deadlines for this task (this is what is
> + called "hard schedulability property" and is an extension of Lemma 1 of [2]).

I wonder if we should state the obvious (which is never obvious). That
is the user must also have the following.

runtime < deadline <= period

Although it is fine for deadline = period, runtime should be less than
deadline, otherwise the task will take over the system.

> +
> + References:
> + 1 - C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogram-
> + ming in a hard-real-time environment. Journal of the Association for
> + Computing Machinery, 20(1), 1973.
> + 2 - L. Abeni , G. Buttazzo. Integrating Multimedia Applications in Hard
> + Real-Time Systems. Proceedings of the 19th IEEE Real-time Systems
> + Symposium, 1998. http://retis.sssup.it/~giorgio/paps/1998/rtss98-cbs.pdf
> + 3 - L. Abeni. Server Mechanisms for Multimedia Applications. ReTiS Lab
> + Technical Report. http://xoomer.virgilio.it/lucabe72/pubs/tr-98-01.ps
> +
> +4. Bandwidth management
> +=======================
> +
> + In order for the -deadline scheduling to be effective and useful, it is
> + important to have some method to keep the allocation of the available CPU
> + bandwidth to the tasks under control.
> + This is usually called "admission control" and if it is not performed at all,
> + no guarantee can be given on the actual scheduling of the -deadline tasks.
> +
> + Since when RT-throttling has been introduced each task group has a bandwidth
> + associated, calculated as a certain amount of runtime over a period.

The above sentence doesn't make any sense. Most notably the "Since
when". As that is something my teenage daughter usually shots at me
when I tell her she needs to do one of her chores.


> + Moreover, to make it possible to manipulate such bandwidth, readable/writable
> + controls have been added to both procfs (for system wide settings) and cgroupfs
> + (for per-group settings).
> + Therefore, the same interface is being used for controlling the bandwidth
> + distrubution to -deadline tasks.

The last sentence above is also confusing. We added controls to proc
and cgroupfs, but then state the same interface is being used. Do you
mean that the proc and cgroupfs have the same type of interface for
each? Perhaps say something like "the same interface files are being
used in both locations".

> +
> + However, more discussion is needed in order to figure out how we want to manage
> + SCHED_DEADLINE bandwidth at the task group level. Therefore, SCHED_DEADLINE
> + uses (for now) a less sophisticated, but actually very sensible, mechanism to
> + ensure that a certain utilization cap is not overcome per each root_domain.

What mechanism is this?

> +
> + Another main difference between deadline bandwidth management and RT-throttling
> + is that -deadline tasks have bandwidth on their own (while -rt ones don't!),
> + and thus we don't need an higher level throttling mechanism to enforce the
> + desired bandwidth.
> +
> +4.1 System wide settings
> +------------------------


The rest looks good.

-- Steve

2014-01-27 16:56:27

by Luca Abeni

[permalink] [raw]
Subject: Re: [PATCH] sched/deadline: Add sched_dl documentation

Hi Steven,

On Mon, 27 Jan 2014 10:35:56 -0500
Steven Rostedt <[email protected]> wrote:
[...]
> > + to be executed first. Thanks to this feature, also tasks that do
> > not
> > + strictly comply with the "traditional" real-time task model (see
> > Section 3)
> > + can effectively use the new policy.
> > +
> > + In more details, the CBS algorithm assigns scheduling deadlines to
> > + tasks in the following way:
> > +
> > + - Each SCHED_DEADLINE task is characterised by the "runtime",
> > + "deadline", and "period" parameters;
> > +
> > + - The state of the task is described by a "scheduling deadline",
> > and
> > + a "current runtime". These two parameters are initially set to
> > 0; +
> > + - When a SCHED_DEADLINE task wakes up (becomes ready for
> > execution),
> > + the scheduler checks if
> > +
> > + current runtime runtime
> > + ---------------------------------- > ----------------
> > + scheduling deadline - current time period
>
> Just a nit on formatting above. I was confused at first because the
> above looks more like a movement in time (the > being an arrow), then
> after reading the below, I realized that the above is actually an
> equation. This can be fixed by adding more spaces between the
> fractions and the greater than sign:
>
> current runtime runtime
> ---------------------------------- > ----------------
> scheduling deadline - current time period
Ok; I was not sure about how to add an equation in the document, and I
did the first thing I could think about. I'll fix it.

> > +
> > + then, if the scheduling deadline is smaller than the current
> > time, or
> > + this condition is verified, the scheduling deadline and the
> > + current budget are re-initialised as
> > +
> > + scheduling deadline = current time + deadline
> > + current runtime = runtime
> > +
> > + otherwise, the scheduling deadline and the current runtime are
> > + left unchanged;
>
> I've been trying to wrap my head around this a bit. I started writing
> an email about this when I first examined the patches and never sent
> it out :-p
>
> Lets take a case where deadline == period. It seems that the above
> would be true any time there was any delay to starting the task or the
> task was interrupted by another SCHED_DEADLINE task.
Not sure about this... Notice that above only applies when a task wakes
up (moving from a "sleeping" state to a "ready to run" state). Not when
an already ready task is scheduled.
Or did I misunderstand your comment?


> For example, lets say we have two SD tasks. One that has 50ms runtime
> and a 100ms period. The other has a 1ms runtime and a 10ms period.
>
> The above two should work perfectly fine together. The 10ms period
> task will constantly schedule in on the 100ms task.
>
> When the 100ms task runs, it could easily be delayed by 1ms due to the
> 10ms task. Then lets look at the above equation
See above: the check for the 100ms task is only performed when the task
unblocks (at the beginning of its period), not when it's scheduled
(after the 10ms taks).

This check is designed to take care of the following situation:
- consider a task with runtime 10ms and period equal to deadline equal
to 100ms
- assume the task wakes up at time t, and is assigned "remaining
runtime" 10ms and "scheduling deadline" t+100ms
- assume the task blocks after executing for 2ms. The "remaining
runtime" is now 8ms
- can the task use "remaining runtime" 8ms and "scheduling deadline"
t+100ms when it wakes up again?
Answer: if it wakes up before t+20ms, it can. Otherwise, it cannot,
because it would consume a fraction of CPU time larger than 10%,
causing missed deadlines in other tasks.

[...]
> > + SCHED_DEADLINE can be used to schedule real-time tasks
> > guaranteeing that
> > + the jobs' deadlines of a task are respected. In order to do this,
> > a task
> > + must be scheduled by setting:
> > +
> > + - runtime >= WCET
> > + - deadline = D
> > + - period <= P
> > +
> > + IOW, if runtime >= WCET and if period is >= P, then the
> > scheduling deadlines
> > + and the absolute deadlines (d_j) coincide, so a proper admission
> > control
> > + allows to respect the jobs' absolute deadlines for this task
> > (this is what is
> > + called "hard schedulability property" and is an extension of
> > Lemma 1 of [2]).
>
> I wonder if we should state the obvious (which is never obvious). That
> is the user must also have the following.
>
> runtime < deadline <= period
>
> Although it is fine for deadline = period, runtime should be less than
> deadline, otherwise the task will take over the system.
I think if "runtime < deadline <= period" is not respected, then the
admission control will fail... But yes, repeating it here can be
useful. If needed I'll add it to the document.


Thanks,
Luca

2014-01-27 17:09:45

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] sched/deadline: Add sched_dl documentation

On Mon, 27 Jan 2014 17:56:07 +0100
Luca Abeni <[email protected]> wrote:

> > > +
> > > + then, if the scheduling deadline is smaller than the current
> > > time, or
> > > + this condition is verified, the scheduling deadline and the
> > > + current budget are re-initialised as
> > > +
> > > + scheduling deadline = current time + deadline
> > > + current runtime = runtime
> > > +
> > > + otherwise, the scheduling deadline and the current runtime are
> > > + left unchanged;
> >
> > I've been trying to wrap my head around this a bit. I started writing
> > an email about this when I first examined the patches and never sent
> > it out :-p
> >
> > Lets take a case where deadline == period. It seems that the above
> > would be true any time there was any delay to starting the task or the
> > task was interrupted by another SCHED_DEADLINE task.
> Not sure about this... Notice that above only applies when a task wakes
> up (moving from a "sleeping" state to a "ready to run" state). Not when
> an already ready task is scheduled.
> Or did I misunderstand your comment?

No no, you understood, I missed that this only happens on wakeup. But
then I have to ask, what happens if the task blocks on a mutex? Would
that cause this check to happen then? It may be nice to add some
comments about exactly when this check is performed.

>
>
> > For example, lets say we have two SD tasks. One that has 50ms runtime
> > and a 100ms period. The other has a 1ms runtime and a 10ms period.
> >
> > The above two should work perfectly fine together. The 10ms period
> > task will constantly schedule in on the 100ms task.
> >
> > When the 100ms task runs, it could easily be delayed by 1ms due to the
> > 10ms task. Then lets look at the above equation
> See above: the check for the 100ms task is only performed when the task
> unblocks (at the beginning of its period), not when it's scheduled
> (after the 10ms taks).
>
> This check is designed to take care of the following situation:
> - consider a task with runtime 10ms and period equal to deadline equal
> to 100ms
> - assume the task wakes up at time t, and is assigned "remaining
> runtime" 10ms and "scheduling deadline" t+100ms
> - assume the task blocks after executing for 2ms. The "remaining
> runtime" is now 8ms
> - can the task use "remaining runtime" 8ms and "scheduling deadline"
> t+100ms when it wakes up again?
> Answer: if it wakes up before t+20ms, it can. Otherwise, it cannot,
> because it would consume a fraction of CPU time larger than 10%,
> causing missed deadlines in other tasks.

Ah, you answered my question here. The check happens every time the
task blocks as well. I still need to read the papers, and even more
importantly, start running more tests with tracing on to review what
exactly happens in the implementation.

>
> [...]
> > > + SCHED_DEADLINE can be used to schedule real-time tasks
> > > guaranteeing that
> > > + the jobs' deadlines of a task are respected. In order to do this,
> > > a task
> > > + must be scheduled by setting:
> > > +
> > > + - runtime >= WCET
> > > + - deadline = D
> > > + - period <= P
> > > +
> > > + IOW, if runtime >= WCET and if period is >= P, then the
> > > scheduling deadlines
> > > + and the absolute deadlines (d_j) coincide, so a proper admission
> > > control
> > > + allows to respect the jobs' absolute deadlines for this task
> > > (this is what is
> > > + called "hard schedulability property" and is an extension of
> > > Lemma 1 of [2]).
> >
> > I wonder if we should state the obvious (which is never obvious). That
> > is the user must also have the following.
> >
> > runtime < deadline <= period
> >
> > Although it is fine for deadline = period, runtime should be less than
> > deadline, otherwise the task will take over the system.
> I think if "runtime < deadline <= period" is not respected, then the
> admission control will fail... But yes, repeating it here can be
> useful. If needed I'll add it to the document.

Yeah, it's one of those things that you should know, but I can see
users screwing it up ;-)

-- Steve

2014-01-27 22:29:41

by Luca Abeni

[permalink] [raw]
Subject: Re: [PATCH] sched/deadline: Add sched_dl documentation

Hi Steven,

On Mon, 27 Jan 2014 12:09:38 -0500
Steven Rostedt <[email protected]> wrote:
[...]
> > > Lets take a case where deadline == period. It seems that the above
> > > would be true any time there was any delay to starting the task
> > > or the task was interrupted by another SCHED_DEADLINE task.
> > Not sure about this... Notice that above only applies when a task
> > wakes up (moving from a "sleeping" state to a "ready to run"
> > state). Not when an already ready task is scheduled.
> > Or did I misunderstand your comment?
>
> No no, you understood, I missed that this only happens on wakeup. But
> then I have to ask, what happens if the task blocks on a mutex? Would
> that cause this check to happen then?
Well, in theory, this check has to be performed every time the task
wakes up from any kind of sleeping (mutex, or anything else).
I think this is what happens in practice with the current
implementation, but maybe I am missing something in the implementation
details.

> It may be nice to add some comments about exactly when this check is
> performed.
Well, maybe "wake-up" is not the right term according to the
terminology used in the Linux kernel... The check should be performed
every time the task changes its state from sleeping (TASK_INTERRUPTIBLE
or TASK_UNINTERRUPTIBLE, I think) to "ready to execute" (TASK_RUNNING,
I think) and enters the ready queue (well, RB tree :). If someone can
suggest a more appropriate wording, I'll fix the document accordingly.


> > > For example, lets say we have two SD tasks. One that has 50ms
> > > runtime and a 100ms period. The other has a 1ms runtime and a
> > > 10ms period.
> > >
> > > The above two should work perfectly fine together. The 10ms period
> > > task will constantly schedule in on the 100ms task.
> > >
> > > When the 100ms task runs, it could easily be delayed by 1ms due
> > > to the 10ms task. Then lets look at the above equation
> > See above: the check for the 100ms task is only performed when the
> > task unblocks (at the beginning of its period), not when it's
> > scheduled (after the 10ms taks).
> >
> > This check is designed to take care of the following situation:
> > - consider a task with runtime 10ms and period equal to deadline
> > equal to 100ms
> > - assume the task wakes up at time t, and is assigned "remaining
> > runtime" 10ms and "scheduling deadline" t+100ms
> > - assume the task blocks after executing for 2ms. The "remaining
> > runtime" is now 8ms
> > - can the task use "remaining runtime" 8ms and "scheduling deadline"
> > t+100ms when it wakes up again?
> > Answer: if it wakes up before t+20ms, it can. Otherwise, it
> > cannot, because it would consume a fraction of CPU time larger than
> > 10%, causing missed deadlines in other tasks.
>
> Ah, you answered my question here. The check happens every time the
> task blocks as well. I still need to read the papers, and even more
> importantly, start running more tests with tracing on to review what
> exactly happens in the implementation.
If you read the original CBS paper, you'll see that it does not talk
about "wake-up" and "sleep", because it always considers a wake-up as a
job arrival, and a task blocking as a job end. This is because the
original paper only considers a simplified task model, without
"self-suspending jobs".
In SCHED_DEADLINE, things are not so simple, because it has to consider
real situations with real tasks not using the simplified original
real-time task model...

> > > > + SCHED_DEADLINE can be used to schedule real-time tasks
> > > > guaranteeing that
> > > > + the jobs' deadlines of a task are respected. In order to do
> > > > this, a task
> > > > + must be scheduled by setting:
> > > > +
> > > > + - runtime >= WCET
> > > > + - deadline = D
> > > > + - period <= P
> > > > +
> > > > + IOW, if runtime >= WCET and if period is >= P, then the
> > > > scheduling deadlines
> > > > + and the absolute deadlines (d_j) coincide, so a proper
> > > > admission control
> > > > + allows to respect the jobs' absolute deadlines for this task
> > > > (this is what is
> > > > + called "hard schedulability property" and is an extension of
> > > > Lemma 1 of [2]).
> > >
> > > I wonder if we should state the obvious (which is never obvious).
> > > That is the user must also have the following.
> > >
> > > runtime < deadline <= period
> > >
> > > Although it is fine for deadline = period, runtime should be less
> > > than deadline, otherwise the task will take over the system.
> > I think if "runtime < deadline <= period" is not respected, then the
> > admission control will fail... But yes, repeating it here can be
> > useful. If needed I'll add it to the document.
>
> Yeah, it's one of those things that you should know, but I can see
> users screwing it up ;-)
Ok...
What about presenting this as an example of the admission test failing?
Something like: 'notice that if "runtime" > "deadine" the admission
test will surely fail.'
(BTW, I am not sure if the "deadline <= period" condition is really
needed... Maybe not respecting it does not make too much sense, but I
think it is not strictly needed for schedulability purposes).



Thanks,
Luca

2014-01-28 10:03:29

by Juri Lelli

[permalink] [raw]
Subject: Re: [PATCH] sched/deadline: Add sched_dl documentation

On 01/27/2014 11:29 PM, Luca Abeni wrote:
> Hi Steven,
>
> On Mon, 27 Jan 2014 12:09:38 -0500
> Steven Rostedt <[email protected]> wrote:
> [...]
>>>> Lets take a case where deadline == period. It seems that the above
>>>> would be true any time there was any delay to starting the task
>>>> or the task was interrupted by another SCHED_DEADLINE task.
>>> Not sure about this... Notice that above only applies when a task
>>> wakes up (moving from a "sleeping" state to a "ready to run"
>>> state). Not when an already ready task is scheduled.
>>> Or did I misunderstand your comment?
>>
>> No no, you understood, I missed that this only happens on wakeup. But
>> then I have to ask, what happens if the task blocks on a mutex? Would
>> that cause this check to happen then?
> Well, in theory, this check has to be performed every time the task
> wakes up from any kind of sleeping (mutex, or anything else).
> I think this is what happens in practice with the current
> implementation, but maybe I am missing something in the implementation
> details.
>

Yes. The call graph looks like:

enqueue_task_dl()
-> enqueue_dl_entity()
-> update_dl_entity() <-- this does the check through dl_entity_overflow()

and enqueue_task_dl() gets called every time the task wakes up and is enqueued
back in the dl_rq (e.g., it was waiting on a mutex).

Thanks,

- Juri

>> It may be nice to add some comments about exactly when this check is
>> performed.
> Well, maybe "wake-up" is not the right term according to the
> terminology used in the Linux kernel... The check should be performed
> every time the task changes its state from sleeping (TASK_INTERRUPTIBLE
> or TASK_UNINTERRUPTIBLE, I think) to "ready to execute" (TASK_RUNNING,
> I think) and enters the ready queue (well, RB tree :). If someone can
> suggest a more appropriate wording, I'll fix the document accordingly.
>
>
>>>> For example, lets say we have two SD tasks. One that has 50ms
>>>> runtime and a 100ms period. The other has a 1ms runtime and a
>>>> 10ms period.
>>>>
>>>> The above two should work perfectly fine together. The 10ms period
>>>> task will constantly schedule in on the 100ms task.
>>>>
>>>> When the 100ms task runs, it could easily be delayed by 1ms due
>>>> to the 10ms task. Then lets look at the above equation
>>> See above: the check for the 100ms task is only performed when the
>>> task unblocks (at the beginning of its period), not when it's
>>> scheduled (after the 10ms taks).
>>>
>>> This check is designed to take care of the following situation:
>>> - consider a task with runtime 10ms and period equal to deadline
>>> equal to 100ms
>>> - assume the task wakes up at time t, and is assigned "remaining
>>> runtime" 10ms and "scheduling deadline" t+100ms
>>> - assume the task blocks after executing for 2ms. The "remaining
>>> runtime" is now 8ms
>>> - can the task use "remaining runtime" 8ms and "scheduling deadline"
>>> t+100ms when it wakes up again?
>>> Answer: if it wakes up before t+20ms, it can. Otherwise, it
>>> cannot, because it would consume a fraction of CPU time larger than
>>> 10%, causing missed deadlines in other tasks.
>>
>> Ah, you answered my question here. The check happens every time the
>> task blocks as well. I still need to read the papers, and even more
>> importantly, start running more tests with tracing on to review what
>> exactly happens in the implementation.
> If you read the original CBS paper, you'll see that it does not talk
> about "wake-up" and "sleep", because it always considers a wake-up as a
> job arrival, and a task blocking as a job end. This is because the
> original paper only considers a simplified task model, without
> "self-suspending jobs".
> In SCHED_DEADLINE, things are not so simple, because it has to consider
> real situations with real tasks not using the simplified original
> real-time task model...
>
>>>>> + SCHED_DEADLINE can be used to schedule real-time tasks
>>>>> guaranteeing that
>>>>> + the jobs' deadlines of a task are respected. In order to do
>>>>> this, a task
>>>>> + must be scheduled by setting:
>>>>> +
>>>>> + - runtime >= WCET
>>>>> + - deadline = D
>>>>> + - period <= P
>>>>> +
>>>>> + IOW, if runtime >= WCET and if period is >= P, then the
>>>>> scheduling deadlines
>>>>> + and the absolute deadlines (d_j) coincide, so a proper
>>>>> admission control
>>>>> + allows to respect the jobs' absolute deadlines for this task
>>>>> (this is what is
>>>>> + called "hard schedulability property" and is an extension of
>>>>> Lemma 1 of [2]).
>>>>
>>>> I wonder if we should state the obvious (which is never obvious).
>>>> That is the user must also have the following.
>>>>
>>>> runtime < deadline <= period
>>>>
>>>> Although it is fine for deadline = period, runtime should be less
>>>> than deadline, otherwise the task will take over the system.
>>> I think if "runtime < deadline <= period" is not respected, then the
>>> admission control will fail... But yes, repeating it here can be
>>> useful. If needed I'll add it to the document.
>>
>> Yeah, it's one of those things that you should know, but I can see
>> users screwing it up ;-)
> Ok...
> What about presenting this as an example of the admission test failing?
> Something like: 'notice that if "runtime" > "deadine" the admission
> test will surely fail.'
> (BTW, I am not sure if the "deadline <= period" condition is really
> needed... Maybe not respecting it does not make too much sense, but I
> think it is not strictly needed for schedulability purposes).
>
>
>
> Thanks,
> Luca
>

Subject: [tip:sched/numa] sched/deadline: Add sched_dl documentation

Commit-ID: 712e5e34aef449ab680b35c0d9016f59b0a4494c
Gitweb: http://git.kernel.org/tip/712e5e34aef449ab680b35c0d9016f59b0a4494c
Author: Dario Faggioli <[email protected]>
AuthorDate: Mon, 27 Jan 2014 12:20:15 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 28 Jan 2014 13:08:40 +0100

sched/deadline: Add sched_dl documentation

Add in Documentation/scheduler/ some hints about the design
choices, the usage and the future possible developments of the
sched_dl scheduling class and of the SCHED_DEADLINE policy.

Reviewed-by: Henrik Austad <[email protected]>
Signed-off-by: Dario Faggioli <[email protected]>
Signed-off-by: Juri Lelli <[email protected]>
[ Re-wrote sections 2 and 3. ]
Signed-off-by: Luca Abeni <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
Documentation/scheduler/00-INDEX | 2 +
Documentation/scheduler/sched-deadline.txt | 281 +++++++++++++++++++++++++++++
kernel/sched/deadline.c | 3 +-
3 files changed, 285 insertions(+), 1 deletion(-)

diff --git a/Documentation/scheduler/00-INDEX b/Documentation/scheduler/00-INDEX
index d2651c4..46702e4 100644
--- a/Documentation/scheduler/00-INDEX
+++ b/Documentation/scheduler/00-INDEX
@@ -10,5 +10,7 @@ sched-nice-design.txt
- How and why the scheduler's nice levels are implemented.
sched-rt-group.txt
- real-time group scheduling.
+sched-deadline.txt
+ - deadline scheduling.
sched-stats.txt
- information on schedstats (Linux Scheduler Statistics).
diff --git a/Documentation/scheduler/sched-deadline.txt b/Documentation/scheduler/sched-deadline.txt
new file mode 100644
index 0000000..18adc92
--- /dev/null
+++ b/Documentation/scheduler/sched-deadline.txt
@@ -0,0 +1,281 @@
+ Deadline Task Scheduling
+ ------------------------
+
+CONTENTS
+========
+
+ 0. WARNING
+ 1. Overview
+ 2. Scheduling algorithm
+ 3. Scheduling Real-Time Tasks
+ 4. Bandwidth management
+ 4.1 System-wide settings
+ 4.2 Task interface
+ 4.3 Default behavior
+ 5. Tasks CPU affinity
+ 5.1 SCHED_DEADLINE and cpusets HOWTO
+ 6. Future plans
+
+
+0. WARNING
+==========
+
+ Fiddling with these settings can result in an unpredictable or even unstable
+ system behavior. As for -rt (group) scheduling, it is assumed that root users
+ know what they're doing.
+
+
+1. Overview
+===========
+
+ The SCHED_DEADLINE policy contained inside the sched_dl scheduling class is
+ basically an implementation of the Earliest Deadline First (EDF) scheduling
+ algorithm, augmented with a mechanism (called Constant Bandwidth Server, CBS)
+ that makes it possible to isolate the behavior of tasks between each other.
+
+
+2. Scheduling algorithm
+==================
+
+ SCHED_DEADLINE uses three parameters, named "runtime", "period", and
+ "deadline" to schedule tasks. A SCHED_DEADLINE task is guaranteed to receive
+ "runtime" microseconds of execution time every "period" microseconds, and
+ these "runtime" microseconds are available within "deadline" microseconds
+ from the beginning of the period. In order to implement this behaviour,
+ every time the task wakes up, the scheduler computes a "scheduling deadline"
+ consistent with the guarantee (using the CBS[2,3] algorithm). Tasks are then
+ scheduled using EDF[1] on these scheduling deadlines (the task with the
+ smallest scheduling deadline is selected for execution). Notice that this
+ guaranteed is respected if a proper "admission control" strategy (see Section
+ "4. Bandwidth management") is used.
+
+ Summing up, the CBS[2,3] algorithms assigns scheduling deadlines to tasks so
+ that each task runs for at most its runtime every period, avoiding any
+ interference between different tasks (bandwidth isolation), while the EDF[1]
+ algorithm selects the task with the smallest scheduling deadline as the one
+ to be executed first. Thanks to this feature, also tasks that do not
+ strictly comply with the "traditional" real-time task model (see Section 3)
+ can effectively use the new policy.
+
+ In more details, the CBS algorithm assigns scheduling deadlines to
+ tasks in the following way:
+
+ - Each SCHED_DEADLINE task is characterised by the "runtime",
+ "deadline", and "period" parameters;
+
+ - The state of the task is described by a "scheduling deadline", and
+ a "current runtime". These two parameters are initially set to 0;
+
+ - When a SCHED_DEADLINE task wakes up (becomes ready for execution),
+ the scheduler checks if
+
+ current runtime runtime
+ ---------------------------------- > ----------------
+ scheduling deadline - current time period
+
+ then, if the scheduling deadline is smaller than the current time, or
+ this condition is verified, the scheduling deadline and the
+ current budget are re-initialised as
+
+ scheduling deadline = current time + deadline
+ current runtime = runtime
+
+ otherwise, the scheduling deadline and the current runtime are
+ left unchanged;
+
+ - When a SCHED_DEADLINE task executes for an amount of time t, its
+ current runtime is decreased as
+
+ current runtime = current runtime - t
+
+ (technically, the runtime is decreased at every tick, or when the
+ task is descheduled / preempted);
+
+ - When the current runtime becomes less or equal than 0, the task is
+ said to be "throttled" (also known as "depleted" in real-time literature)
+ and cannot be scheduled until its scheduling deadline. The "replenishment
+ time" for this task (see next item) is set to be equal to the current
+ value of the scheduling deadline;
+
+ - When the current time is equal to the replenishment time of a
+ throttled task, the scheduling deadline and the current runtime are
+ updated as
+
+ scheduling deadline = scheduling deadline + period
+ current runtime = current runtime + runtime
+
+
+3. Scheduling Real-Time Tasks
+=============================
+
+ * BIG FAT WARNING ******************************************************
+ *
+ * This section contains a (not-thorough) summary on classical deadline
+ * scheduling theory, and how it applies to SCHED_DEADLINE.
+ * The reader can "safely" skip to Section 4 if only interested in seeing
+ * how the scheduling policy can be used. Anyway, we strongly recommend
+ * to come back here and continue reading (once the urge for testing is
+ * satisfied :P) to be sure of fully understanding all technical details.
+ ************************************************************************
+
+ There are no limitations on what kind of task can exploit this new
+ scheduling discipline, even if it must be said that it is particularly
+ suited for periodic or sporadic real-time tasks that need guarantees on their
+ timing behavior, e.g., multimedia, streaming, control applications, etc.
+
+ A typical real-time task is composed of a repetition of computation phases
+ (task instances, or jobs) which are activated on a periodic or sporadic
+ fashion.
+ Each job J_j (where J_j is the j^th job of the task) is characterised by an
+ arrival time r_j (the time when the job starts), an amount of computation
+ time c_j needed to finish the job, and a job absolute deadline d_j, which
+ is the time within which the job should be finished. The maximum execution
+ time max_j{c_j} is called "Worst Case Execution Time" (WCET) for the task.
+ A real-time task can be periodic with period P if r_{j+1} = r_j + P, or
+ sporadic with minimum inter-arrival time P is r_{j+1} >= r_j + P. Finally,
+ d_j = r_j + D, where D is the task's relative deadline.
+
+ SCHED_DEADLINE can be used to schedule real-time tasks guaranteeing that
+ the jobs' deadlines of a task are respected. In order to do this, a task
+ must be scheduled by setting:
+
+ - runtime >= WCET
+ - deadline = D
+ - period <= P
+
+ IOW, if runtime >= WCET and if period is >= P, then the scheduling deadlines
+ and the absolute deadlines (d_j) coincide, so a proper admission control
+ allows to respect the jobs' absolute deadlines for this task (this is what is
+ called "hard schedulability property" and is an extension of Lemma 1 of [2]).
+
+ References:
+ 1 - C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogram-
+ ming in a hard-real-time environment. Journal of the Association for
+ Computing Machinery, 20(1), 1973.
+ 2 - L. Abeni , G. Buttazzo. Integrating Multimedia Applications in Hard
+ Real-Time Systems. Proceedings of the 19th IEEE Real-time Systems
+ Symposium, 1998. http://retis.sssup.it/~giorgio/paps/1998/rtss98-cbs.pdf
+ 3 - L. Abeni. Server Mechanisms for Multimedia Applications. ReTiS Lab
+ Technical Report. http://xoomer.virgilio.it/lucabe72/pubs/tr-98-01.ps
+
+4. Bandwidth management
+=======================
+
+ In order for the -deadline scheduling to be effective and useful, it is
+ important to have some method to keep the allocation of the available CPU
+ bandwidth to the tasks under control.
+ This is usually called "admission control" and if it is not performed at all,
+ no guarantee can be given on the actual scheduling of the -deadline tasks.
+
+ Since when RT-throttling has been introduced each task group has a bandwidth
+ associated, calculated as a certain amount of runtime over a period.
+ Moreover, to make it possible to manipulate such bandwidth, readable/writable
+ controls have been added to both procfs (for system wide settings) and cgroupfs
+ (for per-group settings).
+ Therefore, the same interface is being used for controlling the bandwidth
+ distrubution to -deadline tasks.
+
+ However, more discussion is needed in order to figure out how we want to manage
+ SCHED_DEADLINE bandwidth at the task group level. Therefore, SCHED_DEADLINE
+ uses (for now) a less sophisticated, but actually very sensible, mechanism to
+ ensure that a certain utilization cap is not overcome per each root_domain.
+
+ Another main difference between deadline bandwidth management and RT-throttling
+ is that -deadline tasks have bandwidth on their own (while -rt ones don't!),
+ and thus we don't need an higher level throttling mechanism to enforce the
+ desired bandwidth.
+
+4.1 System wide settings
+------------------------
+
+ The system wide settings are configured under the /proc virtual file system.
+
+ For now the -rt knobs are used for dl admission control and the -deadline
+ runtime is accounted against the -rt runtime. We realise that this isn't
+ entirely desirable; however, it is better to have a small interface for now,
+ and be able to change it easily later. The ideal situation (see 5.) is to run
+ -rt tasks from a -deadline server; in which case the -rt bandwidth is a direct
+ subset of dl_bw.
+
+ This means that, for a root_domain comprising M CPUs, -deadline tasks
+ can be created while the sum of their bandwidths stays below:
+
+ M * (sched_rt_runtime_us / sched_rt_period_us)
+
+ It is also possible to disable this bandwidth management logic, and
+ be thus free of oversubscribing the system up to any arbitrary level.
+ This is done by writing -1 in /proc/sys/kernel/sched_rt_runtime_us.
+
+
+4.2 Task interface
+------------------
+
+ Specifying a periodic/sporadic task that executes for a given amount of
+ runtime at each instance, and that is scheduled according to the urgency of
+ its own timing constraints needs, in general, a way of declaring:
+ - a (maximum/typical) instance execution time,
+ - a minimum interval between consecutive instances,
+ - a time constraint by which each instance must be completed.
+
+ Therefore:
+ * a new struct sched_attr, containing all the necessary fields is
+ provided;
+ * the new scheduling related syscalls that manipulate it, i.e.,
+ sched_setattr() and sched_getattr() are implemented.
+
+
+4.3 Default behavior
+---------------------
+
+ The default value for SCHED_DEADLINE bandwidth is to have rt_runtime equal to
+ 950000. With rt_period equal to 1000000, by default, it means that -deadline
+ tasks can use at most 95%, multiplied by the number of CPUs that compose the
+ root_domain, for each root_domain.
+
+ A -deadline task cannot fork.
+
+5. Tasks CPU affinity
+=====================
+
+ -deadline tasks cannot have an affinity mask smaller that the entire
+ root_domain they are created on. However, affinities can be specified
+ through the cpuset facility (Documentation/cgroups/cpusets.txt).
+
+5.1 SCHED_DEADLINE and cpusets HOWTO
+------------------------------------
+
+ An example of a simple configuration (pin a -deadline task to CPU0)
+ follows (rt-app is used to create a -deadline task).
+
+ mkdir /dev/cpuset
+ mount -t cgroup -o cpuset cpuset /dev/cpuset
+ cd /dev/cpuset
+ mkdir cpu0
+ echo 0 > cpu0/cpuset.cpus
+ echo 0 > cpu0/cpuset.mems
+ echo 1 > cpuset.cpu_exclusive
+ echo 0 > cpuset.sched_load_balance
+ echo 1 > cpu0/cpuset.cpu_exclusive
+ echo 1 > cpu0/cpuset.mem_exclusive
+ echo $$ > cpu0/tasks
+ rt-app -t 100000:10000:d:0 -D5 (it is now actually superfluous to specify
+ task affinity)
+
+6. Future plans
+===============
+
+ Still missing:
+
+ - refinements to deadline inheritance, especially regarding the possibility
+ of retaining bandwidth isolation among non-interacting tasks. This is
+ being studied from both theoretical and practical points of view, and
+ hopefully we should be able to produce some demonstrative code soon;
+ - (c)group based bandwidth management, and maybe scheduling;
+ - access control for non-root users (and related security concerns to
+ address), which is the best way to allow unprivileged use of the mechanisms
+ and how to prevent non-root users "cheat" the system?
+
+ As already discussed, we are planning also to merge this work with the EDF
+ throttling patches [https://lkml.org/lkml/2010/2/23/239] but we still are in
+ the preliminary phases of the merge and we really seek feedback that would
+ help us decide on the direction it should take.
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 0de2482..0dd5e09 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -351,7 +351,8 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se,
* disrupting the schedulability of the system. Otherwise, we should
* refill the runtime and set the deadline a period in the future,
* because keeping the current (absolute) deadline of the task would
- * result in breaking guarantees promised to other tasks.
+ * result in breaking guarantees promised to other tasks (refer to
+ * Documentation/scheduler/sched-deadline.txt for more informations).
*
* This function returns true if:
*