2009-09-22 11:30:32

by Dario Faggioli

[permalink] [raw]
Subject: [RFC][PATCH] SCHED_EDF scheduling class

Hi Peter, Hi all,

this is the implementation of a new scheduling class called
SCHED_EDF made by me and Michael Trimarchi (CC:-ed). We followed the
suggestions you gave us at the multicore workshop in Kaiserlsautern
on October.

For other people: it is a real-time scheduling policy based on the
Earliest Deadline First (EDF) algorithm, one of the most common
real-time scheduling algorithms.

The need of an EDF scheduler in Linux has been already highlighted in
the Documentation/scheduler/sched-rt-group.txt file, which says

"The next project will be SCHED_EDF (Earliest Deadline First
scheduling) to bring full deadline scheduling to the linux kernel."

This implementation is part of a FP7 European project called ACTORS,
and financially supported by the European commission
(see http://www.actors-project.eu).

The involved partners (which include Ericsson Research, Evidence Srl,
AKAtech) strongly believe that a general-purpose operating system like
Linux should provide a standard real-time scheduling policy (like the
one implemented in this patch) still allowing to schedule non-real-time
tasks in the usual way.

This new scheduling class will also be the topic of a paper and a talk
at the next Real-Time Linux Workshop in Dresden.

The full patch is available as attachment of this e-mail. For easy of
review, a split patch-set will come soon.

Why do we need a further scheduling class ? The existing scheduling
classes (i.e., sched_fair and sched_rt), perform very well in their own
domain of application. However,

1. they cannot provide the guarantees a time-sensitive application may
require.

Using sched_fair, no concept of timing constraint (e.g., deadline)
can be associated to tasks. In fact, although it is possible to
assign a share of the processor time to a task (or a group of tasks),
there is no way to specify that a task must execute for 20msec within
100msec, unlike using a real-time scheduling algorithm, such as EDF.
As a proof of concept, we implemented a very simple test to run two
tasks that need to execute for 20msec every 50msec. We scheduled them
using CFS with the same CPU share, and we observed that even if there
is enough spare CPU time (the system is idle), the tasks occasionally
experience some deadline miss. The test is available on the project
website (see below).

Using sched_rt, instead, we can give that kind of guarantee only
using a global period, with the constraint that "a subgroup must have
a smaller or equal period to its parent".

2. the latency experienced by a task (i.e., the time between two
consecutive executions of a task) is not deterministic and cannot be
bound, since it highly depends on the number of tasks running in the
system at that time.
Using EDF, instead, this time is deterministic, bounded and known at
any instant of time.

These issues are particularly critical when running time-sensitive or
control applications. Without a real-time scheduler, in fact, it is not
possible to make any feasibility study of the system under development,
and developers cannot be sure that the timing requirements will be met
under any circumstance.
And we feel that this prevents the usage of Linux in industrial contexts
like the one addressed by our project.

A key feature of our scheduling class is that "temporal isolation" is
ensured.
This means that the temporal behavior of each task (i.e., its ability to
meet its deadlines) is not affected by the behavior of any other task in
the system.
In other words, even if a task misbehaves, it is not able to exploit
larger execution times than the amount it has been allocated.

Each task is characterized by a "budget" sched_runtime and a "period"
sched_period, equal to its deadline.
At any instant of time, the system schedules the (ready) task having
earliest deadline. During execution, task's budget is decreased by an
amount equal to the time executed. When task's budget reaches zero
(i.e., the task executed for a time equal to its budget), it is stopped
until the time instant of its next deadline. At that time, the task is
made runnable again, its budget is refilled and a new deadline is
computed.
This means that each task is guaranteed a share of processor time equal
to sched_runtime/sched_period _every_ sched_period.
Of course, the sum of sched_runtime/sched_period of all tasks cannot be
higher than the total throughput available on the system (i.e., 100% on
uniprocessor systems), otherwise the system is overloaded and task
deadlines cannot be guaranteed.

The project is hosted at the following URL:

http://www.evidence.eu.com/sched_edf.html

A public git repository is available at the following address. Please,
to save our bandwidth do a pull on an existing git tree as follows:

git clone \
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

cd linux-2.6

git pull git://feanor.sssup.it/sched-edf.git sched-edf

We are also maintaining a branch aligned with sched-devel tree. It is
called sched-devel-edf and can be pulled using:

git pull git://feanor.sssup.it/sched-edf.git sched-devel-edf

We are going to be the maintainers of this code, keeping it aligned with
Linus' and sched-devel trees.

To easily test SCHED_EDF you can use our modified version of schedtool
as explained below. For instance, to run a simple yes program with
100000us period and 25000us budget, just type the following commands:

git clone git://feanor.sssup.it/schedtool-edf.git

cd schedtool-edf
make
./schedtool -E -d 100000 -b 25000 -a 0 -e yes > /dev/null &

Some notes about the implementation, on which we are open to comments
and suggestions:

* It implements the well-known Earliest Deadline First (EDF) algorithm

* It is aligned with the current mainstream kernel

* It relies on standard Linux mechanisms (e.g., control groups) to
natively support multicore platforms and to provide hierarchical
scheduling through a standard API (see below)

* It does not make any restrictive assumption about the characteristics
of the tasks: it can handle periodic, sporadic or aperiodic tasks
(see below)

* We decided to put the new scheduling class in between sched_rt and
sched_fair, for the following reasons:

- Backward compatibility and (POSIX) standard compliance require
SCHED_FIFO/SCHED_RR tasks to run as soon as they can

- Since EDF is more efficient than fixed-priority in terms of
schedulability, it will be easier for it to achieve high
utilization even if it runs in the idle time of fixed-priority,
than vice-versa

- Real-time analysis techniques already exist to deal with exactly
such scheduling architecture

- The amount of interference coming from system SCHED_FIFO/SCHED_RR
tasks can be accounted as system overhead

- In recent kernels, the bandwidth used by the whole sched_rt
scheduling class can be forced to stay below a certain threshold

* It works natively on multicore platform. We have one runqueue for
each CPU, implemented with a red-black tree, for efficient
manipulation.

* We will migrate the tasks among the runqueues trying to always have,
on a system with m CPUs, the m earliest deadline ready tasks running
on the CPUs.
This is not present in the attached patch, but will be available
soon.

* Tasks are not migrated if a specific CPU affinity has been set.
Affinity can be set using existing Linux mechanisms (i.e.,
sched_setaffinity() for tasks and cpuset.cpus for task groups). Thus,
it is sufficient that each EDF task in the system has its affinity
set, to achieve a partitioned scheduling with very few overhead.

* The API for tasks uses an additional system call, called
sched_setscheduler_ex(...), to assign and modify the scheduling
parameters described above (i.e., sched_runtime and sched_period).
The existing system call sched_setscheduler() has not been extended,
because of the binary compatibility issues that modifying its struct
sched_param parameter would have raised for existing applications. So
we implemented an additional syscall. The system call has the
following prototype:

#define SCHED_EDF 6

struct sched_param_ex {
int sched_priority; /* Backward compliance */
struct timespec sched_edf_period;
struct timespec sched_edf_runtime;
};

int sched_setscheduler_ex (pid_t pid, int policy, struct sched_param_ex *param);

for the sake of consistency, also sched_setaparam_ex() and
sched_getparam_ex() have been implemented.

* We did not add any further system call to allow periodic tasks to
specify the end of the current instance, and the semantics of the
existing sched_yield() system call has been extended for this
purpose.

* It is integrated with the cgroups filesystem, so it's possible to
assign a reservation to a group of tasks and make hierarchical
scheduling.
The cgroups interface provides two entries, cpu.edf_runtime_us and
cpu.edf_period_us. These files are created once the cgroup filesystem
is mounted, to get and set the group bandwidth.

* In case a SCHED_EDF tasks forks, parent's budget is split among
parent and child.

Tests about performance of our scheduling class can be found in the
paper of the next real-time Linux workshop (soon freely available on
Internet, I guess) and will be shown during our talk at the workshop.

During the last meeting of the project at Brussels, the reviewers from
the European commission encouraged us to submit the code here, to gather
thoughts and comments from the Linux kernel community.

Therefore, we wait for your feedback and contribution.

Many thanks,

Dario Faggioli
Michael Trimarchi
Claudio Scordino

Signed-off-by: Dario Faggioli <[email protected]>
Signed-off-by: Michael Trimarchi <[email protected]>
Signed-off-by: Claudio Scordino <[email protected]>
---
arch/arm/include/asm/unistd.h | 3 +
arch/arm/kernel/calls.S | 3 +
arch/x86/ia32/ia32entry.S | 3 +
arch/x86/include/asm/unistd_32.h | 3 +
arch/x86/include/asm/unistd_64.h | 6 +
arch/x86/kernel/syscall_table_32.S | 3 +
include/linux/sched.h | 56 ++++
include/linux/syscalls.h | 7 +
init/Kconfig | 14 +
kernel/fork.c | 12 +
kernel/sched.c | 545 ++++++++++++++++++++++++++++++++++--
kernel/sched_debug.c | 36 +++-
kernel/sched_edf.c | 550 ++++++++++++++++++++++++++++++++++++
kernel/sched_fair.c | 4 +-
kernel/sched_rt.c | 2 +-
kernel/trace/trace_sched_wakeup.c | 31 ++
16 files changed, 1254 insertions(+), 24 deletions(-)

diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h
index 89f7ead..e851625 100644
--- a/arch/arm/include/asm/unistd.h
+++ b/arch/arm/include/asm/unistd.h
@@ -391,6 +391,9 @@
#define __NR_pwritev (__NR_SYSCALL_BASE+362)
#define __NR_rt_tgsigqueueinfo (__NR_SYSCALL_BASE+363)
#define __NR_perf_event_open (__NR_SYSCALL_BASE+364)
+#define __NR_sched_setscheduler_ex (__NR_SYSCALL_BASE+365)
+#define __NR_sched_setparam_ex (__NR_SYSCALL_BASE+366)
+#define __NR_sched_getparam_ex (__NR_SYSCALL_BASE+367)

/*
* The following SWIs are ARM private.
diff --git a/arch/arm/kernel/calls.S b/arch/arm/kernel/calls.S
index fafce1b..42ad362 100644
--- a/arch/arm/kernel/calls.S
+++ b/arch/arm/kernel/calls.S
@@ -374,6 +374,9 @@
CALL(sys_pwritev)
CALL(sys_rt_tgsigqueueinfo)
CALL(sys_perf_event_open)
+/* 365 */ CALL(sys_sched_setscheduler_ex)
+ CALL(sys_sched_setparam_ex)
+ CALL(sys_sched_getparam_ex)
#ifndef syscalls_counted
.equ syscalls_padding, ((NR_syscalls + 3) & ~3) - NR_syscalls
#define syscalls_counted
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 74619c4..fde92c2 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -832,4 +832,7 @@ ia32_sys_call_table:
.quad compat_sys_pwritev
.quad compat_sys_rt_tgsigqueueinfo /* 335 */
.quad sys_perf_event_open
+ .quad sys_sched_setscheduler_ex
+ .quad sys_sched_setparam_ex
+ .quad sys_sched_getparam_ex
ia32_syscall_end:
diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h
index 6fb3c20..a83163f 100644
--- a/arch/x86/include/asm/unistd_32.h
+++ b/arch/x86/include/asm/unistd_32.h
@@ -342,6 +342,9 @@
#define __NR_pwritev 334
#define __NR_rt_tgsigqueueinfo 335
#define __NR_perf_event_open 336
+#define __NR_sched_setscheduler_ex 337
+#define __NR_sched_setparam_ex 338
+#define __NR_sched_getparam_ex 339

#ifdef __KERNEL__

diff --git a/arch/x86/include/asm/unistd_64.h b/arch/x86/include/asm/unistd_64.h
index 8d3ad0a..a11831c 100644
--- a/arch/x86/include/asm/unistd_64.h
+++ b/arch/x86/include/asm/unistd_64.h
@@ -661,6 +661,12 @@ __SYSCALL(__NR_pwritev, sys_pwritev)
__SYSCALL(__NR_rt_tgsigqueueinfo, sys_rt_tgsigqueueinfo)
#define __NR_perf_event_open 298
__SYSCALL(__NR_perf_event_open, sys_perf_event_open)
+#define __NR_sched_setscheduler_ex 299
+__SYSCALL(__NR_sched_setscheduler_ex, sys_sched_setscheduler_ex)
+#define __NR_sched_setparam_ex 230
+__SYSCALL(__NR_sched_setparam_ex, sys_sched_setparam_ex)
+#define __NR_sched_getparam_ex 231
+__SYSCALL(__NR_sched_getparam_ex, sys_sched_getparam_ex)

#ifndef __NO_STUBS
#define __ARCH_WANT_OLD_READDIR
diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
index 0157cd2..38f056c 100644
--- a/arch/x86/kernel/syscall_table_32.S
+++ b/arch/x86/kernel/syscall_table_32.S
@@ -336,3 +336,6 @@ ENTRY(sys_call_table)
.long sys_pwritev
.long sys_rt_tgsigqueueinfo /* 335 */
.long sys_perf_event_open
+ .long sys_sched_setscheduler_ex
+ .long sys_sched_setparam_ex
+ .long sys_sched_getparam_ex
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8fe351c..2554e08 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -38,6 +38,7 @@
#define SCHED_BATCH 3
/* SCHED_ISO: reserved but not implemented yet */
#define SCHED_IDLE 5
+#define SCHED_EDF 6
/* Can be ORed in to make sure the process is reverted back to SCHED_NORMAL on fork */
#define SCHED_RESET_ON_FORK 0x40000000

@@ -94,6 +95,19 @@ struct sched_param {

#include <asm/processor.h>

+/*
+ * For an typical EDF periodic or sporadic task we need a runtime
+ * and period (equal to deadline) specification in the sched_param
+ * structure.
+ * However, the original struct sched_param can't be modified, for binary
+ * compatibility issues, and thus this new one is created.
+ */
+struct sched_param_ex {
+ int sched_priority;
+ struct timespec sched_period;
+ struct timespec sched_runtime;
+};
+
struct exec_domain;
struct futex_pi_state;
struct robust_list_head;
@@ -147,12 +161,15 @@ extern unsigned long get_parent_ip(unsigned long addr);

struct seq_file;
struct cfs_rq;
+struct edf_rq;
struct task_group;
#ifdef CONFIG_SCHED_DEBUG
extern void proc_sched_show_task(struct task_struct *p, struct seq_file *m);
extern void proc_sched_set_task(struct task_struct *p);
extern void
print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq);
+extern void
+print_edf_rq(struct seq_file *m, int cpu, struct edf_rq *edf_rq);
#else
static inline void
proc_sched_show_task(struct task_struct *p, struct seq_file *m)
@@ -165,6 +182,10 @@ static inline void
print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
{
}
+static inline void
+print_edf_rq(struct seq_file *m, int cpu, struct edf_rq *edf_rq)
+{
+}
#endif

extern unsigned long long time_sync_thresh;
@@ -1161,6 +1182,28 @@ struct sched_entity {
#endif
};

+#define EDF_NEW 1
+#define EDF_ENDED 2
+#define EDF_RUNNING 4
+
+struct sched_edf_entity {
+ struct rb_node rb_node;
+ /* actual scheduling parameters */
+ u64 deadline;
+ s64 runtime;
+ unsigned int edf_flags;
+
+ /* original parameters taken from sched_param_ex */
+ u64 runtime_max;
+ u64 period;
+#ifdef CONFIG_EDF_GROUP_SCHED
+ u64 bw;
+#endif
+
+ int nr_cpus_allowed;
+ struct hrtimer edf_timer;
+};
+
struct sched_rt_entity {
struct list_head run_list;
unsigned long timeout;
@@ -1198,6 +1241,7 @@ struct task_struct {
unsigned int rt_priority;
const struct sched_class *sched_class;
struct sched_entity se;
+ struct sched_edf_entity edf;
struct sched_rt_entity rt;

#ifdef CONFIG_PREEMPT_NOTIFIERS
@@ -1543,6 +1587,18 @@ static inline int rt_task(struct task_struct *p)
return rt_prio(p->prio);
}

+static inline int edf_policy(int policy)
+{
+ if (unlikely(policy == SCHED_EDF))
+ return 1;
+ return 0;
+}
+
+static inline int edf_task(struct task_struct *p)
+{
+ return edf_policy(p->policy);
+}
+
static inline struct pid *task_pid(struct task_struct *task)
{
return task->pids[PIDTYPE_PID].pid;
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 8d8285a..f5b76a1 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -33,6 +33,7 @@ struct pollfd;
struct rlimit;
struct rusage;
struct sched_param;
+struct sched_param_ex;
struct semaphore;
struct sembuf;
struct shmid_ds;
@@ -390,11 +391,17 @@ asmlinkage long sys_clock_nanosleep(clockid_t which_clock, int flags,
asmlinkage long sys_nice(int increment);
asmlinkage long sys_sched_setscheduler(pid_t pid, int policy,
struct sched_param __user *param);
+asmlinkage long sys_sched_setscheduler_ex(pid_t pid, int policy,
+ struct sched_param_ex __user *param);
asmlinkage long sys_sched_setparam(pid_t pid,
struct sched_param __user *param);
+asmlinkage long sys_sched_setparam_ex(pid_t pid,
+ struct sched_param_ex __user *param);
asmlinkage long sys_sched_getscheduler(pid_t pid);
asmlinkage long sys_sched_getparam(pid_t pid,
struct sched_param __user *param);
+asmlinkage long sys_sched_getparam_ex(pid_t pid,
+ struct sched_param_ex __user *param);
asmlinkage long sys_sched_setaffinity(pid_t pid, unsigned int len,
unsigned long __user *user_mask_ptr);
asmlinkage long sys_sched_getaffinity(pid_t pid, unsigned int len,
diff --git a/init/Kconfig b/init/Kconfig
index 0aa6579..cd119ac 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -454,6 +454,20 @@ config RT_GROUP_SCHED
realtime bandwidth for them.
See Documentation/scheduler/sched-rt-group.txt for more information.

+config EDF_GROUP_SCHED
+ bool "Group scheduling for SCHED_EDF"
+ depends on EXPERIMENTAL
+ depends on GROUP_SCHED
+ depends on CGROUPS
+ depends on !USER_SCHED
+ select CPUSETS
+ default n
+ help
+ This feature lets you explicitly specify, in terms of runtime
+ and period, the bandwidth of a task control group. This means
+ tasks and other control groups can be added, until such bandwidth
+ limit is reached, after that they will fail.
+
choice
depends on GROUP_SCHED
prompt "Basis for grouping tasks"
diff --git a/kernel/fork.c b/kernel/fork.c
index 2cebfb2..b4f3874 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1203,6 +1203,18 @@ static struct task_struct *copy_process(unsigned long clone_flags,
p->parent_exec_id = current->self_exec_id;
}

+ /*
+ * In case an EDF task is forking, both parent and child get half
+ * the bandwidth of the parent (via dividing runtime by 2).
+ * This make it a lot easier to deal with control group bandwidth, and
+ * also prevent an heavily spawning task to exhaust the system (or
+ * a control group, if enabled) bandwidht.
+ */
+ if (edf_task(p->real_parent)) {
+ p->real_parent->edf.runtime_max /= 2;
+ p->edf.runtime_max = p->real_parent->edf.runtime_max;
+ }
+
spin_lock(&current->sighand->siglock);

/*
diff --git a/kernel/sched.c b/kernel/sched.c
index 91843ba..4681a5c 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -131,6 +131,11 @@ static inline int task_has_rt_policy(struct task_struct *p)
return rt_policy(p->policy);
}

+static inline int task_has_edf_policy(struct task_struct *p)
+{
+ return edf_policy(p->policy);
+}
+
/*
* This is the priority-queue data structure of the RT scheduling class:
*/
@@ -227,6 +232,16 @@ static void destroy_rt_bandwidth(struct rt_bandwidth *rt_b)
}
#endif

+struct edf_bandwidth {
+ spinlock_t lock;
+ /* runtime and period that determine the bandwidth of the group */
+ u64 runtime_max;
+ u64 period;
+ u64 bw;
+ /* accumulator for the actual allocated bandwidth in a group */
+ u64 total_bw;
+};
+
/*
* sched_domains_mutex serializes calls to arch_init_sched_domains,
* detach_destroy_domains and partition_sched_domains.
@@ -259,6 +274,11 @@ struct task_group {
unsigned long shares;
#endif

+#ifdef CONFIG_EDF_GROUP_SCHED
+ struct edf_bandwidth edf_bandwidth;
+ struct edf_rq **edf_rq;
+#endif
+
#ifdef CONFIG_RT_GROUP_SCHED
struct sched_rt_entity **rt_se;
struct rt_rq **rt_rq;
@@ -296,6 +316,10 @@ static DEFINE_PER_CPU(struct sched_entity, init_sched_entity);
static DEFINE_PER_CPU_SHARED_ALIGNED(struct cfs_rq, init_tg_cfs_rq);
#endif /* CONFIG_FAIR_GROUP_SCHED */

+#ifdef CONFIG_EDF_GROUP_SCHED
+static DEFINE_PER_CPU(struct edf_rq, init_edf_rq) ____cacheline_aligned_in_smp;
+#endif
+
#ifdef CONFIG_RT_GROUP_SCHED
static DEFINE_PER_CPU(struct sched_rt_entity, init_sched_rt_entity);
static DEFINE_PER_CPU_SHARED_ALIGNED(struct rt_rq, init_rt_rq);
@@ -447,6 +471,18 @@ struct cfs_rq {
#endif
};

+struct edf_rq {
+ unsigned long edf_nr_running;
+
+ /* as in CFS, the runqueue is based on an rbtree, deadline ordered */
+ struct rb_root rb_root;
+ struct rb_node *rb_leftmost;
+
+#ifdef CONFIG_EDF_GROUP_SCHED
+ struct rq *rq;
+#endif
+};
+
/* Real-Time classes' related field in a runqueue: */
struct rt_rq {
struct rt_prio_array active;
@@ -544,6 +580,7 @@ struct rq {
u64 nr_migrations_in;

struct cfs_rq cfs;
+ struct edf_rq edf;
struct rt_rq rt;

#ifdef CONFIG_FAIR_GROUP_SCHED
@@ -1816,7 +1853,9 @@ static void calc_load_account_active(struct rq *this_rq);
#include "sched_stats.h"
#include "sched_idletask.c"
#include "sched_fair.c"
+#include "sched_edf.c"
#include "sched_rt.c"
+
#ifdef CONFIG_SCHED_DEBUG
# include "sched_debug.c"
#endif
@@ -1837,7 +1876,7 @@ static void dec_nr_running(struct rq *rq)

static void set_load_weight(struct task_struct *p)
{
- if (task_has_rt_policy(p)) {
+ if (task_has_rt_policy(p) || task_has_edf_policy(p)) {
p->se.load.weight = prio_to_weight[0] * 2;
p->se.load.inv_weight = prio_to_wmult[0] >> 1;
return;
@@ -2523,7 +2562,8 @@ void sched_fork(struct task_struct *p, int clone_flags)
* Revert to default priority/policy on fork if requested.
*/
if (unlikely(p->sched_reset_on_fork)) {
- if (p->policy == SCHED_FIFO || p->policy == SCHED_RR)
+ if (p->policy == SCHED_FIFO || p->policy == SCHED_RR ||
+ edf_policy(p->policy))
p->policy = SCHED_NORMAL;

if (p->normal_prio < DEFAULT_PRIO)
@@ -2541,8 +2581,12 @@ void sched_fork(struct task_struct *p, int clone_flags)
p->sched_reset_on_fork = 0;
}

- if (!rt_prio(p->prio))
- p->sched_class = &fair_sched_class;
+ if (!rt_prio(p->prio)) {
+ if (edf_policy(p->policy))
+ p->sched_class = &edf_sched_class;
+ else
+ p->sched_class = &fair_sched_class;
+ }

#ifdef CONFIG_SMP
cpu = p->sched_class->select_task_rq(p, SD_BALANCE_FORK, 0);
@@ -2565,6 +2609,68 @@ void sched_fork(struct task_struct *p, int clone_flags)
put_cpu();
}

+#if defined(CONFIG_RT_GROUP_SCHED) || defined(CONFIG_EDF_GROUP_SCHED)
+static unsigned long to_ratio(u64 period, u64 runtime)
+{
+ if (runtime == RUNTIME_INF)
+ return 1ULL << 20;
+
+ return div64_u64(runtime << 20, period);
+}
+
+#ifdef CONFIG_EDF_GROUP_SCHED
+static inline
+void __edf_tg_clear_task_bw(struct task_group *tg, u64 tsk_bw)
+{
+ tg->edf_bandwidth.total_bw -= tsk_bw;
+}
+
+static inline
+void __edf_tg_add_task_bw(struct task_group *tg, u64 tsk_bw)
+{
+ tg->edf_bandwidth.total_bw += tsk_bw;
+}
+
+static inline
+int __edf_check_task_tg_bw(struct task_struct *p, struct task_group *tg,
+ int policy, struct sched_param_ex *param_ex)
+{
+ u64 tg_bw = tg->edf_bandwidth.bw;
+ u64 bw = p->edf.bw;
+ u64 new_bw = 0;
+
+ if (tg->edf_bandwidth.bw <= 0)
+ return 0;
+
+ if (edf_policy(policy))
+ new_bw = to_ratio(timespec_to_ns(&param_ex->sched_period),
+ timespec_to_ns(&param_ex->sched_runtime));
+
+ /*
+ * Either if a task, enters, leave, or stays SCHED_EDF but changing
+ * its parameters, we need to update accordingly the allocated
+ * bandwidth of the control group it is inside.
+ */
+ if (task_has_edf_policy(p) && !edf_policy(policy)) {
+ __edf_tg_clear_task_bw(tg, bw);
+ return 1;
+ } else if (task_has_edf_policy(p) && edf_policy(policy) &&
+ tg_bw >= tg->edf_bandwidth.total_bw - bw + new_bw) {
+ __edf_tg_clear_task_bw(tg, bw);
+ __edf_tg_add_task_bw(tg, new_bw);
+ return 1;
+ } else if (edf_policy(policy) && !task_has_edf_policy(p) &&
+ tg_bw >= tg->edf_bandwidth.total_bw + new_bw) {
+ __edf_tg_add_task_bw(tg, new_bw);
+ return 1;
+ }
+
+ return 0;
+}
+#endif /* CONFIG_EDF_GROUP_SCHED */
+
+#endif /* CONFIG_RT_GROUP_SCHED || CONFIG_EDF_GROUP_SCHED */
+
/*
* wake_up_new_task - wake up a newly created task for the first time.
*
@@ -2582,6 +2688,18 @@ void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
update_rq_clock(rq);

p->prio = effective_prio(p);
+ if (edf_task(p)) {
+ struct sched_edf_entity *edf_se = &p->edf;
+
+ /*
+ * If the new task we are waking up is EDF,
+ * some initialization is needed
+ */
+ RB_CLEAR_NODE(&edf_se->rb_node);
+ edf_se->runtime = edf_se->runtime_max;
+ edf_se->edf_flags = EDF_NEW;
+ init_edf_timer(&edf_se->edf_timer);
+ }

if (!p->sched_class->task_new || !current->se.on_rq) {
activate_task(rq, p, 0);
@@ -2725,6 +2843,23 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
if (mm)
mmdrop(mm);
if (unlikely(prev_state == TASK_DEAD)) {
+#ifdef CONFIG_EDF_GROUP_SCHED
+ if (edf_task(prev)) {
+ struct task_group *tg = task_group(prev);
+ unsigned long flags;
+
+ /*
+ * When an EDF task dies, we need to drain its
+ * bandwidth from the cgroup it was accomodating hime,
+ * to make space for new tasks.
+ */
+ spin_lock_irqsave(&tg->edf_bandwidth.lock, flags);
+ __edf_tg_clear_task_bw(tg, prev->edf.bw);
+ spin_unlock_irqrestore(&tg->edf_bandwidth.lock, flags);
+
+ hrtimer_cancel(&prev->edf.edf_timer);
+ }
+#endif
/*
* Remove function-return probe instances associated with this
* task and put them back on the free list.
@@ -5952,8 +6087,12 @@ void rt_mutex_setprio(struct task_struct *p, int prio)

if (rt_prio(prio))
p->sched_class = &rt_sched_class;
- else
- p->sched_class = &fair_sched_class;
+ else {
+ if (!edf_task(p))
+ p->sched_class = &fair_sched_class;
+ else
+ p->sched_class = &edf_sched_class;
+ }

p->prio = prio;

@@ -5987,9 +6126,9 @@ void set_user_nice(struct task_struct *p, long nice)
* The RT priorities are set via sched_setscheduler(), but we still
* allow the 'normal' nice value to be set - but as expected
* it wont have any effect on scheduling until the task is
- * SCHED_FIFO/SCHED_RR:
+ * SCHED_EDF, SCHED_FIFO or SCHED_RR:
*/
- if (task_has_rt_policy(p)) {
+ if (unlikely(task_has_rt_policy(p) || task_has_edf_policy(p))) {
p->static_prio = NICE_TO_PRIO(nice);
goto out_unlock;
}
@@ -6136,6 +6275,9 @@ __setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
case SCHED_IDLE:
p->sched_class = &fair_sched_class;
break;
+ case SCHED_EDF:
+ p->sched_class = &edf_sched_class;
+ break;
case SCHED_FIFO:
case SCHED_RR:
p->sched_class = &rt_sched_class;
@@ -6149,6 +6291,25 @@ __setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
set_load_weight(p);
}

+static void
+__setscheduler_ex(struct rq *rq, struct task_struct *p, int policy,
+ struct sched_param_ex *param_ex)
+{
+ struct sched_edf_entity *edf_se = &p->edf;
+
+ RB_CLEAR_NODE(&edf_se->rb_node);
+ edf_se->edf_flags = EDF_NEW;
+
+ edf_se->runtime_max = timespec_to_ns(&param_ex->sched_runtime);
+ edf_se->period = timespec_to_ns(&param_ex->sched_period);
+ edf_se->bw = to_ratio(timespec_to_ns(&param_ex->sched_period),
+ timespec_to_ns(&param_ex->sched_runtime));
+
+ edf_se->runtime = edf_se->runtime_max;
+
+ init_edf_timer(&edf_se->edf_timer);
+}
+
/*
* check the target process has a UID that matches the current process's
*/
@@ -6166,7 +6327,9 @@ static bool check_same_owner(struct task_struct *p)
}

static int __sched_setscheduler(struct task_struct *p, int policy,
- struct sched_param *param, bool user)
+ struct sched_param *param,
+ struct sched_param_ex *param_ex,
+ bool user)
{
int retval, oldprio, oldpolicy = -1, on_rq, running;
unsigned long flags;
@@ -6186,6 +6349,7 @@ recheck:
policy &= ~SCHED_RESET_ON_FORK;

if (policy != SCHED_FIFO && policy != SCHED_RR &&
+ policy != SCHED_EDF &&
policy != SCHED_NORMAL && policy != SCHED_BATCH &&
policy != SCHED_IDLE)
return -EINVAL;
@@ -6200,6 +6364,13 @@ recheck:
(p->mm && param->sched_priority > MAX_USER_RT_PRIO-1) ||
(!p->mm && param->sched_priority > MAX_RT_PRIO-1))
return -EINVAL;
+ if (edf_policy(policy) && param_ex->sched_priority != 0)
+ return -EINVAL;
+ if (edf_policy(policy) && (param_ex == NULL ||
+ timespec_to_ns(&param_ex->sched_period) == 0 ||
+ timespec_to_ns(&param_ex->sched_period) <
+ timespec_to_ns(&param_ex->sched_runtime)))
+ return -EINVAL;
if (rt_policy(policy) != (param->sched_priority != 0))
return -EINVAL;

@@ -6273,6 +6444,24 @@ recheck:
spin_unlock_irqrestore(&p->pi_lock, flags);
goto recheck;
}
+#ifdef CONFIG_EDF_GROUP_SCHED
+ /*
+ * If EDF tasks are involved, check tha bandwidth of its group,
+ * and do not allow setting the task EDF if there is not enough.
+ */
+ if (edf_policy(policy) || edf_task(p)) {
+ struct task_group *tg = task_group(p);
+
+ spin_lock(&tg->edf_bandwidth.lock);
+ if (!__edf_check_task_tg_bw(p, tg, policy, param_ex)) {
+ spin_unlock(&tg->edf_bandwidth.lock);
+ __task_rq_unlock(rq);
+ spin_unlock_irqrestore(&p->pi_lock, flags);
+ return -EPERM;
+ }
+ spin_unlock(&tg->edf_bandwidth.lock);
+ }
+#endif
update_rq_clock(rq);
on_rq = p->se.on_rq;
running = task_current(rq, p);
@@ -6285,6 +6474,8 @@ recheck:

oldprio = p->prio;
__setscheduler(rq, p, policy, param->sched_priority);
+ if (edf_policy(policy))
+ __setscheduler_ex(rq, p, policy, param_ex);

if (running)
p->sched_class->set_curr_task(rq);
@@ -6312,10 +6503,17 @@ recheck:
int sched_setscheduler(struct task_struct *p, int policy,
struct sched_param *param)
{
- return __sched_setscheduler(p, policy, param, true);
+ return __sched_setscheduler(p, policy, param, NULL, true);
}
EXPORT_SYMBOL_GPL(sched_setscheduler);

+int sched_setscheduler_ex(struct task_struct *p, int policy,
+ struct sched_param *param,
+ struct sched_param_ex *param_ex)
+{
+ return __sched_setscheduler(p, policy, param, param_ex, true);
+}
+
/**
* sched_setscheduler_nocheck - change the scheduling policy and/or RT priority of a thread from kernelspace.
* @p: the task in question.
@@ -6330,7 +6528,7 @@ EXPORT_SYMBOL_GPL(sched_setscheduler);
int sched_setscheduler_nocheck(struct task_struct *p, int policy,
struct sched_param *param)
{
- return __sched_setscheduler(p, policy, param, false);
+ return __sched_setscheduler(p, policy, param, NULL, false);
}

static int
@@ -6355,6 +6553,33 @@ do_sched_setscheduler(pid_t pid, int policy, struct sched_param __user *param)
return retval;
}

+static int
+do_sched_setscheduler_ex(pid_t pid, int policy,
+ struct sched_param_ex __user *param_ex)
+{
+ struct sched_param lparam;
+ struct sched_param_ex lparam_ex;
+ struct task_struct *p;
+ int retval;
+
+ if (!param_ex || pid < 0)
+ return -EINVAL;
+ if (copy_from_user(&lparam_ex, param_ex,
+ sizeof(struct sched_param_ex)))
+ return -EFAULT;
+
+ rcu_read_lock();
+ retval = -ESRCH;
+ p = find_process_by_pid(pid);
+ if (p != NULL) {
+ lparam.sched_priority = lparam_ex.sched_priority;
+ retval = sched_setscheduler_ex(p, policy, &lparam, &lparam_ex);
+ }
+ rcu_read_unlock();
+
+ return retval;
+}
+
/**
* sys_sched_setscheduler - set/change the scheduler policy and RT priority
* @pid: the pid in question.
@@ -6372,6 +6597,22 @@ SYSCALL_DEFINE3(sched_setscheduler, pid_t, pid, int, policy,
}

/**
+ * sys_sched_setscheduler_ex - set/change the scheduler policy and EDF deadline
+ * @pid: the pid in question.
+ * @policy: new policy (should be SCHED_EDF).
+ * @param: structure containg the extended EDF parameters.
+ */
+SYSCALL_DEFINE3(sched_setscheduler_ex, pid_t, pid, int, policy,
+ struct sched_param_ex __user *, param_ex)
+{
+ if (policy < 0)
+ return -EINVAL;
+
+ return do_sched_setscheduler_ex(pid, policy, param_ex);
+}
+
+
+/**
* sys_sched_setparam - set/change the RT priority of a thread
* @pid: the pid in question.
* @param: structure containing the new RT priority.
@@ -6381,6 +6622,12 @@ SYSCALL_DEFINE2(sched_setparam, pid_t, pid, struct sched_param __user *, param)
return do_sched_setscheduler(pid, -1, param);
}

+SYSCALL_DEFINE2(sched_setparam_ex, pid_t, pid,
+ struct sched_param_ex __user *, param_ex)
+{
+ return do_sched_setscheduler_ex(pid, -1, param_ex);
+}
+
/**
* sys_sched_getscheduler - get the policy (scheduling class) of a thread
* @pid: the pid in question.
@@ -6445,6 +6692,11 @@ out_unlock:
return retval;
}

+SYSCALL_DEFINE2(sched_getparam_ex, pid_t, pid, struct sched_param_ex __user *, param)
+{
+ return -ENOSYS;
+}
+
long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
{
cpumask_var_t cpus_allowed, new_mask;
@@ -9210,6 +9462,14 @@ static void init_cfs_rq(struct cfs_rq *cfs_rq, struct rq *rq)
cfs_rq->min_vruntime = (u64)(-(1LL << 20));
}

+static void init_edf_rq(struct edf_rq *edf_rq, struct rq *rq)
+{
+ edf_rq->rb_root = RB_ROOT;
+#ifdef CONFIG_EDF_GROUP_SCHED
+ edf_rq->rq = rq;
+#endif
+}
+
static void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
{
struct rt_prio_array *array;
@@ -9275,6 +9535,22 @@ static void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,
}
#endif

+#ifdef CONFIG_EDF_GROUP_SCHED
+void init_tg_edf_entry(struct task_group *tg, struct edf_rq *edf_rq,
+ struct sched_edf_entity *edf_se, int cpu, int add,
+ struct sched_edf_entity *parent)
+{
+ struct rq *rq = cpu_rq(cpu);
+
+ tg->edf_rq[cpu] = &rq->edf;
+
+ spin_lock_init(&tg->edf_bandwidth.lock);
+ tg->edf_bandwidth.runtime_max = 0;
+ tg->edf_bandwidth.period = 0;
+ tg->edf_bandwidth.total_bw = 0;
+}
+#endif
+
#ifdef CONFIG_RT_GROUP_SCHED
static void init_tg_rt_entry(struct task_group *tg, struct rt_rq *rt_rq,
struct sched_rt_entity *rt_se, int cpu, int add,
@@ -9316,6 +9592,10 @@ void __init sched_init(void)
#ifdef CONFIG_RT_GROUP_SCHED
alloc_size += 2 * nr_cpu_ids * sizeof(void **);
#endif
+#ifdef CONFIG_EDF_GROUP_SCHED
+ alloc_size += 2 * nr_cpu_ids * sizeof(void **);
+#endif
+
#ifdef CONFIG_USER_SCHED
alloc_size *= 2;
#endif
@@ -9344,6 +9624,10 @@ void __init sched_init(void)
ptr += nr_cpu_ids * sizeof(void **);
#endif /* CONFIG_USER_SCHED */
#endif /* CONFIG_FAIR_GROUP_SCHED */
+#ifdef CONFIG_EDF_GROUP_SCHED
+ init_task_group.edf_rq = (struct edf_rq **)ptr;
+ ptr += nr_cpu_ids * sizeof(void **);
+#endif /* CONFIG_EDF_GROUP_SCHED */
#ifdef CONFIG_RT_GROUP_SCHED
init_task_group.rt_se = (struct sched_rt_entity **)ptr;
ptr += nr_cpu_ids * sizeof(void **);
@@ -9403,6 +9687,7 @@ void __init sched_init(void)
rq->calc_load_active = 0;
rq->calc_load_update = jiffies + LOAD_FREQ;
init_cfs_rq(&rq->cfs, rq);
+ init_edf_rq(&rq->edf, rq);
init_rt_rq(&rq->rt, rq);
#ifdef CONFIG_FAIR_GROUP_SCHED
init_task_group.shares = init_task_group_load;
@@ -9450,6 +9735,13 @@ void __init sched_init(void)
#endif
#endif /* CONFIG_FAIR_GROUP_SCHED */

+#ifdef CONFIG_EDF_GROUP_SCHED
+#ifdef CONFIG_CGROUP_SCHED
+ init_tg_edf_entry(&init_task_group, &rq->edf,
+ NULL, i, 1, NULL);
+#endif
+#endif
+
rq->rt.rt_runtime = def_rt_bandwidth.rt_runtime;
#ifdef CONFIG_RT_GROUP_SCHED
INIT_LIST_HEAD(&rq->leaf_rt_rq_list);
@@ -9760,6 +10052,68 @@ static inline void unregister_fair_sched_group(struct task_group *tg, int cpu)
}
#endif /* CONFIG_FAIR_GROUP_SCHED */

+#ifdef CONFIG_EDF_GROUP_SCHED
+static void free_edf_sched_group(struct task_group *tg)
+{
+ kfree(tg->edf_rq);
+}
+
+int alloc_edf_sched_group(struct task_group *tg, struct task_group *parent)
+{
+ struct rq *rq;
+ int i;
+
+ tg->edf_rq = kzalloc(sizeof(struct edf_rq*) * nr_cpu_ids, GFP_KERNEL);
+ if (!tg->edf_rq)
+ return 0;
+
+ for_each_possible_cpu(i) {
+ rq = cpu_rq(i);
+ init_tg_edf_entry(tg, &rq->edf, NULL, i, 0, NULL);
+ }
+
+ return 1;
+}
+
+int sched_edf_can_attach(struct cgroup *cgrp, struct task_struct *tsk)
+{
+ struct task_group *tg = container_of(cgroup_subsys_state(cgrp,
+ cpu_cgroup_subsys_id),
+ struct task_group, css);
+ u64 tg_bw = tg->edf_bandwidth.bw;
+ u64 tsk_bw = tsk->edf.bw;
+
+ if (!edf_task(tsk))
+ return 1;
+
+ /*
+ * Check if the requested group has enough free space
+ * for this particular task.
+ */
+ if (tg_bw < tsk_bw + tg->edf_bandwidth.total_bw)
+ return 0;
+
+ return 1;
+}
+#else
+static inline void free_edf_sched_group(struct task_group *tg)
+{
+}
+
+static inline
+int alloc_edf_sched_group(struct task_group *tg, struct task_group *parent)
+{
+ return 1;
+}
+#endif
+static inline void register_edf_sched_group(struct task_group *tg, int cpu)
+{
+}
+
+static inline void unregister_edf_sched_group(struct task_group *tg, int cpu)
+{
+}
+
#ifdef CONFIG_RT_GROUP_SCHED
static void free_rt_sched_group(struct task_group *tg)
{
@@ -9852,6 +10206,7 @@ static inline void unregister_rt_sched_group(struct task_group *tg, int cpu)
static void free_sched_group(struct task_group *tg)
{
free_fair_sched_group(tg);
+ free_edf_sched_group(tg);
free_rt_sched_group(tg);
kfree(tg);
}
@@ -9870,12 +10225,16 @@ struct task_group *sched_create_group(struct task_group *parent)
if (!alloc_fair_sched_group(tg, parent))
goto err;

+ if (!alloc_edf_sched_group(tg, parent))
+ goto err;
+
if (!alloc_rt_sched_group(tg, parent))
goto err;

spin_lock_irqsave(&task_group_lock, flags);
for_each_possible_cpu(i) {
register_fair_sched_group(tg, i);
+ register_edf_sched_group(tg, i);
register_rt_sched_group(tg, i);
}
list_add_rcu(&tg->list, &task_groups);
@@ -9906,12 +10265,31 @@ void sched_destroy_group(struct task_group *tg)
{
unsigned long flags;
int i;
+#ifdef CONFIG_EDF_GROUP_SCHED
+ struct task_group *parent = tg->parent;
+ u64 tg_bw = tg->edf_bandwidth.bw;

spin_lock_irqsave(&task_group_lock, flags);
+
+ /*
+ * If an EDF group goes away, its parent group
+ * (if any), ends up with some free bandwidth that
+ * it might use for other groups/tasks.
+ */
+ spin_lock(&parent->edf_bandwidth.lock);
+ if (tg_bw && parent)
+ parent->edf_bandwidth.total_bw -= tg_bw;
+ spin_unlock(&parent->edf_bandwidth.lock);
+#else
+ spin_lock_irqsave(&task_group_lock, flags);
+#endif
+
for_each_possible_cpu(i) {
unregister_fair_sched_group(tg, i);
+ unregister_edf_sched_group(tg, i);
unregister_rt_sched_group(tg, i);
}
+
list_del_rcu(&tg->list);
list_del_rcu(&tg->siblings);
spin_unlock_irqrestore(&task_group_lock, flags);
@@ -10057,14 +10435,6 @@ unsigned long sched_group_shares(struct task_group *tg)
*/
static DEFINE_MUTEX(rt_constraints_mutex);

-static unsigned long to_ratio(u64 period, u64 runtime)
-{
- if (runtime == RUNTIME_INF)
- return 1ULL << 20;
-
- return div64_u64(runtime << 20, period);
-}
-
/* Must be called with tasklist_lock held */
static inline int tg_has_rt_tasks(struct task_group *tg)
{
@@ -10368,9 +10738,15 @@ static int
cpu_cgroup_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
struct task_struct *tsk)
{
+#if defined(CONFIG_RT_GROUP_SCHED) || defined(CONFIG_EDF_GROUP_SCHED)
#ifdef CONFIG_RT_GROUP_SCHED
if (!sched_rt_can_attach(cgroup_tg(cgrp), tsk))
return -EINVAL;
+#endif
+#ifdef CONFIG_EDF_GROUP_SCHED
+ if (!sched_edf_can_attach(cgrp, tsk))
+ return -EINVAL;
+#endif
#else
/* We don't support RT-tasks being in separate groups */
if (tsk->sched_class != &fair_sched_class)
@@ -10384,6 +10760,30 @@ static void
cpu_cgroup_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
struct cgroup *old_cont, struct task_struct *tsk)
{
+#ifdef CONFIG_EDF_GROUP_SCHED
+ struct task_group *tg = container_of(cgroup_subsys_state(cgrp,
+ cpu_cgroup_subsys_id),
+ struct task_group, css);
+ struct task_group *old_tg = container_of(cgroup_subsys_state(old_cont,
+ cpu_cgroup_subsys_id),
+ struct task_group, css);
+ u64 tsk_bw = tsk->edf.bw;
+
+ /*
+ * An amount of bandwidth equal to the bandwidth of tsk
+ * is freed in the former group of tsk, and declared occupied
+ * in the new one.
+ */
+ spin_lock_irq(&tg->edf_bandwidth.lock);
+ tg->edf_bandwidth.total_bw += tsk_bw;
+ spin_unlock_irq(&tg->edf_bandwidth.lock);
+
+ if (old_tg) {
+ spin_lock_irq(&old_tg->edf_bandwidth.lock);
+ old_tg->edf_bandwidth.total_bw -= tsk_bw;
+ spin_unlock_irq(&old_tg->edf_bandwidth.lock);
+ }
+#endif
sched_move_task(tsk);
}

@@ -10426,6 +10826,100 @@ static u64 cpu_rt_period_read_uint(struct cgroup *cgrp, struct cftype *cft)
}
#endif /* CONFIG_RT_GROUP_SCHED */

+#ifdef CONFIG_EDF_GROUP_SCHED
+/*
+ * We allow runtime_max > period, since it makes sense to
+ * assign more than 100% bandwidth to a group, if on an SMP
+ * machine.
+ */
+static int __edf_schedulable(struct task_group *tg,
+ u64 runtime_max, u64 period)
+{
+ struct task_group *parent = tg->parent;
+ u64 old_bw = tg->edf_bandwidth.bw;
+ u64 parent_bw = parent ? parent->edf_bandwidth.bw : 0;
+ u64 bw = period > 0 ? to_ratio(period, runtime_max) : 0;
+ int ret = 0;
+
+ if (bw < tg->edf_bandwidth.total_bw)
+ return -EINVAL;
+
+ if (parent) {
+ /*
+ * All we need to do is check if the parent group is able
+ * to accomodate the new bandwidth of this group or not,
+ * given its actual allocated bandwidth, e.g., to other
+ * groups or tasks.
+ */
+ if (parent_bw < parent->edf_bandwidth.total_bw -
+ old_bw + bw)
+ return -EINVAL;
+
+ spin_lock_irq(&parent->edf_bandwidth.lock);
+ parent->edf_bandwidth.total_bw -= old_bw;
+ parent->edf_bandwidth.total_bw += bw;
+ spin_unlock_irq(&parent->edf_bandwidth.lock);
+ }
+
+ if (!ret) {
+ spin_lock_irq(&tg->edf_bandwidth.lock);
+ tg->edf_bandwidth.runtime_max = runtime_max;
+ tg->edf_bandwidth.period = period;
+ tg->edf_bandwidth.bw = bw;
+ spin_unlock_irq(&tg->edf_bandwidth.lock);
+ }
+
+ return ret;
+}
+
+static int cpu_edf_runtime_write_uint(struct cgroup *cgrp,
+ struct cftype *cftype,
+ u64 edf_runtime_us)
+{
+ struct task_group *tg = cgroup_tg(cgrp);
+
+ return __edf_schedulable(tg,
+ edf_runtime_us * NSEC_PER_USEC,
+ tg->edf_bandwidth.period);
+}
+
+static u64 cpu_edf_runtime_read_uint(struct cgroup *cgrp, struct cftype *cft)
+{
+ struct task_group *tg = cgroup_tg(cgrp);
+ u64 runtime;
+
+ spin_lock_irq(&tg->edf_bandwidth.lock);
+ runtime = tg->edf_bandwidth.runtime_max;
+ spin_unlock_irq(&tg->edf_bandwidth.lock);
+ do_div(runtime, NSEC_PER_USEC);
+
+ return runtime;
+}
+
+static int cpu_edf_period_write_uint(struct cgroup *cgrp, struct cftype *cftype,
+ u64 edf_period_us)
+{
+ struct task_group *tg = cgroup_tg(cgrp);
+
+ return __edf_schedulable(tg,
+ tg->edf_bandwidth.runtime_max,
+ edf_period_us * NSEC_PER_USEC);
+}
+
+static u64 cpu_edf_period_read_uint(struct cgroup *cgrp, struct cftype *cft)
+{
+ struct task_group *tg = cgroup_tg(cgrp);
+ u64 period;
+
+ spin_lock_irq(&tg->edf_bandwidth.lock);
+ period = tg->edf_bandwidth.period;
+ spin_unlock_irq(&tg->edf_bandwidth.lock);
+ do_div(period, NSEC_PER_USEC);
+
+ return period;
+}
+#endif
+
static struct cftype cpu_files[] = {
#ifdef CONFIG_FAIR_GROUP_SCHED
{
@@ -10446,6 +10940,19 @@ static struct cftype cpu_files[] = {
.write_u64 = cpu_rt_period_write_uint,
},
#endif
+#ifdef CONFIG_EDF_GROUP_SCHED
+ {
+ .name = "edf_runtime_us",
+ .read_u64 = cpu_edf_runtime_read_uint,
+ .write_u64 = cpu_edf_runtime_write_uint,
+ },
+ {
+ .name = "edf_period_us",
+ .read_u64 = cpu_edf_period_read_uint,
+ .write_u64 = cpu_edf_period_write_uint,
+ },
+#endif
+
};

static int cpu_cgroup_populate(struct cgroup_subsys *ss, struct cgroup *cont)
diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c
index efb8440..1252a43 100644
--- a/kernel/sched_debug.c
+++ b/kernel/sched_debug.c
@@ -146,7 +146,8 @@ static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu)
}

#if defined(CONFIG_CGROUP_SCHED) && \
- (defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
+ (defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED) || \
+ defined(CONFIG_EDF_GROUP_SCHED))
static void task_group_path(struct task_group *tg, char *buf, int buflen)
{
/* may be NULL if the underlying cgroup isn't fully-created yet */
@@ -218,6 +219,38 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
#endif
}

+void print_edf_rq(struct seq_file *m, int cpu, struct edf_rq *edf_rq)
+{
+ s64 min_deadline = -1, max_deadline = -1;
+ struct rq *rq = &per_cpu(runqueues, cpu);
+ struct sched_edf_entity *last;
+ unsigned long flags;
+
+ SEQ_printf(m, "\nedf_rq[%d]:\n", cpu);
+
+ spin_lock_irqsave(&rq->lock, flags);
+ if (edf_rq->rb_leftmost)
+ min_deadline = (rb_entry(edf_rq->rb_leftmost,
+ struct sched_edf_entity,
+ rb_node))->deadline;
+ last = __pick_edf_last_entity(edf_rq);
+ if (last)
+ max_deadline = last->deadline;
+ spin_unlock_irqrestore(&rq->lock, flags);
+
+#define P(x) \
+ SEQ_printf(m, " .%-30s: %Ld\n", #x, (long long)(x))
+#define PN(x) \
+ SEQ_printf(m, " .%-30s: %Ld.%06ld\n", #x, SPLIT_NS(x))
+
+ P(edf_rq->edf_nr_running);
+ PN(min_deadline);
+ PN(max_deadline);
+
+#undef PN
+#undef P
+}
+
void print_rt_rq(struct seq_file *m, int cpu, struct rt_rq *rt_rq)
{
#if defined(CONFIG_CGROUP_SCHED) && defined(CONFIG_RT_GROUP_SCHED)
@@ -300,6 +333,7 @@ static void print_cpu(struct seq_file *m, int cpu)
#undef P
#endif
print_cfs_stats(m, cpu);
+ print_edf_stats(m, cpu);
print_rt_stats(m, cpu);

print_rq(m, rq, cpu);
diff --git a/kernel/sched_edf.c b/kernel/sched_edf.c
new file mode 100644
index 0000000..cdec433
--- /dev/null
+++ b/kernel/sched_edf.c
@@ -0,0 +1,550 @@
+/*
+ * Earliest Deadline Firsf (EDF) Scheduling Class (SCHED_EDF policy)
+ *
+ * Earliest Deadline First scheduling for periodic or sporadic tasks.
+ *
+ * An EDF task specifies:
+ * - a maximum runtime,
+ * - a period or minimum interarrival time.
+ * The scheduler, on its turn:
+ * - schedule the task according to its deadline d=t+period,
+ * - enforces the task to not overcame its bandwidth B=runtime_max/period.
+ *
+ * At the end of each instance of a periodic job (or sporadic activation),
+ * the task may inform the scheduler about such event, by calling
+ * sched_yield(), and then suspending until next activation time.
+ *
+ * The strategy used to confine each task inside its bandwidth reservation
+ * is the Constant Bandwidth Server (CBS) scheduling, a slight variation on
+ * EDF that makes this possible.
+ *
+ * Correct behavior, i.e., no EDF task misses its deadline, is only
+ * guaranteed if the parameters of all the EDF entities in the system are
+ * assigned so to result in a schedulable solution.
+ * However, thanks to bandwidth isolation, overruns and deadline misses
+ * remains local to a single task, and does not affect any other task in
+ * the system.
+ *
+ * Groups, if configured, have bandwidth as well, and it is enforced that
+ * the sum of the bandwidths of entities (tasks and groups) belonging to
+ * a group stays below its own bandwidth.
+ *
+ * Copyright (C) 2009 Dario Faggioli, Michael Trimarchi
+ */
+
+static const struct sched_class edf_sched_class;
+
+static inline struct task_struct *edf_task_of(struct sched_edf_entity *edf_se)
+{
+ return container_of(edf_se, struct task_struct, edf);
+}
+
+static inline struct rq *rq_of_edf_rq(struct edf_rq *edf_rq)
+{
+ return container_of(edf_rq, struct rq, edf);
+}
+
+static inline struct edf_rq *edf_rq_of_se(struct sched_edf_entity *edf_se)
+{
+ struct task_struct *p = edf_task_of(edf_se);
+ struct rq *rq = task_rq(p);
+
+ return &rq->edf;
+}
+
+static inline int edf_se_boosted(struct sched_edf_entity *edf_se)
+{
+ struct task_struct *p = edf_task_of(edf_se);
+
+ return p->prio != p->normal_prio;
+}
+
+static inline int on_edf_rq(struct sched_edf_entity *edf_se)
+{
+ return !RB_EMPTY_NODE(&edf_se->rb_node);
+}
+
+#define for_each_leaf_edf_rq(edf_rq, rq) \
+ for (edf_rq = &rq->edf; edf_rq; edf_rq = NULL)
+
+static inline int edf_time_before(u64 a, u64 b)
+{
+ return (s64)(a - b) < 0;
+}
+
+static inline u64 edf_max_deadline(u64 a, u64 b)
+{
+ s64 delta = (s64)(b - a);
+ if (delta > 0)
+ a = b;
+
+ return a;
+}
+
+static inline void __edf_new_to_running(struct sched_edf_entity *edf_se)
+{
+ struct edf_rq *edf_rq = edf_rq_of_se(edf_se);
+ struct rq *rq = rq_of_edf_rq(edf_rq);
+
+ edf_se->edf_flags &= ~EDF_NEW;
+ edf_se->edf_flags |= EDF_RUNNING;
+
+ edf_se->deadline = rq->clock + edf_se->period;
+}
+
+static inline void __edf_clear_ended(struct sched_edf_entity *edf_se)
+{
+ struct edf_rq *edf_rq = edf_rq_of_se(edf_se);
+ struct rq *rq = rq_of_edf_rq(edf_rq);
+
+ edf_se->edf_flags &= ~EDF_ENDED;
+
+ edf_se->runtime = edf_se->runtime > 0 ? 0 : edf_se->runtime;
+ if (edf_time_before(edf_se->deadline, rq->clock))
+ edf_se->deadline = rq->clock;
+}
+
+static inline void __edf_set_ended(struct sched_edf_entity *edf_se)
+{
+ edf_se->edf_flags |= EDF_ENDED;
+}
+
+static void enqueue_edf_entity(struct sched_edf_entity *edf_se);
+static void dequeue_edf_entity(struct sched_edf_entity *edf_se);
+
+static void update_edf_params(struct sched_edf_entity *edf_se)
+{
+ struct edf_rq *edf_rq = edf_rq_of_se(edf_se);
+ struct rq *rq = rq_of_edf_rq(edf_rq);
+ u64 left, right;
+
+ BUG_ON(on_edf_rq(edf_se));
+
+ if (edf_se->edf_flags & EDF_NEW) {
+ __edf_new_to_running(edf_se);
+ return;
+ }
+ if (edf_se->edf_flags & EDF_ENDED) {
+ __edf_clear_ended(edf_se);
+ goto update;
+ }
+
+ /*
+ * Update the deadline to the current time only if:
+ * - the budget has been completely exhausted;
+ * - using the ramaining budget, with the current deadline, would
+ * make the task exceed its bandwidth;
+ * - the deadline itself is in the past.
+ *
+ * For the second condition to hold, we check if:
+ * runtime / (deadline - rq->clock) >= runtime_max / period
+ *
+ * Which basically says if, in the time left before the current
+ * deadline, the tasks overcome its expected runtime by using the
+ * residual budget (left and right are the two sides of the equation,
+ * after a bit of shuffling to use multiplications instead of
+ * divisions).
+ */
+ left = edf_se->runtime > 0 ? edf_se->period * edf_se->runtime : 0;
+ right = (edf_se->deadline - rq->clock) * edf_se->runtime_max;
+
+ if (edf_se->runtime < 0 || edf_time_before(right, left) ||
+ edf_time_before(edf_se->deadline, rq->clock)) {
+ edf_se->deadline = rq->clock;
+update:
+ /*
+ * Keep moving the deadline and replenishing runtime by the
+ * proper amount at least until the runtime becomes positive.
+ */
+ while (edf_se->runtime <= 0) {
+ edf_se->deadline += edf_se->period;
+ if (edf_se->runtime + edf_se->runtime_max < edf_se->runtime_max)
+ edf_se->runtime += edf_se->runtime_max;
+ else
+ edf_se->runtime = edf_se->runtime_max;
+ }
+ }
+}
+
+static int start_edf_timer(struct sched_edf_entity *edf_se)
+{
+ struct edf_rq *edf_rq = edf_rq_of_se(edf_se);
+ struct rq *rq = rq_of_edf_rq(edf_rq);
+ ktime_t now, delta, act;
+ ktime_t soft, hard;
+ unsigned long range;
+
+ now = hrtimer_cb_get_time(&edf_se->edf_timer);
+ delta = ktime_sub_ns(now, rq->clock);
+ act = ns_to_ktime(edf_se->deadline);
+ act = ktime_add(act, delta);
+
+ hrtimer_set_expires(&edf_se->edf_timer, act);
+
+ soft = hrtimer_get_softexpires(&edf_se->edf_timer);
+ hard = hrtimer_get_expires(&edf_se->edf_timer);
+ range = ktime_to_ns(ktime_sub(hard, soft));
+ __hrtimer_start_range_ns(&edf_se->edf_timer, soft,
+ range, HRTIMER_MODE_ABS, 0);
+
+ return hrtimer_active(&edf_se->edf_timer);
+}
+
+static enum hrtimer_restart edf_timer(struct hrtimer *timer)
+{
+ struct sched_edf_entity *edf_se = container_of(timer,
+ struct sched_edf_entity,
+ edf_timer);
+ struct task_struct *p = edf_task_of(edf_se);
+ struct edf_rq *edf_rq = edf_rq_of_se(edf_se);
+ struct rq *rq = rq_of_edf_rq(edf_rq);
+
+ spin_lock(&rq->lock);
+ if (!edf_task(p))
+ goto unlock;
+
+ update_edf_params(edf_se);
+ if (p->se.on_rq && !on_edf_rq(edf_se)) {
+ enqueue_edf_entity(edf_se);
+ resched_task(rq->curr);
+ }
+unlock:
+ spin_unlock(&rq->lock);
+
+ return HRTIMER_NORESTART;
+}
+
+static void init_edf_timer(struct hrtimer *timer)
+{
+ if (hrtimer_active(timer)) {
+ hrtimer_try_to_cancel(timer);
+ return;
+ }
+
+ hrtimer_init(timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ timer->function = edf_timer;
+}
+
+static int edf_runtime_exceeded(struct sched_edf_entity *edf_se)
+{
+ if (edf_se->runtime > 0 || edf_se_boosted(edf_se))
+ return 0;
+
+ BUG_ON(hrtimer_active(&edf_se->edf_timer));
+
+ dequeue_edf_entity(edf_se);
+ if (!start_edf_timer(edf_se)) {
+ update_edf_params(edf_se);
+ enqueue_edf_entity(edf_se);
+ }
+
+ return 1;
+}
+
+static void update_curr_edf(struct rq *rq)
+{
+ struct task_struct *curr = rq->curr;
+ struct sched_edf_entity *edf_se = &curr->edf;
+ u64 delta_exec;
+
+ if (!edf_task(curr) || !on_edf_rq(edf_se))
+ return;
+
+ delta_exec = rq->clock - curr->se.exec_start;
+ if (unlikely((s64)delta_exec < 0))
+ delta_exec = 0;
+
+ schedstat_set(curr->se.exec_max, max(curr->se.exec_max, delta_exec));
+
+ curr->se.sum_exec_runtime += delta_exec;
+ account_group_exec_runtime(curr, delta_exec);
+
+ curr->se.exec_start = rq->clock;
+ cpuacct_charge(curr, delta_exec);
+
+ edf_se->runtime -= delta_exec;
+ /*
+ * If we exhausted our runtime it has to be replenished
+ * either immediately or after one (or more) CBS period(s).
+ */
+ if (edf_runtime_exceeded(edf_se))
+ resched_task(curr);
+}
+
+static void enqueue_edf_entity(struct sched_edf_entity *edf_se)
+{
+ struct edf_rq *edf_rq = edf_rq_of_se(edf_se);
+ struct rb_node **link = &edf_rq->rb_root.rb_node;
+ struct rb_node *parent = NULL;
+ struct sched_edf_entity *entry;
+ int leftmost = 1;
+
+ BUG_ON(!RB_EMPTY_NODE(&edf_se->rb_node));
+
+ while (*link) {
+ parent = *link;
+ entry = rb_entry(parent, struct sched_edf_entity, rb_node);
+ if (!edf_time_before(entry->deadline, edf_se->deadline))
+ link = &parent->rb_left;
+ else {
+ link = &parent->rb_right;
+ leftmost = 0;
+ }
+ }
+
+ if (leftmost)
+ edf_rq->rb_leftmost = &edf_se->rb_node;
+
+ rb_link_node(&edf_se->rb_node, parent, link);
+ rb_insert_color(&edf_se->rb_node, &edf_rq->rb_root);
+
+ edf_rq->edf_nr_running++;
+}
+
+static void dequeue_edf_entity(struct sched_edf_entity *edf_se)
+{
+ struct edf_rq *edf_rq = edf_rq_of_se(edf_se);
+
+ if (RB_EMPTY_NODE(&edf_se->rb_node))
+ return;
+
+ if (edf_rq->rb_leftmost == &edf_se->rb_node) {
+ struct rb_node *next_node;
+ struct sched_edf_entity *next;
+
+ next_node = rb_next(&edf_se->rb_node);
+ edf_rq->rb_leftmost = next_node;
+
+ if (next_node)
+ next = rb_entry(next_node, struct sched_edf_entity,
+ rb_node);
+ }
+
+ rb_erase(&edf_se->rb_node, &edf_rq->rb_root);
+ RB_CLEAR_NODE(&edf_se->rb_node);
+
+ edf_rq->edf_nr_running--;
+}
+
+static void check_preempt_curr_edf(struct rq *rq, struct task_struct *p,
+ int sync)
+{
+ if (unlikely(rt_prio(p->prio)))
+ goto resched;
+
+ if (unlikely(p->sched_class != &edf_sched_class))
+ return;
+
+ if ((s64)p->edf.deadline - rq->curr->edf.deadline < 0)
+resched:
+ resched_task(rq->curr);
+}
+
+static void
+enqueue_task_edf(struct rq *rq, struct task_struct *p, int wakeup)
+{
+ struct sched_edf_entity *edf_se = &p->edf;
+
+ BUG_ON(on_edf_rq(edf_se));
+
+ /*
+ * Only enqueue entities with some remaining runtime.
+ */
+ if (edf_se->runtime <= 0)
+ return;
+
+ update_edf_params(edf_se);
+ enqueue_edf_entity(edf_se);
+ check_preempt_curr_edf(rq, p, 0);
+}
+
+static void
+dequeue_task_edf(struct rq *rq, struct task_struct *p, int sleep)
+{
+ struct sched_edf_entity *edf_se = &p->edf;
+
+ if (!on_edf_rq(edf_se))
+ return;
+
+ update_curr_edf(rq);
+ dequeue_edf_entity(edf_se);
+}
+
+static void yield_task_edf(struct rq *rq)
+{
+ struct task_struct *p = rq->curr;
+ struct sched_edf_entity *edf_se = &p->edf;
+
+ __edf_set_ended(edf_se);
+}
+
+#ifdef CONFIG_SCHED_HRTICK
+static void start_hrtick_edf(struct rq *rq, struct task_struct *p)
+{
+ struct sched_edf_entity *edf_se = &p->edf;
+ s64 delta;
+
+ delta = edf_se->runtime_max - edf_se->runtime;
+
+ if (delta > 10000)
+ hrtick_start(rq, delta);
+}
+#else
+static void start_hrtick_edf(struct rq *rq, struct task_struct *p)
+{
+}
+#endif
+
+static struct sched_edf_entity *__pick_edf_last_entity(struct edf_rq *edf_rq)
+{
+ struct rb_node *last = rb_last(&edf_rq->rb_root);
+
+ if (!last)
+ return NULL;
+
+ return rb_entry(last, struct sched_edf_entity, rb_node);
+}
+
+static struct sched_edf_entity *pick_next_edf_entity(struct rq *rq,
+ struct edf_rq *edf_rq)
+{
+ struct rb_node *left = edf_rq->rb_leftmost;
+
+ if (!left)
+ return NULL;
+
+ return rb_entry(left, struct sched_edf_entity, rb_node);
+}
+
+struct task_struct *pick_next_task_edf(struct rq *rq)
+{
+ struct sched_edf_entity *edf_se;
+ struct edf_rq *edf_rq = &rq->edf;
+ struct task_struct *p;
+
+ BUG_ON(edf_rq->edf_nr_running < 0);
+
+ if (!edf_rq->edf_nr_running)
+ return NULL;
+
+ edf_se = pick_next_edf_entity(rq, edf_rq);
+ BUG_ON(!edf_se);
+
+ p = edf_task_of(edf_se);
+ p->se.exec_start = rq->clock;
+ if (hrtick_enabled(rq))
+ start_hrtick_edf(rq, p);
+
+ return p;
+}
+
+static void put_prev_task_edf(struct rq *rq, struct task_struct *p)
+{
+ update_curr_edf(rq);
+ p->se.exec_start = 0;
+}
+
+static void task_tick_edf(struct rq *rq, struct task_struct *p, int queued)
+{
+ update_curr_edf(rq);
+}
+
+static void set_curr_task_edf(struct rq *rq)
+{
+ struct task_struct *p = rq->curr;
+
+ p->se.exec_start = rq->clock;
+}
+
+static void prio_changed_edf(struct rq *rq, struct task_struct *p,
+ int oldprio, int running)
+{
+ if (running) {
+ if (p->prio > oldprio)
+ resched_task(rq->curr);
+ } else
+ check_preempt_curr_edf(rq, p, 0);
+}
+
+static void switched_to_edf(struct rq *rq, struct task_struct *p,
+ int running)
+{
+ if (!running)
+ check_preempt_curr_edf(rq, p, 0);
+}
+
+#ifdef CONFIG_SMP
+static int select_task_rq_edf(struct task_struct *p, int sd_flag, int flags)
+{
+ return task_cpu(p);
+}
+
+static unsigned long
+load_balance_edf(struct rq *this_rq, int this_cpu, struct rq *busiest,
+ unsigned long max_load_move,
+ struct sched_domain *sd, enum cpu_idle_type idle,
+ int *all_pinned, int *this_best_prio)
+{
+ /* for now, don't touch EDF tasks */
+ return 0;
+}
+
+static int
+move_one_task_edf(struct rq *this_rq, int this_cpu, struct rq *busiest,
+ struct sched_domain *sd, enum cpu_idle_type idle)
+{
+ return 0;
+}
+
+static void set_cpus_allowed_edf(struct task_struct *p,
+ const struct cpumask *new_mask)
+{
+ int weight = cpumask_weight(new_mask);
+
+ BUG_ON(!edf_task(p));
+
+ cpumask_copy(&p->cpus_allowed, new_mask);
+ p->edf.nr_cpus_allowed = weight;
+}
+#endif
+
+static const struct sched_class edf_sched_class = {
+ .next = &fair_sched_class,
+ .enqueue_task = enqueue_task_edf,
+ .dequeue_task = dequeue_task_edf,
+ .yield_task = yield_task_edf,
+
+ .check_preempt_curr = check_preempt_curr_edf,
+
+ .pick_next_task = pick_next_task_edf,
+ .put_prev_task = put_prev_task_edf,
+
+#ifdef CONFIG_SMP
+ .select_task_rq = select_task_rq_edf,
+
+ .load_balance = load_balance_edf,
+ .move_one_task = move_one_task_edf,
+ .set_cpus_allowed = set_cpus_allowed_edf,
+#endif
+
+ .set_curr_task = set_curr_task_edf,
+ .task_tick = task_tick_edf,
+
+ .prio_changed = prio_changed_edf,
+ .switched_to = switched_to_edf,
+};
+
+#ifdef CONFIG_SCHED_DEBUG
+void print_edf_rq(struct seq_file *m, int cpu, struct edf_rq *edf_rq);
+
+static void print_edf_stats(struct seq_file *m, int cpu)
+{
+ struct edf_rq *edf_rq = &cpu_rq(cpu)->edf;
+
+ rcu_read_lock();
+ for_each_leaf_edf_rq(edf_rq, cpu_rq(cpu))
+ print_edf_rq(m, cpu, edf_rq);
+ rcu_read_unlock();
+}
+#endif /* CONFIG_SCHED_DEBUG */
+
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index ecc637a..cae7fb8 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1571,10 +1571,8 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_

update_curr(cfs_rq);

- if (unlikely(rt_prio(p->prio))) {
+ if (rt_prio(p->prio) || task_has_edf_policy(p))
resched_task(curr);
- return;
- }

if (unlikely(p->sched_class != &fair_sched_class))
return;
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index a4d790c..4511fc9 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -1746,7 +1746,7 @@ unsigned int get_rr_interval_rt(struct task_struct *task)
}

static const struct sched_class rt_sched_class = {
- .next = &fair_sched_class,
+ .next = &edf_sched_class,
.enqueue_task = enqueue_task_rt,
.dequeue_task = dequeue_task_rt,
.yield_task = yield_task_rt,
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index 26185d7..d32aaf0 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -24,6 +24,7 @@ static int __read_mostly tracer_enabled;

static struct task_struct *wakeup_task;
static int wakeup_cpu;
+static int wakeup_edf;
static int wakeup_current_cpu;
static unsigned wakeup_prio = -1;
static int wakeup_rt;
@@ -214,6 +215,9 @@ probe_wakeup(struct rq *rq, struct task_struct *p, int success)
tracing_record_cmdline(p);
tracing_record_cmdline(current);

+ if (wakeup_edf && !edf_task(p))
+ return;
+
if ((wakeup_rt && !rt_task(p)) ||
p->prio >= wakeup_prio ||
p->prio >= current->prio)
@@ -340,12 +344,21 @@ static int __wakeup_tracer_init(struct trace_array *tr)

static int wakeup_tracer_init(struct trace_array *tr)
{
+ wakeup_edf = 0;
+ wakeup_rt = 0;
+ return __wakeup_tracer_init(tr);
+}
+
+static int wakeup_edf_tracer_init(struct trace_array *tr)
+{
+ wakeup_edf = 1;
wakeup_rt = 0;
return __wakeup_tracer_init(tr);
}

static int wakeup_rt_tracer_init(struct trace_array *tr)
{
+ wakeup_edf = 0;
wakeup_rt = 1;
return __wakeup_tracer_init(tr);
}
@@ -384,6 +397,20 @@ static struct tracer wakeup_tracer __read_mostly =
#endif
};

+static struct tracer wakeup_edf_tracer __read_mostly =
+{
+ .name = "wakeup_edf",
+ .init = wakeup_edf_tracer_init,
+ .reset = wakeup_tracer_reset,
+ .start = wakeup_tracer_start,
+ .stop = wakeup_tracer_stop,
+ .wait_pipe = poll_wait_pipe,
+ .print_max = 1,
+#ifdef CONFIG_FTRACE_SELFTEST
+ .selftest = trace_selftest_startup_wakeup,
+#endif
+};
+
static struct tracer wakeup_rt_tracer __read_mostly =
{
.name = "wakeup_rt",
@@ -406,6 +433,10 @@ __init static int init_wakeup_tracer(void)
if (ret)
return ret;

+ ret = register_tracer(&wakeup_edf_tracer);
+ if (ret)
+ return ret;
+
ret = register_tracer(&wakeup_rt_tracer);
if (ret)
return ret;

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]


Attachments:
signature.asc (197.00 B)
This is a digitally signed message part

2009-09-22 11:05:27

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Tue, 2009-09-22 at 12:30 +0200, Raistlin wrote:

> Some notes about the implementation, on which we are open to comments
> and suggestions:
>
> * It implements the well-known Earliest Deadline First (EDF) algorithm

Might be good to mention here that the current implementation is fully
partitioned, but you're striving to make it global-ish.

> * It does not make any restrictive assumption about the characteristics
> of the tasks: it can handle periodic, sporadic or aperiodic tasks
> (see below)

sched_param_ex seems to only contain two of the three sporadic task
model parameters ;-)

> * We decided to put the new scheduling class in between sched_rt and
> sched_fair, for the following reasons:

I'd argue to put it above rt, since having it below renders any
guarantees void, since you have no idea about how much time is actually
available.

> - Backward compatibility and (POSIX) standard compliance require
> SCHED_FIFO/SCHED_RR tasks to run as soon as they can

In short, sod POSIX.

Simply define 'as soon as they can' to exclude EDF time ;-)

> - Since EDF is more efficient than fixed-priority in terms of
> schedulability, it will be easier for it to achieve high
> utilization even if it runs in the idle time of fixed-priority,
> than vice-versa
>
> - Real-time analysis techniques already exist to deal with exactly
> such scheduling architecture
>
> - The amount of interference coming from system SCHED_FIFO/SCHED_RR
> tasks can be accounted as system overhead
>
> - In recent kernels, the bandwidth used by the whole sched_rt
> scheduling class can be forced to stay below a certain threshold

The only two tasks that need to be the absolute highest priority are
kstopmachine and migrate, so those would need to be lifted above EDF,
the rest is not important :-)

> * It works natively on multicore platform. We have one runqueue for
> each CPU, implemented with a red-black tree, for efficient
> manipulation.
>
> * We will migrate the tasks among the runqueues trying to always have,
> on a system with m CPUs, the m earliest deadline ready tasks running
> on the CPUs.
> This is not present in the attached patch, but will be available
> soon.

Ah, ok, so now its fully paritioned, but you'll add that push-pull logic
to make it 'global-ish'.

That should be done at the root_domain level, so that cpusets still work
as expected.

> * Tasks are not migrated if a specific CPU affinity has been set.
> Affinity can be set using existing Linux mechanisms (i.e.,
> sched_setaffinity() for tasks and cpuset.cpus for task groups). Thus,
> it is sufficient that each EDF task in the system has its affinity
> set, to achieve a partitioned scheduling with very few overhead.
>
> * The API for tasks uses an additional system call, called
> sched_setscheduler_ex(...), to assign and modify the scheduling
> parameters described above (i.e., sched_runtime and sched_period).
> The existing system call sched_setscheduler() has not been extended,
> because of the binary compatibility issues that modifying its struct
> sched_param parameter would have raised for existing applications. So
> we implemented an additional syscall. The system call has the
> following prototype:
>
> #define SCHED_EDF 6
>
> struct sched_param_ex {
> int sched_priority; /* Backward compliance */
> struct timespec sched_edf_period;
> struct timespec sched_edf_runtime;
> };
>
> int sched_setscheduler_ex (pid_t pid, int policy, struct sched_param_ex *param);
>
> for the sake of consistency, also sched_setaparam_ex() and
> sched_getparam_ex() have been implemented.

There should also be a sys_sched_getscheduler_ex(), the get part of the
existing interface is what stops us from extending the sched_param bits
we have.

> * We did not add any further system call to allow periodic tasks to
> specify the end of the current instance, and the semantics of the
> existing sched_yield() system call has been extended for this
> purpose.

OK, so a wakeup after a yield, before a period elapses also raises some
warning/exception/signal thing?

> * It is integrated with the cgroups filesystem, so it's possible to
> assign a reservation to a group of tasks and make hierarchical
> scheduling.
> The cgroups interface provides two entries, cpu.edf_runtime_us and
> cpu.edf_period_us. These files are created once the cgroup filesystem
> is mounted, to get and set the group bandwidth.

Nice.

> * In case a SCHED_EDF tasks forks, parent's budget is split among
> parent and child.

Right, I guess there's really nothing else you can do here...


Haven't looked at the code yet,. hope to do so shortly.

Thanks!

2009-09-22 12:58:37

by Claudio Scordino

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class


> This implementation is part of a FP7 European project called ACTORS,
> and financially supported by the European commission
> (see http://www.actors-project.eu).
>
> The involved partners (which include Ericsson Research, Evidence Srl,
> AKAtech) strongly believe that a general-purpose operating system like
> Linux should provide a standard real-time scheduling policy (like the
> one implemented in this patch) still allowing to schedule non-real-time
> tasks in the usual way.

Hi Peter, hi all,

note that the request of having an EDF scheduling class integrated
in the mainline kernel does not come from academy, but comes from industry.

In particular, at least the following companies are interested in this
feature:

1. ERICSSON AB (http://www.ericsson.com)

2. Evidence Srl (http://www.evidence.eu.com): who implemented the code

3. AKAtech (http://www.akatech.ch)

4. Qprel (http://www.qprel.com)

5. Tertium technology (http://www.tertiumtechnology.com)

6. M2Tech (http://www.m2tech.biz)


Then, there are _also_ some universities and research centers:

1. Scuola Sant'Anna, Italy

2. University of Pisa, Italy

3. Lunds Universitet, Lund, Sweden

4. Technical University Kaiserslautern, Germany

5. Ecole Polytechnique Federale de Lausanne, Swiss

But I guess there are other companies and institutions interested, out
there...

Cheers,

Claudio

2009-09-22 12:38:43

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Tue, 2009-09-22 at 13:58 +0200, Claudio Scordino wrote:
> > This implementation is part of a FP7 European project called ACTORS,
> > and financially supported by the European commission
> > (see http://www.actors-project.eu).
> >
> > The involved partners (which include Ericsson Research, Evidence Srl,
> > AKAtech) strongly believe that a general-purpose operating system like
> > Linux should provide a standard real-time scheduling policy (like the
> > one implemented in this patch) still allowing to schedule non-real-time
> > tasks in the usual way.
>
> Hi Peter, hi all,
>
> note that the request of having an EDF scheduling class integrated
> in the mainline kernel does not come from academy, but comes from industry.

Yeah, I know, there's lots of interest in having a deadline scheduler.

One of the things we'll have to look at is removing all the EDF
assumptions from it.

I think we'll be wanting to do something like s/EDF/DEADLINE/
s/edf/deadline/ etc. on the code base to begin with.

I think the task model as exposed through sched_param_ex needs a little
more thought as well.

But anyway, I'm very glad to see the code.. and am hoping to have some
time myself to prod at the PI code to work with this.

2009-09-22 12:51:09

by Dario Faggioli

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

Hi again

and first of all, thanks a lot for the quick reply!

On Tue, 2009-09-22 at 13:05 +0200, Peter Zijlstra wrote:
> > * It does not make any restrictive assumption about the characteristics
> > of the tasks: it can handle periodic, sporadic or aperiodic tasks
> > (see below)
>
> sched_param_ex seems to only contain two of the three sporadic task
> model parameters ;-)
>
Yes, that is true. Fact is that, inside the kernel, we deal only with
the runtime and a period, which is also equal to the (relative!)
deadline.
In fact, the (absolute) deadline of a task is set by adding _that_
period to the old deadline value, or to current time.

The, let's say, proper task period is left to the userspace, i.e.,
suspending until the next period/sporadic activation is not done in the
kernel, it is up to the task.
This is the most flexible interface we have been able to design, while
trying to reduce the amount of information that has to be added to the
minimum... Different ideas are more than welcome. :-)

> > * We decided to put the new scheduling class in between sched_rt and
> > sched_fair, for the following reasons:
>
> I'd argue to put it above rt, since having it below renders any
> guarantees void, since you have no idea about how much time is actually
> available.
>
Well, I think it might be possible, if rt is bandwidth constrained, as
it is/it is becoming... Don't you?

> > - Backward compatibility and (POSIX) standard compliance require
> > SCHED_FIFO/SCHED_RR tasks to run as soon as they can
>
> In short, sod POSIX.
>
> Simply define 'as soon as they can' to exclude EDF time ;-)
>
Well, I definitely am not going to fight to death for POSIX
compliance! :-P

It's just that we've heard a lot of times sentences like "not break well
known/standard behaviours, especially if exported to userspace" and we
have not been brave enough to contradict this. :D

Apart from that, I truly think that --again, if rt is bandwidth
constrained-- having fixed priority above EDF may lead to a solution
able to provide both, the predictability of the former, and the
flexibility of the latter, as said below (original mail).
Anyway, this is not going to be an issue, since moving edf forward is
quite easy.

Damn! I would like to find some time to design and run a couple of
examples and experiments on both configurations, and see what it comes
out... If I ever do, I'll let you know.

> > - Since EDF is more efficient than fixed-priority in terms of
> > schedulability, it will be easier for it to achieve high
> > utilization even if it runs in the idle time of fixed-priority,
> > than vice-versa
> >
> > - Real-time analysis techniques already exist to deal with exactly
> > such scheduling architecture
> >
> > - The amount of interference coming from system SCHED_FIFO/SCHED_RR
> > tasks can be accounted as system overhead
> >
> > - In recent kernels, the bandwidth used by the whole sched_rt
> > scheduling class can be forced to stay below a certain threshold
>
> The only two tasks that need to be the absolute highest priority are
> kstopmachine and migrate, so those would need to be lifted above EDF,
> the rest is not important :-)
>
Ok, I think this could be done easily... Maybe also making them EDF with
very small deadline, but then we would have to think about a reasonable
budget... Or, obviously, we can simply use something like a "system"
scheduling class.

But now, isn't this bringing will bring some unknown amount of
interference which may disable again the guarantees for edf? Do you
think this could be the case?

> > * We will migrate the tasks among the runqueues trying to always have,
> > on a system with m CPUs, the m earliest deadline ready tasks running
> > on the CPUs.
> > This is not present in the attached patch, but will be available
> > soon.
>
> Ah, ok, so now its fully paritioned, but you'll add that push-pull logic
> to make it 'global-ish'.
>
Yes, consistently with Linux "distributed runqueue" scheduling approach.
For now, I'm just "porting" the push/pull rt logic to deadlines, inside
sched-edf.
This may be suboptimal and rise at least overhead, clock synchronization
and locking issue, but I hope, again, it could be a start... A (bad?)
solution to compare against, when designing better implementations, at
least.

> That should be done at the root_domain level, so that cpusets still work
> as expected.
>
Yeah, sure. Consider that this should be easy since, for this first
version, I'm just recycling almost 100% of the sched-rt approach (and
some code too!).

> > int sched_setscheduler_ex (pid_t pid, int policy, struct sched_param_ex *param);
> >
> > for the sake of consistency, also sched_setaparam_ex() and
> > sched_getparam_ex() have been implemented.
>
> There should also be a sys_sched_getscheduler_ex(), the get part of the
> existing interface is what stops us from extending the sched_param bits
> we have.
Right, I have getparam, I can easily add getscheduler. :-)

> > * We did not add any further system call to allow periodic tasks to
> > specify the end of the current instance, and the semantics of the
> > existing sched_yield() system call has been extended for this
> > purpose.
>
> OK, so a wakeup after a yield, before a period elapses also raises some
> warning/exception/signal thing?
>
I thought about that, and I finally concluded that this should be not
needed. In fact, when a task "arrives" with a new instance (which is
what happen if it yield-suspend-resume), its deadline is set to:

max(clock, old_deadline) + period

Thus, even if a task slept few than expected, it makes no benefit out of
it, since the deadline advances --at best-- from the old value, and it
is not preempting other tasks, if any, with earlier deadline.

Do you agree?

The rationale behind this is, again, we tried to avoid as much as we
can, was to introduce some need of the kernel to be aware of the task
sleeping/resuming behaviour, since this was judged to be too much
restrictive.

Another thing I would like to have, is a task receiving a SIGXCPU if it
misses its deadline, which might happen actually, e.g., if you load an
UP system/a CPU more than 100% with EDF tasks.

> > * In case a SCHED_EDF tasks forks, parent's budget is split among
> > parent and child.
>
> Right, I guess there's really nothing else you can do here...
>
Well, me too... But during tests I run into a poor EDF shell that, after
each `ls' or `cat', lost half of its bandwidth up to be no longer
capable of running at all! :(

We may avoid this having the son giving back its bandwidth to the father
when dieing (what a sad story! :( ) but this would need distinguishing
between fork-ed and setschedul-ed EDF tasks. Moreover, e.g., what if the
son changed its EDF bandwidth in the meanwhile? Or worse if it changed
its scheduling policy?

At the current time, I'm just splitting the bandwidth, and nothing more.
Actually, I also think the solution is the right one, but I would really
like to discuss the issues it raises.

> Haven't looked at the code yet,. hope to do so shortly.
>
No problem... I also hope to update it shortly! ;-)

Thanks again,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]


Attachments:
signature.asc (197.00 B)
This is a digitally signed message part

2009-09-22 13:24:35

by Daniel Walker

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Tue, 2009-09-22 at 12:30 +0200, Raistlin wrote:
> Hi Peter, Hi all,
>
> this is the implementation of a new scheduling class called
> SCHED_EDF made by me and Michael Trimarchi (CC:-ed). We followed the
> suggestions you gave us at the multicore workshop in Kaiserlsautern
> on October.
>
> For other people: it is a real-time scheduling policy based on the
> Earliest Deadline First (EDF) algorithm, one of the most common
> real-time scheduling algorithms.

You've got some type of encoding corruption in your patch,

+ else {
+ if (!edf_task(p))
+ p->sched_class =3D &fair_sched_class;
+ else
+ p->sched_class =3D &edf_sched_class;
+ }
=20
p->prio =3D prio;
=20

See the =20 , and =3D above.. I'm not sure exactly how that gets into
patches ;( We require utf8 encodings , some people can handle the stuff
above but most scripts and the patch command won't be able to. I realize
this is an RFC , so the next time you submit this try to make it utf8 ..

Daniel

2009-09-22 14:01:07

by Dario Faggioli

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Tue, 2009-09-22 at 06:24 -0700, Daniel Walker wrote:
> You've got some type of encoding corruption in your patch,
>
Oops! Sorry about that... :(

> See the =20 , and =3D above.. I'm not sure exactly how that gets into
> patches ;( We require utf8 encodings , some people can handle the stuff
> above but most scripts and the patch command won't be able to.
>
I see. Problem is that last time I sent a patch, I inlined it and it
worked without any issue... It seems something has changed either in me
or my MUA... Too bad. :-(

> I realize
> this is an RFC , so the next time you submit this try to make it utf8 ..
>
You're definitely right and, now that you pointed this out to me, I'm
not able to produce a clean version, even if I set utf8 encoding...

So, I'm sorry, but, for now, the only thing I can do is attaching the
patch, hoping it works. The next one will be stop using Evolution! :-/

Regards and sorry again,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]


Attachments:
edf.patch (53.36 kB)
signature.asc (197.00 B)
This is a digitally signed message part
Download all attachments

2009-09-22 14:02:54

by Daniel Walker

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Tue, 2009-09-22 at 16:01 +0200, Raistlin wrote:
> On Tue, 2009-09-22 at 06:24 -0700, Daniel Walker wrote:
> > You've got some type of encoding corruption in your patch,
> >
> Oops! Sorry about that... :(
>
> > See the =20 , and =3D above.. I'm not sure exactly how that gets into
> > patches ;( We require utf8 encodings , some people can handle the stuff
> > above but most scripts and the patch command won't be able to.
> >
> I see. Problem is that last time I sent a patch, I inlined it and it
> worked without any issue... It seems something has changed either in me
> or my MUA... Too bad. :-(
>
> > I realize
> > this is an RFC , so the next time you submit this try to make it utf8 ..
> >
> You're definitely right and, now that you pointed this out to me, I'm
> not able to produce a clean version, even if I set utf8 encoding...
>
> So, I'm sorry, but, for now, the only thing I can do is attaching the
> patch, hoping it works. The next one will be stop using Evolution! :-/

Your using git already , why not use git-format-patch and
git-send-email ?

Daniel

2009-09-22 16:39:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Tue, 2009-09-22 at 06:24 -0700, Daniel Walker wrote:
> On Tue, 2009-09-22 at 12:30 +0200, Raistlin wrote:
> > Hi Peter, Hi all,
> >
> > this is the implementation of a new scheduling class called
> > SCHED_EDF made by me and Michael Trimarchi (CC:-ed). We followed the
> > suggestions you gave us at the multicore workshop in Kaiserlsautern
> > on October.
> >
> > For other people: it is a real-time scheduling policy based on the
> > Earliest Deadline First (EDF) algorithm, one of the most common
> > real-time scheduling algorithms.
>
> You've got some type of encoding corruption in your patch,
>
> + else {
> + if (!edf_task(p))
> + p->sched_class =3D &fair_sched_class;
> + else
> + p->sched_class =3D &edf_sched_class;
> + }
> =20
> p->prio =3D prio;
> =20
>
> See the =20 , and =3D above.. I'm not sure exactly how that gets into
> patches ;(

Daniel, welcome to my kill-file.

2009-09-22 16:42:17

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Tue, 2009-09-22 at 16:01 +0200, Raistlin wrote:
> On Tue, 2009-09-22 at 06:24 -0700, Daniel Walker wrote:
> > You've got some type of encoding corruption in your patch,
> >
> Oops! Sorry about that... :(

Don't worry, they are fine.

2009-09-22 18:36:56

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Tue, 2009-09-22 at 14:51 +0200, Raistlin wrote:

> On Tue, 2009-09-22 at 13:05 +0200, Peter Zijlstra wrote:
> > > * It does not make any restrictive assumption about the characteristics
> > > of the tasks: it can handle periodic, sporadic or aperiodic tasks
> > > (see below)
> >
> > sched_param_ex seems to only contain two of the three sporadic task
> > model parameters ;-)
> >
> Yes, that is true. Fact is that, inside the kernel, we deal only with
> the runtime and a period, which is also equal to the (relative!)
> deadline.
> In fact, the (absolute) deadline of a task is set by adding _that_
> period to the old deadline value, or to current time.
>
> The, let's say, proper task period is left to the userspace, i.e.,
> suspending until the next period/sporadic activation is not done in the
> kernel, it is up to the task.
> This is the most flexible interface we have been able to design, while
> trying to reduce the amount of information that has to be added to the
> minimum... Different ideas are more than welcome. :-)

No bright ideas at the moment, but it might be worth-while to describe
this task model somewhere near sched_param_ex :-)

> > > * We decided to put the new scheduling class in between sched_rt and
> > > sched_fair, for the following reasons:
> >
> > I'd argue to put it above rt, since having it below renders any
> > guarantees void, since you have no idea about how much time is actually
> > available.
> >
> Well, I think it might be possible, if rt is bandwidth constrained, as
> it is/it is becoming... Don't you?

Ah, the problem I see there is that your minimum deadline gets
constrained by whatever you configure your SCHED_FIFO bit to, this
doesn't seem ideal.

> > > - Backward compatibility and (POSIX) standard compliance require
> > > SCHED_FIFO/SCHED_RR tasks to run as soon as they can
> >
> > In short, sod POSIX.
> >
> > Simply define 'as soon as they can' to exclude EDF time ;-)
> >
> Well, I definitely am not going to fight to death for POSIX
> compliance! :-P
>
> It's just that we've heard a lot of times sentences like "not break well
> known/standard behaviours, especially if exported to userspace" and we
> have not been brave enough to contradict this. :D

Ah, very easy, since POSIX doesn't describe SCHED_EDF/DEADLINE any
system that does run such a task is per definition not POSIX compliant,
is it? ;-)

> Apart from that, I truly think that --again, if rt is bandwidth
> constrained-- having fixed priority above EDF may lead to a solution
> able to provide both, the predictability of the former, and the
> flexibility of the latter, as said below (original mail).
> Anyway, this is not going to be an issue, since moving edf forward is
> quite easy.

See the problem with the SCHED_FIFO period above.

> Damn! I would like to find some time to design and run a couple of
> examples and experiments on both configurations, and see what it comes
> out... If I ever do, I'll let you know.

:-)

> > The only two tasks that need to be the absolute highest priority are
> > kstopmachine and migrate, so those would need to be lifted above EDF,
> > the rest is not important :-)
> >
> Ok, I think this could be done easily... Maybe also making them EDF with
> very small deadline, but then we would have to think about a reasonable
> budget... Or, obviously, we can simply use something like a "system"
> scheduling class.
>
> But now, isn't this bringing will bring some unknown amount of
> interference which may disable again the guarantees for edf? Do you
> think this could be the case?

I would argue that kstopmachine is going to break pretty much
everything, so a workload that contains one (module unload, cpu-hotplug)
will not be analyzable. Although Ted Baker mentioned something about
modeling it at a system state change or somesuch.

I'd rather say that anything kstopmachine will violate RT guarantees.

As to migrate, I think we can model that as a regular non-preempt
section (we could actually remove the migrate thread and actually make
it such).

> > > * We will migrate the tasks among the runqueues trying to always have,
> > > on a system with m CPUs, the m earliest deadline ready tasks running
> > > on the CPUs.
> > > This is not present in the attached patch, but will be available
> > > soon.
> >
> > Ah, ok, so now its fully paritioned, but you'll add that push-pull logic
> > to make it 'global-ish'.
> >
> Yes, consistently with Linux "distributed runqueue" scheduling approach.
> For now, I'm just "porting" the push/pull rt logic to deadlines, inside
> sched-edf.
> This may be suboptimal and rise at least overhead, clock synchronization
> and locking issue, but I hope, again, it could be a start... A (bad?)
> solution to compare against, when designing better implementations, at
> least.

Agreed, its a pragmatic starting point, and only by implementing it can
we evaluate it.

> Right, I have getparam, I can easily add getscheduler. :-)

Ah, no, I got stuff mixed up, getparam is indeed the one!

> > > * We did not add any further system call to allow periodic tasks to
> > > specify the end of the current instance, and the semantics of the
> > > existing sched_yield() system call has been extended for this
> > > purpose.
> >
> > OK, so a wakeup after a yield, before a period elapses also raises some
> > warning/exception/signal thing?
> >
> I thought about that, and I finally concluded that this should be not
> needed. In fact, when a task "arrives" with a new instance (which is
> what happen if it yield-suspend-resume), its deadline is set to:
>
> max(clock, old_deadline) + period
>
> Thus, even if a task slept few than expected, it makes no benefit out of
> it, since the deadline advances --at best-- from the old value, and it
> is not preempting other tasks, if any, with earlier deadline.
>
> Do you agree?

I think I do :-)

> The rationale behind this is, again, we tried to avoid as much as we
> can, was to introduce some need of the kernel to be aware of the task
> sleeping/resuming behaviour, since this was judged to be too much
> restrictive.

Makes sense.

> Another thing I would like to have, is a task receiving a SIGXCPU if it
> misses its deadline, which might happen actually, e.g., if you load an
> UP system/a CPU more than 100% with EDF tasks.

Missing a deadline, and possibly over-running a deadline too.

Maybe add a flags field to the sched_param_ex thing ;-)

> > > * In case a SCHED_EDF tasks forks, parent's budget is split among
> > > parent and child.
> >
> > Right, I guess there's really nothing else you can do here...
> >
> Well, me too... But during tests I run into a poor EDF shell that, after
> each `ls' or `cat', lost half of its bandwidth up to be no longer
> capable of running at all! :(
>
> We may avoid this having the son giving back its bandwidth to the father
> when dieing (what a sad story! :( ) but this would need distinguishing
> between fork-ed and setschedul-ed EDF tasks. Moreover, e.g., what if the
> son changed its EDF bandwidth in the meanwhile? Or worse if it changed
> its scheduling policy?
>
> At the current time, I'm just splitting the bandwidth, and nothing more.
> Actually, I also think the solution is the right one, but I would really
> like to discuss the issues it raises.

Ooh, good point,.. yes we can put some exit hooks in there folding the
runtime back.

An alternative is starting the child out with 0 runtime, and have the
parent run sched_setscheduler() on it giving us a clear point to run
admission on.

2009-09-22 19:11:41

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class


* Peter Zijlstra <[email protected]> wrote:

> On Tue, 2009-09-22 at 16:01 +0200, Raistlin wrote:
> > On Tue, 2009-09-22 at 06:24 -0700, Daniel Walker wrote:
> > > You've got some type of encoding corruption in your patch,
> > >
> > Oops! Sorry about that... :(
>
> Don't worry, they are fine.

Yes - i was able to save and apply the patch just fine too:

earth4:~/tip> q push
Applying patch patches/sched_edf-scheduling-class.patch
patching file arch/arm/include/asm/unistd.h
patching file arch/arm/kernel/calls.S
patching file arch/x86/ia32/ia32entry.S
patching file arch/x86/include/asm/unistd_32.h
patching file arch/x86/include/asm/unistd_64.h
patching file arch/x86/kernel/syscall_table_32.S
patching file include/linux/sched.h
Hunk #3 succeeded at 165 (offset 4 lines).
Hunk #5 succeeded at 1205 (offset 23 lines).
Hunk #7 succeeded at 1609 (offset 22 lines).
patching file include/linux/syscalls.h
patching file init/Kconfig
patching file kernel/fork.c
Hunk #1 succeeded at 1223 (offset 20 lines).
patching file kernel/sched.c
Hunk #14 succeeded at 6099 (offset 12 lines).
Hunk #16 succeeded at 6287 (offset 12 lines).
Hunk #18 succeeded at 6339 (offset 12 lines).
Hunk #20 succeeded at 6376 (offset 12 lines).
Hunk #22 succeeded at 6486 (offset 12 lines).
Hunk #24 succeeded at 6540 (offset 12 lines).
Hunk #26 succeeded at 6609 (offset 12 lines).
Hunk #28 succeeded at 6704 (offset 12 lines).
Hunk #30 succeeded at 9547 (offset 12 lines).
Hunk #32 succeeded at 9636 (offset 12 lines).
Hunk #34 succeeded at 9747 (offset 12 lines).
Hunk #36 succeeded at 10218 (offset 12 lines).
Hunk #38 succeeded at 10277 (offset 12 lines).
Hunk #40 succeeded at 10750 (offset 12 lines).
Hunk #42 succeeded at 10838 (offset 12 lines).
patching file kernel/sched_debug.c
patching file kernel/sched_edf.c
patching file kernel/sched_fair.c
patching file kernel/sched_rt.c
patching file kernel/trace/trace_sched_wakeup.c

Now at patch patches/sched_edf-scheduling-class.patch

Daniel is a kind of self-elected joker-bot on lkml, running checkpatch
out of .procmailrc and annoying people with trivial comments,
distracting from the real discussion. Raistlin, feel free to ignore him
in the future.

Ingo

2009-09-22 20:55:27

by Linus Walleij

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

Hi Raistlin,

it's great fun to see this scheduler materialize, I was present at the
workshop in Kaiserslautern and waited for the code since. What I'm
interested in getting a briefing on right now is what practical use EDF
schedulers have, since I think this is what many people will ask for.

I think actually the answers are the same as when I asked at Meld
(of all places) what HRtimers are actually for. And I I got the non-all
inclusive answer that they're for Flight simulators.
Which is obviously in the domain of automatic control.
http://en.wikipedia.org/wiki/Automatic_control

So one application of HRTimers has been to trigger events in
userspace programs, especially when your dealing with issues
related to automatic control. Automatic control is also
mentioned throughout this patch. (Yes, I know HRtimers
have other great uses also, especially in-kernel, those will
remain.)

I am very interested in if you or someone else
could elaborate on the relation between HRtimers and deadline
scheduling. To my untrained mind it looks like HRtimers will
fire events to your task in a very precise manner, but
you cannot currently guarantee that they will then proceed to
process their target task in time (meet a deadline).

I'm under the impression that some people currently use
some periodic HRtimers (through either the timerfd_create system
call I guess, or POSIX timer_create(CLOCK_REALTIME_HR...)
timers combined with some SHED_FIFO or so to achieve the same
thing a deadline scheduler would do, but since SCHED_FIFO cannot
really guarantee that this will work, you simply underutilize
the system a bit, add the CONFIG_PREEMPT_RT patch so the
timers are not blocked by things like slow interrupthandlers or
starvation (priority inversion) and then hope for the best. It turns
out to work. You measure on the system and it has the desired
"hard realtime" characteristics.

Do people do such things? I haven't seen those applications,
they all tend to be closed-source really. I would assume the
vxWorks/pSOS/etc compatibility wrapper libraries do things
like this, do they not?

But let us assume these applications really exist. tglx will probably
tell if I'm totally off track here.

So from here it looks like a problem that was previously solved by
fireing a periodic HRtimer is now better suited to be solved
by scheduling it with a periodic deadlined scheduling policy.
It will be done better, easier and with a guaranteed result.
Is this reasoning generally speaking correct?

Now since that example up there included the RTOS replacement
case, that means the list of possible applications utilizing these very
responsive control systems would be the same as any good old
so-called RTOS and include user-space applications for:

* Any interactive simulations and games
* Industrial automation including robotics
* Process industry applications (e.g. chemical processing, power
plants, paper/car/textile/etc manufacturing)
* Various weapons including missiles
* Medical equipment
* The above will sooner or later include a dataflow which
need some processing on it which leads to thinking of
data (token) flows leading up to:
* Massive parallelization of FunnyStuff like the models found
in OpenDF (also part of the ACTORS project):
http://opendf.svn.sourceforge.net/viewvc/opendf/trunk/models/
here (correct me if I'm wrong) the deadline scheduling
will be used to construct timewise predictable flows in massively
parallel threadpools distributed over several processing nodes etc.
Illustrations include MPEG4 decoding, FIR, FFT so let's
say real-time constrained signal processing on some
time-critical dataflow.
(Don't know the deadline-critical parts of the Mandelbrot
set or the Game of Life though, hehe.)

This sounds a lot like a nail in the coffin for the RTOS and all
its virtualized variants (running Linux inside an RTOS etc).

(Please excuse my bastardized condensed-form popular science
version of the noble goals of the ACTORS project, I only observed
it at a distance so my vision was dimmed.)

Linus Walleij

2009-09-22 23:38:58

by Jonathan Corbet

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Tue, 22 Sep 2009 06:24:38 -0700
Daniel Walker <[email protected]> wrote:

> You've got some type of encoding corruption in your patch,
>
> + else {
> + if (!edf_task(p))
> + p->sched_class =3D &fair_sched_class;
> + else
> + p->sched_class =3D &edf_sched_class;
> + }
> =20
> p->prio =3D prio;
> =20

I could swear we talked about quoted-printable before. You really need
a new mailer. Even evolution (which you seem to be using) should
really be able to handle this.

jon

2009-09-22 23:55:47

by Daniel Walker

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Tue, 2009-09-22 at 17:39 -0600, Jonathan Corbet wrote:
> On Tue, 22 Sep 2009 06:24:38 -0700
> Daniel Walker <[email protected]> wrote:
>
> > You've got some type of encoding corruption in your patch,
> >
> > + else {
> > + if (!edf_task(p))
> > + p->sched_class =3D &fair_sched_class;
> > + else
> > + p->sched_class =3D &edf_sched_class;
> > + }
> > =20
> > p->prio =3D prio;
> > =20
>
> I could swear we talked about quoted-printable before. You really need
> a new mailer. Even evolution (which you seem to be using) should
> really be able to handle this.

We did .. It's not my mailer, it's my scripts .. I read a little of
wikipedia on this and it says it's a MIME encoding..

http://en.wikipedia.org/wiki/Quoted-printable

So it's like a mime encoded patch instead of plain ascii encoded
patch .. I don't think we should all have to parse this stuff out, we
should just be getting utf8 encodings which covers everything we use in
the kernel AFAIK ..

Daniel

2009-09-23 00:05:51

by Jonathan Corbet

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Tue, 22 Sep 2009 16:55:39 -0700
Daniel Walker <[email protected]> wrote:

> We did .. It's not my mailer, it's my scripts .. I read a little of
> wikipedia on this and it says it's a MIME encoding..
>
> http://en.wikipedia.org/wiki/Quoted-printable
>
> So it's like a mime encoded patch instead of plain ascii encoded
> patch .. I don't think we should all have to parse this stuff out, we
> should just be getting utf8 encodings which covers everything we use in
> the kernel AFAIK ..

I have scripts that handle email as well. But I wrote them in a
scripting language which has modules to cope with standard-compliant
email, and I have no trouble with other transfer encodings at all. You
really need to do the same. Better you fix your scripts than
continually demanding that the rest of the development community
conform to your restrictive requirements.

Daniel, I believe that you are really trying to help make the kernel
better. But I fear that you may be having the opposite effect by
discouraging contributions from others. Please consider taking a step
back and thinking about your interactions and the nature of your
efforts; I think you may find that there are more productive and
encouraging ways for you to contribute.

Thanks,

jon

2009-09-23 00:40:27

by Daniel Walker

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Tue, 2009-09-22 at 18:06 -0600, Jonathan Corbet wrote:
> On Tue, 22 Sep 2009 16:55:39 -0700
> Daniel Walker <[email protected]> wrote:
>
> > We did .. It's not my mailer, it's my scripts .. I read a little of
> > wikipedia on this and it says it's a MIME encoding..
> >
> > http://en.wikipedia.org/wiki/Quoted-printable
> >
> > So it's like a mime encoded patch instead of plain ascii encoded
> > patch .. I don't think we should all have to parse this stuff out, we
> > should just be getting utf8 encodings which covers everything we use in
> > the kernel AFAIK ..
>
> I have scripts that handle email as well. But I wrote them in a
> scripting language which has modules to cope with standard-compliant
> email, and I have no trouble with other transfer encodings at all. You
> really need to do the same. Better you fix your scripts than
> continually demanding that the rest of the development community
> conform to your restrictive requirements.

"restrictive requirements" ? I mean don't blame me for that .. It's in
Documentation/email-clients.txt .. The selected norm is UTF8 , and I
don't disagree with that. After researching it that is what we should be
using. At the same time, I'm not demanding anything of anyone , I'm
simply asking nicely for the submitter to check what they are doing..

My scripts may not handle it which is fine. I just wouldn't likely end
up reviewing it. That really isn't bad for _me_ ..

> Daniel, I believe that you are really trying to help make the kernel
> better. But I fear that you may be having the opposite effect by
> discouraging contributions from others. Please consider taking a step
> back and thinking about your interactions and the nature of your
> efforts; I think you may find that there are more productive and
> encouraging ways for you to contribute.

I am trying to make the kernel (or community) better, but my motivation
is purely my own (as it often is with people on this list) ..

I don't see the effects your indicating, discouraging contributions .. I
think if anything I'm helping people to get to a place where they can
contribute in a comfortable way without worrying about things like
checkpatch .. I think the problem most submitters have on this list is
that no one tells them they are doing something wrong so they don't
correct it. I clearly recall submitting my first several patches
copy-and-pasted which got either ignored , or caused a maintainer lots
of pain .. Had I been told earlier that it was a problem I would have
corrected it immediately (and been happy to have been notified about
it)..

discouraging contributions is more something that happens when you get
the responses I got earlier in this thread..

Daniel

2009-09-23 00:51:42

by Daniel Walker

[permalink] [raw]
Subject: checkpatch as a tool (was Re: [RFC][PATCH] SCHED_EDF scheduling class)

On Tue, 2009-09-22 at 21:11 +0200, Ingo Molnar wrote:

> Now at patch patches/sched_edf-scheduling-class.patch
>
> Daniel is a kind of self-elected joker-bot on lkml, running checkpatch
> out of .procmailrc and annoying people with trivial comments,
> distracting from the real discussion. Raistlin, feel free to ignore him
> in the future.

You want to talk about it constructively or not? I'll stop sending those
emails if it's actually negative , but I don't think it is..

Daniel

2009-09-23 01:01:47

by Joe Perches

[permalink] [raw]
Subject: Re: checkpatch as a tool (was Re: [RFC][PATCH] SCHED_EDF scheduling class)

On Tue, 2009-09-22 at 17:51 -0700, Daniel Walker wrote:
> I'll stop sending those
> emails if it's actually negative , but I don't think it is..

Hi Daniel.

I think it'd be more helpful if instead of merely sending
a "hah! checkpatch failure!" email you send the submitter
a gentler nudge and a fixed patch.

cheers, Joe

2009-09-23 01:11:39

by Daniel Walker

[permalink] [raw]
Subject: Re: checkpatch as a tool (was Re: [RFC][PATCH] SCHED_EDF scheduling class)

On Tue, 2009-09-22 at 18:01 -0700, Joe Perches wrote:
> On Tue, 2009-09-22 at 17:51 -0700, Daniel Walker wrote:
> > I'll stop sending those
> > emails if it's actually negative , but I don't think it is..
>
> Hi Daniel.
>
> I think it'd be more helpful if instead of merely sending
> a "hah! checkpatch failure!" email you send the submitter
> a gentler nudge and a fixed patch.

I don't say "hah!" like I caught you or something .. I could try to
soften the emails further so that's not the feeling .. As far as sending
fixed patches to people it's not really feasible due to the number of
patches I would have to send out.. The other issues is that if I send a
corrected patch I could become responsible for that persons patch
instead of them .. I could send out delta patches, however, I don't want
people to rely on me to fix these issues.. Ideally people would correct
them prior to sending the patches out..

Daniel

2009-09-23 07:03:09

by Dario Faggioli

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Tue, 2009-09-22 at 21:11 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <[email protected]> wrote:
> >
> > Don't worry, they are fine.
>
> Yes - i was able to save and apply the patch just fine too:
>
Ok, fine, thanks for the feedback.

> earth4:~/tip> q push
> Applying patch patches/sched_edf-scheduling-class.patch
> ...
> patching file kernel/sched_rt.c
> patching file kernel/trace/trace_sched_wakeup.c
>
> Now at patch patches/sched_edf-scheduling-class.patch
>
Cool :-)

> Daniel is a kind of self-elected joker-bot on lkml, running checkpatch
> out of .procmailrc and annoying people with trivial comments,
> distracting from the real discussion.
Ok. Anyway, it could happen --especially to me :P-- to mess something
up... So no problem with me if someone points this out if/when it
happens! :D

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]


Attachments:
signature.asc (197.00 B)
This is a digitally signed message part

2009-09-23 11:47:59

by Avi Kivity

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On 09/23/2009 03:40 AM, Daniel Walker wrote:
> I am trying to make the kernel (or community) better, but my motivation
> is purely my own (as it often is with people on this list) ..
>
> I don't see the effects your indicating, discouraging contributions .. I
> think if anything I'm helping people to get to a place where they can
> contribute in a comfortable way without worrying about things like
> checkpatch .. I think the problem most submitters have on this list is
> that no one tells them they are doing something wrong so they don't
> correct it. I clearly recall submitting my first several patches
> copy-and-pasted which got either ignored , or caused a maintainer lots
> of pain .. Had I been told earlier that it was a problem I would have
> corrected it immediately (and been happy to have been notified about
> it)..
>
>

Look at the context. Here we have a new scheduling class introduced out
of the blue. The discussion should be around whether it makes sense; if
it does, whether the design is good or not; if it is, whether the code
is maintainable or not, if that passes, whether there are bugs in the
code or not. Until we've decided that Linux needs this scheduling class
and that this patchset is well designed and written, spelling errors are
just a nuisance and distract people from the important issues.

> discouraging contributions is more something that happens when you get
> the responses I got earlier in this thread..
>

That's probably intentional. Whitespace fixes have their place but not
at this stage in a patch's lifecycle.

--
error compiling committee.c: too many arguments to function

2009-09-23 12:19:53

by Dario Faggioli

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Tue, 2009-09-22 at 20:36 +0200, Peter Zijlstra wrote:
> > The, let's say, proper task period is left to the userspace, i.e.,
> > suspending until the next period/sporadic activation is not done in the
> > kernel, it is up to the task.
> > This is the most flexible interface we have been able to design, while
> > trying to reduce the amount of information that has to be added to the
> > minimum... Different ideas are more than welcome. :-)
>
> No bright ideas at the moment, but it might be worth-while to describe
> this task model somewhere near sched_param_ex :-)
>
Well, when you have a good point... You have a good point! :-D

> > Well, I think it might be possible, if rt is bandwidth constrained, as
> > it is/it is becoming... Don't you?
>
> Ah, the problem I see there is that your minimum deadline gets
> constrained by whatever you configure your SCHED_FIFO bit to, this
> doesn't seem ideal.
>
Agree again, not the ideal behaviour... However, as said, I would like
to experiment some more on this, and to (re)read the papers where they
introduces the analysis for EDF-under-RM. :-)

However, I should start thinking moving it upward, should I?

> I would argue that kstopmachine is going to break pretty much
> everything, so a workload that contains one (module unload, cpu-hotplug)
> will not be analyzable.
>
Ok, I think most of the people can stand this! :-P

> I'd rather say that anything kstopmachine will violate RT guarantees.
>
> As to migrate, I think we can model that as a regular non-preempt
> section (we could actually remove the migrate thread and actually make
> it such).
>
I see... and do you think one more scheduling class would be needed, or
something like 0 deadline would do the trick?

> > This may be suboptimal and rise at least overhead, clock synchronization
> > and locking issue, but I hope, again, it could be a start... A (bad?)
> > solution to compare against, when designing better implementations, at
> > least.
>
> Agreed, its a pragmatic starting point, and only by implementing it can
> we evaluate it.
>
Hope it will be ready soon...

> > Another thing I would like to have, is a task receiving a SIGXCPU if it
> > misses its deadline, which might happen actually, e.g., if you load an
> > UP system/a CPU more than 100% with EDF tasks.
>
> Missing a deadline, and possibly over-running a deadline too.
>
> Maybe add a flags field to the sched_param_ex thing ;-)
>
Could be done as well... What do you mean by over-running a deadline?

By missing, I mean that at a certain instant t, typically during a
(hr)tick I notice that the scheduling deadline d is <= t, which means we
probably are in overload condition, is that the same? Just to
understand... :-)

> > At the current time, I'm just splitting the bandwidth, and nothing more.
> > Actually, I also think the solution is the right one, but I would really
> > like to discuss the issues it raises.
>
> Ooh, good point,.. yes we can put some exit hooks in there folding the
> runtime back.
>
> An alternative is starting the child out with 0 runtime, and have the
> parent run sched_setscheduler() on it giving us a clear point to run
> admission on.
>
I thought this too, we just have to chose whether the 'more natural' or,
let's say 'user friendly', behaviour is...

Thanks again for the comments.

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]


Attachments:
signature.asc (197.00 B)
This is a digitally signed message part

2009-09-23 12:23:08

by Ingo Molnar

[permalink] [raw]
Subject: Re: checkpatch as a tool (was Re: [RFC][PATCH] SCHED_EDF scheduling class)


* Joe Perches <[email protected]> wrote:

> On Tue, 2009-09-22 at 17:51 -0700, Daniel Walker wrote:
> > I'll stop sending those
> > emails if it's actually negative , but I don't think it is..
>
> Hi Daniel.
>
> I think it'd be more helpful if instead of merely sending
> a "hah! checkpatch failure!" email you send the submitter
> a gentler nudge and a fixed patch.

He should consider not sending them at all. It's up to maintainers and
the developers involved with that code whether the small details that
checkpatch flags are important or not at a given point in the patch
cycle.

For example i use checkpatch all the time and i think it's a fantastic
tool, still i dont want another nuisance layer on lkml interfering with
the patch flow.

If a patch gets so far in the patch cycle that i'm thinking about
merging it, i might fix the checkpatch failures myself (often they are
trivial), and i might warn frequent contributors about repeat patterns
of small uncleanlinesses - or i might bounce the patch back to the
contributor. I also ignore certain classes of checkpatch warnings.

What Daniel is doing is basically a semi-mandatory checkpatch layer on
lkml and that's both a distraction and harmful as well. We dont need a
"checkpatch police" on lkml. We want an open, reasonable, human driven
patch process with very little buerocracy and no buerocrats.

Ingo

2009-09-23 12:33:14

by Dhaval Giani

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

>>
>> An alternative is starting the child out with 0 runtime, and have the
>> parent run sched_setscheduler() on it giving us a clear point to run
>> admission on.
>>
> I thought this too, we just have to chose whether the 'more natural' or,
> let's say 'user friendly', behaviour is...
>

I agree with Peter here. It makes more sense to start it with a 0
runtime, because anyway whenever you will fork, you will be under
controlled conditions and know how much bandwidth you need. I think we
should just let userspace handle this part.

Thanks,
Dhaval

--

Marie von Ebner-Eschenbach - "Even a stopped clock is right twice a
day." - http://www.brainyquote.com/quotes/authors/m/marie_von_ebnereschenbac.html

2009-09-23 12:26:19

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class


* Avi Kivity <[email protected]> wrote:

>> discouraging contributions is more something that happens when you
>> get the responses I got earlier in this thread..
>
> That's probably intentional. Whitespace fixes have their place but
> not at this stage in a patch's lifecycle.

Exactly. What might make sense is to scan linux-next for new commits
that show serious cleanliness trouble - and send fix patches to the
parties involved. That's a real effort and brings the code forward.

Ingo

2009-09-23 12:33:51

by Linus Walleij

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

Hi Raistlin,

I have trouble compiling the (mainline) kernel for ARM u300_defconfig with the
sched-edf patches:

In file included from /home/linus/src/linux-trees/linux-2.6/kernel/sched.c:1856:
/home/linus/src/linux-trees/linux-2.6/kernel/sched_edf.c: In function
'pick_next_task_edf':
/home/linus/src/linux-trees/linux-2.6/kernel/sched_edf.c:435: error:
implicit declaration of function 'hrtick_enabled'

The hrtick_enabled() function is a static in sched.c, and should be visible
in sched_edf.c since it's #include:ed into sched.c so I'm pretty confused
about this one.

Could it be that I'm using a too bleeding edge compiler? This is a
arm-none-eabi-gcc (Sourcery G++ Lite 2008q3-66) 4.3.2
i.e. a CodeSourcery custom compiler, what are you using for ARM builds
in Pisa?

/home/linus/src/linux-trees/linux-2.6/kernel/sched.c: In function
'__setscheduler_ex':
/home/linus/src/linux-trees/linux-2.6/kernel/sched.c:6318: error:
'struct sched_edf_entity' has no member named 'bw'
/home/linus/src/linux-trees/linux-2.6/kernel/sched.c:6318: error:
implicit declaration of function 'to_ratio'

This is another thing: the code in struct sched_edf_entity only compiles
in the field bw if you have CONFIG_EDF_GROUP_SCHED, but the
code in sched.c __setscheduler_ex() use it no matter whether that's
configured or not.

This patch fixes it.

diff --git a/kernel/sched.c b/kernel/sched.c
index b41fc65..9ce89d4 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6315,8 +6315,10 @@ __setscheduler_ex(struct rq *rq, struct
task_struct *p, int policy,

edf_se->runtime_max = timespec_to_ns(&param_ex->sched_runtime);
edf_se->period = timespec_to_ns(&param_ex->sched_period);
+#ifdef CONFIG_EDF_GROUP_SCHED
edf_se->bw = to_ratio(timespec_to_ns(&param_ex->sched_period),
timespec_to_ns(&param_ex->sched_runtime));
+#endif

edf_se->runtime = edf_se->runtime_max;

Yours,
Linus Walleij

2009-09-23 12:50:33

by Linus Walleij

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

2009/9/23 Linus Walleij <[email protected]>:

> In file included from /home/linus/src/linux-trees/linux-2.6/kernel/sched.c:1856:
> /home/linus/src/linux-trees/linux-2.6/kernel/sched_edf.c: In function
> 'pick_next_task_edf':
> /home/linus/src/linux-trees/linux-2.6/kernel/sched_edf.c:435: error:
> implicit declaration of function 'hrtick_enabled'

Nah, no complex explanations... it's just compiled out but still used,
here's another patch:

diff --git a/kernel/sched_edf.c b/kernel/sched_edf.c
index cdec433..198ad6b 100644
--- a/kernel/sched_edf.c
+++ b/kernel/sched_edf.c
@@ -431,9 +431,11 @@ struct task_struct *pick_next_task_edf(struct rq *rq)
BUG_ON(!edf_se);

p = edf_task_of(edf_se);
- p->se.exec_start = rq->clock;
+ p->se.exec_start = rq->clock
+#ifdef CONFIG_SCHED_HRTICK
if (hrtick_enabled(rq))
start_hrtick_edf(rq, p);
+#endif

return p;
}

Yours,
Linus Walleij

2009-09-23 13:00:23

by Dario Faggioli

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Tue, 2009-09-22 at 22:55 +0200, Linus Walleij wrote:
> Hi Raistlin,
>
Hi,

> it's great fun to see this scheduler materialize, I was present at the
> workshop in Kaiserslautern and waited for the code since.
Sorry it took long... Well, it is taking, actually, since it is not
finished! :D
Anyway, it's a pleasure to entertain you, at least I am now sure that
all this developing has not been completely pointless! :D :D

> What I'm
> interested in getting a briefing on right now is what practical use EDF
> schedulers have, since I think this is what many people will ask for.
>
100% agree with you. I think you directly hit one of the central points.

> I think actually the answers are the same as when I asked at Meld
> (of all places) what HRtimers are actually for. And I I got the non-all
> inclusive answer that they're for Flight simulators.
> Which is obviously in the domain of automatic control.
> http://en.wikipedia.org/wiki/Automatic_control
>
Wow, flight simulator are funny... They surely worth tglx and others
hrtimers efforts! :-)

> So one application of HRTimers has been to trigger events in
> userspace programs, especially when your dealing with issues
> related to automatic control. Automatic control is also
> mentioned throughout this patch.
>
Ok, turning back being serious, you are right about automatic control.
In fact, control applications are both a source of great inspiration for
new research in real-time systems, and probably the "field" where the
outcomes of that research might be exploited with best results.

This is, in my opinion, because in control applications interaction with
the external environment is the key part of the whole thing... And
external environment has its own timing that you usually can't change.
In fact, real-time is not about speed, timers precision, wakeup
latencies, etc. it is about predictability, and you usually need that
your robotic arm that can cut off some people's head if misbehaving to
be predictable! :O

However, a kernel with a predictable scheduler but with high latencies
and poor resolution timing event delivering may be enough, if your
environment runs at a proper, let's say, small rate, which is usually
not the case... And you can't wakeup your PID controller task at 1ms
rate if your OS timers have 4ms resolution, do you? :P

Thus, I think lowering kernel latencies, reducing the kernel
interference with sensible application to the bare minimum (e.g., the
interrupt threading, and so on) _and_ introducing deterministic and
predictable scheduling policies are distinct things, but both contribute
to the enhancement of Linux real-time behaviour, and on the
opportunities of using Linux in real-time environments, like automatic
control.

> I am very interested in if you or someone else
> could elaborate on the relation between HRtimers and deadline
> scheduling. To my untrained mind it looks like HRtimers will
> fire events to your task in a very precise manner, but
> you cannot currently guarantee that they will then proceed to
> process their target task in time (meet a deadline).
>
Ok, I think I just said something about this above, however, what you
say seems correct to me.
Trying to explain my view on this better (sorry if repeating myself) I
can say that, in principle, there is *NO* relation between hrtimers (or
interrupt threading, or low latencies, etc.) and deadline scheduling.

A periodic task in a real time systems has to wake-up at certain point
in time, say t, and complete its execution by another time instant, say
t+d, when it then goes to sleep until t+T.

So, waking up the task at t and t+T, or, e.g., notifying it missed its
deadline at t+d, is responsibility of the timers, and it is much more
better if their resolution is appropriate, with respect to the timing
constants we are talking about (I would say resolution << d <= T).
After the task is awake --especially if more than one task is awake at a
given point in time! :D-- scheduling it in a way it can complete by t+d
it is mostly up to the scheduler.

Let me add that, in order, of doing this, a fixed priority scheduler
(just like the POSIX one already in place in Linux for years) may be
more than suitable, and there may be no need for deadline scheduling at
all! There are tons of research papers about properties of fixed
priority scheduler real-time systems, and all the RTOS I'm aware of use
variation on this policy, while none of them has EDF, or similar,
implemented!
Nevertheless, whether your needs are satisfied by a fixed or dynamic
(e.g., EDF) priority scheduler, you may find useful precise task event
triggering. :-)

This is the non-relation I see between the two things. It seems to me
very similar to what you wrote, but I tried to elaborate as much as I
can, as you asked... Sorry for being so long and, probably, so
boring! :-(

> I'm under the impression that some people currently use
> some periodic HRtimers (through either the timerfd_create system
> call I guess, or POSIX timer_create(CLOCK_REALTIME_HR...)
> timers combined with some SHED_FIFO or so to achieve the same
> thing a deadline scheduler would do, but since SCHED_FIFO cannot
> really guarantee that this will work, you simply underutilize
> the system a bit, add the CONFIG_PREEMPT_RT patch so the
> timers are not blocked by things like slow interrupthandlers or
> starvation (priority inversion) and then hope for the best. It turns
> out to work. You measure on the system and it has the desired
> "hard realtime" characteristics.
>
Ok, I think you are describing two (possible?) different behaviours,
which deserves different treatment.

If one applies preempt-rt, setschedulers some task to FIFO, crosses his
fingers and sees if all works, well, this is what I might consider the
best sin one can perpetrate... But the world is so beautiful because
there exists a lot of different opinions! :D :D

On the contrary, as written above, high resolution timers, low latency
kernel and fixed priority scheduler (which _is_ predictable and
deterministic, maybe even more, if possible, than deadline one!) can be
used for successfully building up an hard real-time system, working like
a swiss clock, with strong guarantees and no needs for "hoping for the
best".

Since I think you are talking about the first approach, well, there are
a lot of applications that might be designed as a real-time application,
be given a period and a deadline and then scheduled according to
(something similar to) real-time scheduling theory, either with fixed or
deadline schedulers. Just think to a video/media player, and I think you
will realize that's the perfect example!
However, they are not designed as such and they nevertheless work more
than well. Thus, I don't think we (we = real-time academic people...
strange guys :P) can say that our solution (EDF scheduling in this case)
is the only correct and the best ever... I mean, of course we think like
that, but I would never say it here! :D :D

What I hope is that, deadline scheduling, given its high efficiency
(scheduling, not implementation) and flexibility, would be appealing
enough to be considered by people interested in real-time and control
applications, but also from some other guys... And that this make it
worth to have it in the kernel at some time! :-)

> Do people do such things? I haven't seen those applications,
> they all tend to be closed-source really.
>
Yeah, that's the big deal here, actually. :-(

> I would assume the
> vxWorks/pSOS/etc compatibility wrapper libraries do things
> like this, do they not?
>
Not sure of what you are thinking about here... Is it things like
Xenomai(/solo) skins?

> Now since that example up there included the RTOS replacement
> case, that means the list of possible applications utilizing these very
> responsive control systems would be the same as any good old
> so-called RTOS and include user-space applications for:
>
> * Any interactive simulations and games
(here you touched my heart... I do love games! :P)
> * Industrial automation including robotics
> * Process industry applications (e.g. chemical processing, power
> plants, paper/car/textile/etc manufacturing)
> * Various weapons including missiles
> * Medical equipment
> * The above will sooner or later include a dataflow which
> need some processing on it which leads to thinking of
> data (token) flows leading up to:
> * Massive parallelization of FunnyStuff like the models found
> in OpenDF (also part of the ACTORS project):
> http://opendf.svn.sourceforge.net/viewvc/opendf/trunk/models/
> here (correct me if I'm wrong) the deadline scheduling
> will be used to construct timewise predictable flows in massively
> parallel threadpools distributed over several processing nodes etc.
> Illustrations include MPEG4 decoding, FIR, FFT so let's
> say real-time constrained signal processing on some
> time-critical dataflow.
> (Don't know the deadline-critical parts of the Mandelbrot
> set or the Game of Life though, hehe.)
Oh, sorry, but I'm light year far from correcting you on these issues...
Since I'm sure there are other people in the project with light years
better confidence than me with these issues! :-P

> This sounds a lot like a nail in the coffin for the RTOS and all
> its virtualized variants (running Linux inside an RTOS etc).
>
Well, I mostly agree. However, I think you mentioned in your list some
application that we can call "hard" real-time activities, some that we
can call "soft".
I hope all these efforts, maybe including EDF/Deadline scheduling, may
get Linux closer to correct and efficient handling of at least soft
(games, simulations, multimedia, etc) workloads.

If it will be able to completely replace all the existing RTOS I don't
know... Let's hope that!
Actually, it could be one of the (if not _the_) first supporting EDF,
which may represent a nice plus toward them. :-)

> (Please excuse my bastardized condensed-form popular science
> version of the noble goals of the ACTORS project, I only observed
> it at a distance so my vision was dimmed.)
>
Hey, no problem, as already said, the existence of so much different
opinions is a _good_ thing to me! :-D

Thanks a lot for the very interesting comments!

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]


Attachments:
signature.asc (197.00 B)
This is a digitally signed message part

2009-09-23 13:22:11

by Claudio Scordino

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

Hi Linus,

Linus Walleij ha scritto:
> Hi Raistlin,
>
> it's great fun to see this scheduler materialize, I was present at the
> workshop in Kaiserslautern and waited for the code since. What I'm
> interested in getting a briefing on right now is what practical use EDF
> schedulers have, since I think this is what many people will ask for.
>
> I think actually the answers are the same as when I asked at Meld
> (of all places) what HRtimers are actually for. And I I got the non-all
> inclusive answer that they're for Flight simulators.
> Which is obviously in the domain of automatic control.
> http://en.wikipedia.org/wiki/Automatic_control
>
> So one application of HRTimers has been to trigger events in
> userspace programs, especially when your dealing with issues
> related to automatic control. Automatic control is also
> mentioned throughout this patch. (Yes, I know HRtimers
> have other great uses also, especially in-kernel, those will
> remain.)
>
> I am very interested in if you or someone else
> could elaborate on the relation between HRtimers and deadline
> scheduling. To my untrained mind it looks like HRtimers will
> fire events to your task in a very precise manner, but
> you cannot currently guarantee that they will then proceed to
> process their target task in time (meet a deadline).

Yes. With HRT you have timing measures with a higher resolution than
using normal timers. This means that everything in the system runs more
precisely, and latencies are reduced.

However, "how" it runs, depends on the scheduling algorithm, which is
the part of the kernel that at any instant chooses which task must be
executed. Our scheduling class introduces the chance of ordering tasks
execution in a further manner, that is according to the Earliest
Deadline First (EDF) algorithm. Therefore, after the patch is applied, a
new scheduling policy is available for scheduling tasks. Consider that
old policies (sched_fair and sched_rt) still remain, and you are not
forced to use EDF ordering. This way, if you don't use EDF tasks, the
system still behaves as without our patch.

> I'm under the impression that some people currently use
> some periodic HRtimers (through either the timerfd_create system
> call I guess, or POSIX timer_create(CLOCK_REALTIME_HR...)
> timers combined with some SHED_FIFO or so to achieve the same
> thing a deadline scheduler would do, but since SCHED_FIFO cannot
> really guarantee that this will work, you simply underutilize
> the system a bit, add the CONFIG_PREEMPT_RT patch so the
> timers are not blocked by things like slow interrupthandlers or
> starvation (priority inversion) and then hope for the best. It turns
> out to work. You measure on the system and it has the desired
> "hard realtime" characteristics.

Let's take a step back. The problem in real-time scheduling is to find
an order of scheduling tasks so that each of them meets its deadlines.
This way, you get determinism and predictability (i.e., you can trust
that your tasks will be executed at the "right" instant of time).
Several scheduling algorithms exist to generate different orders of
execution that ensure this property.

In particular, several differences exist between fixed priority and
dynamic priority algorithms (like SCHED_EDF). In my opinion, the most
relevant is the following one.

Suppose you have a single processor and a set composed by periodic
real-time tasks, each one characterized by an amount of execution time
("budget") that must be executed every "period". This way, the deadline
for each task is equal to the end of its current period.

The sum of (budget/period) of all tasks is usually called "bandwidth".

Using fixed priority schedulers, you need a quite complicated equation
to understand if it is possible to guarantee the deadlines of every
task. This equation must be re-computed each time you change a budget or
period of some task.
Even worse, depending on the task set, it may happen that the bandwidth
must be lower than 100% (sometimes as low as 69%), otherwise you cannot
be sure that all tasks will meet their deadline (they might, but you
cannot trust).

With EDF, instead, the test is much easier: if the bandwidth is less
than 100%, then you are sure that the deadlines will be met. In other
words, you can fully exploit processor bandwidth up to 100% being sure
that timing constraints of EDF tasks will be guaranteed.

> Do people do such things? I haven't seen those applications,
> they all tend to be closed-source really. I would assume the
> vxWorks/pSOS/etc compatibility wrapper libraries do things
> like this, do they not?

Well, ACTORS is an example of an industrial application which needs an
EDF scheduler on Linux, isn't it ?

But I've also seen a couple of multimedia applications which needed an
EDF scheduler to meet deadlines of periodic tasks and, at the same time,
exploit CPU utilization up to 100%.

Regards,

Claudio

2009-09-23 13:30:33

by Dario Faggioli

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Wed, 2009-09-23 at 14:33 +0200, Linus Walleij wrote:
> Hi Raistlin,
>
Hi again,

> I have trouble compiling the (mainline) kernel for ARM u300_defconfig with the
> sched-edf patches:
>
> In file included from /home/linus/src/linux-trees/linux-2.6/kernel/sched.c:1856:
> /home/linus/src/linux-trees/linux-2.6/kernel/sched_edf.c: In function
> 'pick_next_task_edf':
> /home/linus/src/linux-trees/linux-2.6/kernel/sched_edf.c:435: error:
> implicit declaration of function 'hrtick_enabled'
>
> The hrtick_enabled() function is a static in sched.c, and should be visible
> in sched_edf.c since it's #include:ed into sched.c so I'm pretty confused
> about this one.
>
Well, I think this is my fault (as below!) :-(
hrtick_enabled is defined in sched.c, but inside CONFIG_SCED_HRTICK!

> Could it be that I'm using a too bleeding edge compiler? This is a
> arm-none-eabi-gcc (Sourcery G++ Lite 2008q3-66) 4.3.2
> i.e. a CodeSourcery custom compiler, what are you using for ARM builds
> in Pisa?

I'll ask Michael what he uses when compiling for the MPCORE and will be
back to you...

> /home/linus/src/linux-trees/linux-2.6/kernel/sched.c: In function
> '__setscheduler_ex':
> /home/linus/src/linux-trees/linux-2.6/kernel/sched.c:6318: error:
> 'struct sched_edf_entity' has no member named 'bw'
> /home/linus/src/linux-trees/linux-2.6/kernel/sched.c:6318: error:
> implicit declaration of function 'to_ratio'
>
> This is another thing: the code in struct sched_edf_entity only compiles
> in the field bw if you have CONFIG_EDF_GROUP_SCHED, but the
> code in sched.c __setscheduler_ex() use it no matter whether that's
> configured or not.
>
> This patch fixes it.
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index b41fc65..9ce89d4 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -6315,8 +6315,10 @@ __setscheduler_ex(struct rq *rq, struct
> task_struct *p, int policy,
>
> edf_se->runtime_max = timespec_to_ns(&param_ex->sched_runtime);
> edf_se->period = timespec_to_ns(&param_ex->sched_period);
> +#ifdef CONFIG_EDF_GROUP_SCHED
> edf_se->bw = to_ratio(timespec_to_ns(&param_ex->sched_period),
> timespec_to_ns(&param_ex->sched_runtime));
> +#endif
>
> edf_se->runtime = edf_se->runtime_max;
>
Yeah, this I noticed! My fault adding changes at the very last time and
forgetting to test all the possible configurations! :-P

Thanks.

New patch below (waiting for a v2)... if you can, tell me if it works
since I have not our ARM box right here and neither the crosschain on
this machine. :-(

Dario

diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h
index 89f7ead..e851625 100644
--- a/arch/arm/include/asm/unistd.h
+++ b/arch/arm/include/asm/unistd.h
@@ -391,6 +391,9 @@
#define __NR_pwritev (__NR_SYSCALL_BASE+362)
#define __NR_rt_tgsigqueueinfo (__NR_SYSCALL_BASE+363)
#define __NR_perf_event_open (__NR_SYSCALL_BASE+364)
+#define __NR_sched_setscheduler_ex (__NR_SYSCALL_BASE+365)
+#define __NR_sched_setparam_ex (__NR_SYSCALL_BASE+366)
+#define __NR_sched_getparam_ex (__NR_SYSCALL_BASE+367)

/*
* The following SWIs are ARM private.
diff --git a/arch/arm/kernel/calls.S b/arch/arm/kernel/calls.S
index fafce1b..42ad362 100644
--- a/arch/arm/kernel/calls.S
+++ b/arch/arm/kernel/calls.S
@@ -374,6 +374,9 @@
CALL(sys_pwritev)
CALL(sys_rt_tgsigqueueinfo)
CALL(sys_perf_event_open)
+/* 365 */ CALL(sys_sched_setscheduler_ex)
+ CALL(sys_sched_setparam_ex)
+ CALL(sys_sched_getparam_ex)
#ifndef syscalls_counted
.equ syscalls_padding, ((NR_syscalls + 3) & ~3) - NR_syscalls
#define syscalls_counted
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 74619c4..fde92c2 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -832,4 +832,7 @@ ia32_sys_call_table:
.quad compat_sys_pwritev
.quad compat_sys_rt_tgsigqueueinfo /* 335 */
.quad sys_perf_event_open
+ .quad sys_sched_setscheduler_ex
+ .quad sys_sched_setparam_ex
+ .quad sys_sched_getparam_ex
ia32_syscall_end:
diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h
index 6fb3c20..a83163f 100644
--- a/arch/x86/include/asm/unistd_32.h
+++ b/arch/x86/include/asm/unistd_32.h
@@ -342,6 +342,9 @@
#define __NR_pwritev 334
#define __NR_rt_tgsigqueueinfo 335
#define __NR_perf_event_open 336
+#define __NR_sched_setscheduler_ex 337
+#define __NR_sched_setparam_ex 338
+#define __NR_sched_getparam_ex 339

#ifdef __KERNEL__

diff --git a/arch/x86/include/asm/unistd_64.h b/arch/x86/include/asm/unistd_64.h
index 8d3ad0a..a11831c 100644
--- a/arch/x86/include/asm/unistd_64.h
+++ b/arch/x86/include/asm/unistd_64.h
@@ -661,6 +661,12 @@ __SYSCALL(__NR_pwritev, sys_pwritev)
__SYSCALL(__NR_rt_tgsigqueueinfo, sys_rt_tgsigqueueinfo)
#define __NR_perf_event_open 298
__SYSCALL(__NR_perf_event_open, sys_perf_event_open)
+#define __NR_sched_setscheduler_ex 299
+__SYSCALL(__NR_sched_setscheduler_ex, sys_sched_setscheduler_ex)
+#define __NR_sched_setparam_ex 230
+__SYSCALL(__NR_sched_setparam_ex, sys_sched_setparam_ex)
+#define __NR_sched_getparam_ex 231
+__SYSCALL(__NR_sched_getparam_ex, sys_sched_getparam_ex)

#ifndef __NO_STUBS
#define __ARCH_WANT_OLD_READDIR
diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
index 0157cd2..38f056c 100644
--- a/arch/x86/kernel/syscall_table_32.S
+++ b/arch/x86/kernel/syscall_table_32.S
@@ -336,3 +336,6 @@ ENTRY(sys_call_table)
.long sys_pwritev
.long sys_rt_tgsigqueueinfo /* 335 */
.long sys_perf_event_open
+ .long sys_sched_setscheduler_ex
+ .long sys_sched_setparam_ex
+ .long sys_sched_getparam_ex
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8fe351c..2554e08 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -38,6 +38,7 @@
#define SCHED_BATCH 3
/* SCHED_ISO: reserved but not implemented yet */
#define SCHED_IDLE 5
+#define SCHED_EDF 6
/* Can be ORed in to make sure the process is reverted back to SCHED_NORMAL on fork */
#define SCHED_RESET_ON_FORK 0x40000000

@@ -94,6 +95,19 @@ struct sched_param {

#include <asm/processor.h>

+/*
+ * For an typical EDF periodic or sporadic task we need a runtime
+ * and period (equal to deadline) specification in the sched_param
+ * structure.
+ * However, the original struct sched_param can't be modified, for binary
+ * compatibility issues, and thus this new one is created.
+ */
+struct sched_param_ex {
+ int sched_priority;
+ struct timespec sched_period;
+ struct timespec sched_runtime;
+};
+
struct exec_domain;
struct futex_pi_state;
struct robust_list_head;
@@ -147,12 +161,15 @@ extern unsigned long get_parent_ip(unsigned long addr);

struct seq_file;
struct cfs_rq;
+struct edf_rq;
struct task_group;
#ifdef CONFIG_SCHED_DEBUG
extern void proc_sched_show_task(struct task_struct *p, struct seq_file *m);
extern void proc_sched_set_task(struct task_struct *p);
extern void
print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq);
+extern void
+print_edf_rq(struct seq_file *m, int cpu, struct edf_rq *edf_rq);
#else
static inline void
proc_sched_show_task(struct task_struct *p, struct seq_file *m)
@@ -165,6 +182,10 @@ static inline void
print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
{
}
+static inline void
+print_edf_rq(struct seq_file *m, int cpu, struct edf_rq *edf_rq)
+{
+}
#endif

extern unsigned long long time_sync_thresh;
@@ -1161,6 +1182,28 @@ struct sched_entity {
#endif
};

+#define EDF_NEW 1
+#define EDF_ENDED 2
+#define EDF_RUNNING 4
+
+struct sched_edf_entity {
+ struct rb_node rb_node;
+ /* actual scheduling parameters */
+ u64 deadline;
+ s64 runtime;
+ unsigned int edf_flags;
+
+ /* original parameters taken from sched_param_ex */
+ u64 runtime_max;
+ u64 period;
+#ifdef CONFIG_EDF_GROUP_SCHED
+ u64 bw;
+#endif
+
+ int nr_cpus_allowed;
+ struct hrtimer edf_timer;
+};
+
struct sched_rt_entity {
struct list_head run_list;
unsigned long timeout;
@@ -1198,6 +1241,7 @@ struct task_struct {
unsigned int rt_priority;
const struct sched_class *sched_class;
struct sched_entity se;
+ struct sched_edf_entity edf;
struct sched_rt_entity rt;

#ifdef CONFIG_PREEMPT_NOTIFIERS
@@ -1543,6 +1587,18 @@ static inline int rt_task(struct task_struct *p)
return rt_prio(p->prio);
}

+static inline int edf_policy(int policy)
+{
+ if (unlikely(policy == SCHED_EDF))
+ return 1;
+ return 0;
+}
+
+static inline int edf_task(struct task_struct *p)
+{
+ return edf_policy(p->policy);
+}
+
static inline struct pid *task_pid(struct task_struct *task)
{
return task->pids[PIDTYPE_PID].pid;
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 8d8285a..f5b76a1 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -33,6 +33,7 @@ struct pollfd;
struct rlimit;
struct rusage;
struct sched_param;
+struct sched_param_ex;
struct semaphore;
struct sembuf;
struct shmid_ds;
@@ -390,11 +391,17 @@ asmlinkage long sys_clock_nanosleep(clockid_t which_clock, int flags,
asmlinkage long sys_nice(int increment);
asmlinkage long sys_sched_setscheduler(pid_t pid, int policy,
struct sched_param __user *param);
+asmlinkage long sys_sched_setscheduler_ex(pid_t pid, int policy,
+ struct sched_param_ex __user *param);
asmlinkage long sys_sched_setparam(pid_t pid,
struct sched_param __user *param);
+asmlinkage long sys_sched_setparam_ex(pid_t pid,
+ struct sched_param_ex __user *param);
asmlinkage long sys_sched_getscheduler(pid_t pid);
asmlinkage long sys_sched_getparam(pid_t pid,
struct sched_param __user *param);
+asmlinkage long sys_sched_getparam_ex(pid_t pid,
+ struct sched_param_ex __user *param);
asmlinkage long sys_sched_setaffinity(pid_t pid, unsigned int len,
unsigned long __user *user_mask_ptr);
asmlinkage long sys_sched_getaffinity(pid_t pid, unsigned int len,
diff --git a/init/Kconfig b/init/Kconfig
index 0aa6579..cd119ac 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -454,6 +454,20 @@ config RT_GROUP_SCHED
realtime bandwidth for them.
See Documentation/scheduler/sched-rt-group.txt for more information.

+config EDF_GROUP_SCHED
+ bool "Group scheduling for SCHED_EDF"
+ depends on EXPERIMENTAL
+ depends on GROUP_SCHED
+ depends on CGROUPS
+ depends on !USER_SCHED
+ select CPUSETS
+ default n
+ help
+ This feature lets you explicitly specify, in terms of runtime
+ and period, the bandwidth of a task control group. This means
+ tasks and other control groups can be added, until such bandwidth
+ limit is reached, after that they will fail.
+
choice
depends on GROUP_SCHED
prompt "Basis for grouping tasks"
diff --git a/kernel/fork.c b/kernel/fork.c
index 2cebfb2..b4f3874 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1203,6 +1203,18 @@ static struct task_struct *copy_process(unsigned long clone_flags,
p->parent_exec_id = current->self_exec_id;
}

+ /*
+ * In case an EDF task is forking, both parent and child get half
+ * the bandwidth of the parent (via dividing runtime by 2).
+ * This make it a lot easier to deal with control group bandwidth, and
+ * also prevent an heavily spawning task to exhaust the system (or
+ * a control group, if enabled) bandwidht.
+ */
+ if (edf_task(p->real_parent)) {
+ p->real_parent->edf.runtime_max /= 2;
+ p->edf.runtime_max = p->real_parent->edf.runtime_max;
+ }
+
spin_lock(&current->sighand->siglock);

/*
diff --git a/kernel/sched.c b/kernel/sched.c
index 91843ba..4681a5c 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -131,6 +131,11 @@ static inline int task_has_rt_policy(struct task_struct *p)
return rt_policy(p->policy);
}

+static inline int task_has_edf_policy(struct task_struct *p)
+{
+ return edf_policy(p->policy);
+}
+
/*
* This is the priority-queue data structure of the RT scheduling class:
*/
@@ -227,6 +232,16 @@ static void destroy_rt_bandwidth(struct rt_bandwidth *rt_b)
}
#endif

+struct edf_bandwidth {
+ spinlock_t lock;
+ /* runtime and period that determine the bandwidth of the group */
+ u64 runtime_max;
+ u64 period;
+ u64 bw;
+ /* accumulator for the actual allocated bandwidth in a group */
+ u64 total_bw;
+};
+
/*
* sched_domains_mutex serializes calls to arch_init_sched_domains,
* detach_destroy_domains and partition_sched_domains.
@@ -259,6 +274,11 @@ struct task_group {
unsigned long shares;
#endif

+#ifdef CONFIG_EDF_GROUP_SCHED
+ struct edf_bandwidth edf_bandwidth;
+ struct edf_rq **edf_rq;
+#endif
+
#ifdef CONFIG_RT_GROUP_SCHED
struct sched_rt_entity **rt_se;
struct rt_rq **rt_rq;
@@ -296,6 +316,10 @@ static DEFINE_PER_CPU(struct sched_entity, init_sched_entity);
static DEFINE_PER_CPU_SHARED_ALIGNED(struct cfs_rq, init_tg_cfs_rq);
#endif /* CONFIG_FAIR_GROUP_SCHED */

+#ifdef CONFIG_EDF_GROUP_SCHED
+static DEFINE_PER_CPU(struct edf_rq, init_edf_rq) ____cacheline_aligned_in_smp;
+#endif
+
#ifdef CONFIG_RT_GROUP_SCHED
static DEFINE_PER_CPU(struct sched_rt_entity, init_sched_rt_entity);
static DEFINE_PER_CPU_SHARED_ALIGNED(struct rt_rq, init_rt_rq);
@@ -447,6 +471,18 @@ struct cfs_rq {
#endif
};

+struct edf_rq {
+ unsigned long edf_nr_running;
+
+ /* as in CFS, the runqueue is based on an rbtree, deadline ordered */
+ struct rb_root rb_root;
+ struct rb_node *rb_leftmost;
+
+#ifdef CONFIG_EDF_GROUP_SCHED
+ struct rq *rq;
+#endif
+};
+
/* Real-Time classes' related field in a runqueue: */
struct rt_rq {
struct rt_prio_array active;
@@ -544,6 +580,7 @@ struct rq {
u64 nr_migrations_in;

struct cfs_rq cfs;
+ struct edf_rq edf;
struct rt_rq rt;

#ifdef CONFIG_FAIR_GROUP_SCHED
@@ -1816,7 +1853,9 @@ static void calc_load_account_active(struct rq *this_rq);
#include "sched_stats.h"
#include "sched_idletask.c"
#include "sched_fair.c"
+#include "sched_edf.c"
#include "sched_rt.c"
+
#ifdef CONFIG_SCHED_DEBUG
# include "sched_debug.c"
#endif
@@ -1837,7 +1876,7 @@ static void dec_nr_running(struct rq *rq)

static void set_load_weight(struct task_struct *p)
{
- if (task_has_rt_policy(p)) {
+ if (task_has_rt_policy(p) || task_has_edf_policy(p)) {
p->se.load.weight = prio_to_weight[0] * 2;
p->se.load.inv_weight = prio_to_wmult[0] >> 1;
return;
@@ -2523,7 +2562,8 @@ void sched_fork(struct task_struct *p, int clone_flags)
* Revert to default priority/policy on fork if requested.
*/
if (unlikely(p->sched_reset_on_fork)) {
- if (p->policy == SCHED_FIFO || p->policy == SCHED_RR)
+ if (p->policy == SCHED_FIFO || p->policy == SCHED_RR ||
+ edf_policy(p->policy))
p->policy = SCHED_NORMAL;

if (p->normal_prio < DEFAULT_PRIO)
@@ -2541,8 +2581,12 @@ void sched_fork(struct task_struct *p, int clone_flags)
p->sched_reset_on_fork = 0;
}

- if (!rt_prio(p->prio))
- p->sched_class = &fair_sched_class;
+ if (!rt_prio(p->prio)) {
+ if (edf_policy(p->policy))
+ p->sched_class = &edf_sched_class;
+ else
+ p->sched_class = &fair_sched_class;
+ }

#ifdef CONFIG_SMP
cpu = p->sched_class->select_task_rq(p, SD_BALANCE_FORK, 0);
@@ -2565,6 +2609,68 @@ void sched_fork(struct task_struct *p, int clone_flags)
put_cpu();
}

+#if defined(CONFIG_RT_GROUP_SCHED) || defined(CONFIG_EDF_GROUP_SCHED)
+static unsigned long to_ratio(u64 period, u64 runtime)
+{
+ if (runtime == RUNTIME_INF)
+ return 1ULL << 20;
+
+ return div64_u64(runtime << 20, period);
+}
+
+#ifdef CONFIG_EDF_GROUP_SCHED
+static inline
+void __edf_tg_clear_task_bw(struct task_group *tg, u64 tsk_bw)
+{
+ tg->edf_bandwidth.total_bw -= tsk_bw;
+}
+
+static inline
+void __edf_tg_add_task_bw(struct task_group *tg, u64 tsk_bw)
+{
+ tg->edf_bandwidth.total_bw += tsk_bw;
+}
+
+static inline
+int __edf_check_task_tg_bw(struct task_struct *p, struct task_group *tg,
+ int policy, struct sched_param_ex *param_ex)
+{
+ u64 tg_bw = tg->edf_bandwidth.bw;
+ u64 bw = p->edf.bw;
+ u64 new_bw = 0;
+
+ if (tg->edf_bandwidth.bw <= 0)
+ return 0;
+
+ if (edf_policy(policy))
+ new_bw = to_ratio(timespec_to_ns(&param_ex->sched_period),
+ timespec_to_ns(&param_ex->sched_runtime));
+
+ /*
+ * Either if a task, enters, leave, or stays SCHED_EDF but changing
+ * its parameters, we need to update accordingly the allocated
+ * bandwidth of the control group it is inside.
+ */
+ if (task_has_edf_policy(p) && !edf_policy(policy)) {
+ __edf_tg_clear_task_bw(tg, bw);
+ return 1;
+ } else if (task_has_edf_policy(p) && edf_policy(policy) &&
+ tg_bw >= tg->edf_bandwidth.total_bw - bw + new_bw) {
+ __edf_tg_clear_task_bw(tg, bw);
+ __edf_tg_add_task_bw(tg, new_bw);
+ return 1;
+ } else if (edf_policy(policy) && !task_has_edf_policy(p) &&
+ tg_bw >= tg->edf_bandwidth.total_bw + new_bw) {
+ __edf_tg_add_task_bw(tg, new_bw);
+ return 1;
+ }
+
+ return 0;
+}
+#endif /* CONFIG_EDF_GROUP_SCHED */
+
+#endif /* CONFIG_RT_GROUP_SCHED || CONFIG_EDF_GROUP_SCHED */
+
/*
* wake_up_new_task - wake up a newly created task for the first time.
*
@@ -2582,6 +2688,18 @@ void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
update_rq_clock(rq);

p->prio = effective_prio(p);
+ if (edf_task(p)) {
+ struct sched_edf_entity *edf_se = &p->edf;
+
+ /*
+ * If the new task we are waking up is EDF,
+ * some initialization is needed
+ */
+ RB_CLEAR_NODE(&edf_se->rb_node);
+ edf_se->runtime = edf_se->runtime_max;
+ edf_se->edf_flags = EDF_NEW;
+ init_edf_timer(&edf_se->edf_timer);
+ }

if (!p->sched_class->task_new || !current->se.on_rq) {
activate_task(rq, p, 0);
@@ -2725,6 +2843,23 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
if (mm)
mmdrop(mm);
if (unlikely(prev_state == TASK_DEAD)) {
+#ifdef CONFIG_EDF_GROUP_SCHED
+ if (edf_task(prev)) {
+ struct task_group *tg = task_group(prev);
+ unsigned long flags;
+
+ /*
+ * When an EDF task dies, we need to drain its
+ * bandwidth from the cgroup it was accomodating hime,
+ * to make space for new tasks.
+ */
+ spin_lock_irqsave(&tg->edf_bandwidth.lock, flags);
+ __edf_tg_clear_task_bw(tg, prev->edf.bw);
+ spin_unlock_irqrestore(&tg->edf_bandwidth.lock, flags);
+
+ hrtimer_cancel(&prev->edf.edf_timer);
+ }
+#endif
/*
* Remove function-return probe instances associated with this
* task and put them back on the free list.
@@ -5952,8 +6087,12 @@ void rt_mutex_setprio(struct task_struct *p, int prio)

if (rt_prio(prio))
p->sched_class = &rt_sched_class;
- else
- p->sched_class = &fair_sched_class;
+ else {
+ if (!edf_task(p))
+ p->sched_class = &fair_sched_class;
+ else
+ p->sched_class = &edf_sched_class;
+ }

p->prio = prio;

@@ -5987,9 +6126,9 @@ void set_user_nice(struct task_struct *p, long nice)
* The RT priorities are set via sched_setscheduler(), but we still
* allow the 'normal' nice value to be set - but as expected
* it wont have any effect on scheduling until the task is
- * SCHED_FIFO/SCHED_RR:
+ * SCHED_EDF, SCHED_FIFO or SCHED_RR:
*/
- if (task_has_rt_policy(p)) {
+ if (unlikely(task_has_rt_policy(p) || task_has_edf_policy(p))) {
p->static_prio = NICE_TO_PRIO(nice);
goto out_unlock;
}
@@ -6136,6 +6275,9 @@ __setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
case SCHED_IDLE:
p->sched_class = &fair_sched_class;
break;
+ case SCHED_EDF:
+ p->sched_class = &edf_sched_class;
+ break;
case SCHED_FIFO:
case SCHED_RR:
p->sched_class = &rt_sched_class;
@@ -6149,6 +6291,25 @@ __setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
set_load_weight(p);
}

+static void
+__setscheduler_ex(struct rq *rq, struct task_struct *p, int policy,
+ struct sched_param_ex *param_ex)
+{
+ struct sched_edf_entity *edf_se = &p->edf;
+
+ RB_CLEAR_NODE(&edf_se->rb_node);
+ edf_se->edf_flags = EDF_NEW;
+
+ edf_se->runtime_max = timespec_to_ns(&param_ex->sched_runtime);
+ edf_se->period = timespec_to_ns(&param_ex->sched_period);
+ edf_se->bw = to_ratio(timespec_to_ns(&param_ex->sched_period),
+ timespec_to_ns(&param_ex->sched_runtime));
+
+ edf_se->runtime = edf_se->runtime_max;
+
+ init_edf_timer(&edf_se->edf_timer);
+}
+
/*
* check the target process has a UID that matches the current process's
*/
@@ -6166,7 +6327,9 @@ static bool check_same_owner(struct task_struct *p)
}

static int __sched_setscheduler(struct task_struct *p, int policy,
- struct sched_param *param, bool user)
+ struct sched_param *param,
+ struct sched_param_ex *param_ex,
+ bool user)
{
int retval, oldprio, oldpolicy = -1, on_rq, running;
unsigned long flags;
@@ -6186,6 +6349,7 @@ recheck:
policy &= ~SCHED_RESET_ON_FORK;

if (policy != SCHED_FIFO && policy != SCHED_RR &&
+ policy != SCHED_EDF &&
policy != SCHED_NORMAL && policy != SCHED_BATCH &&
policy != SCHED_IDLE)
return -EINVAL;
@@ -6200,6 +6364,13 @@ recheck:
(p->mm && param->sched_priority > MAX_USER_RT_PRIO-1) ||
(!p->mm && param->sched_priority > MAX_RT_PRIO-1))
return -EINVAL;
+ if (edf_policy(policy) && param_ex->sched_priority != 0)
+ return -EINVAL;
+ if (edf_policy(policy) && (param_ex == NULL ||
+ timespec_to_ns(&param_ex->sched_period) == 0 ||
+ timespec_to_ns(&param_ex->sched_period) <
+ timespec_to_ns(&param_ex->sched_runtime)))
+ return -EINVAL;
if (rt_policy(policy) != (param->sched_priority != 0))
return -EINVAL;

@@ -6273,6 +6444,24 @@ recheck:
spin_unlock_irqrestore(&p->pi_lock, flags);
goto recheck;
}
+#ifdef CONFIG_EDF_GROUP_SCHED
+ /*
+ * If EDF tasks are involved, check tha bandwidth of its group,
+ * and do not allow setting the task EDF if there is not enough.
+ */
+ if (edf_policy(policy) || edf_task(p)) {
+ struct task_group *tg = task_group(p);
+
+ spin_lock(&tg->edf_bandwidth.lock);
+ if (!__edf_check_task_tg_bw(p, tg, policy, param_ex)) {
+ spin_unlock(&tg->edf_bandwidth.lock);
+ __task_rq_unlock(rq);
+ spin_unlock_irqrestore(&p->pi_lock, flags);
+ return -EPERM;
+ }
+ spin_unlock(&tg->edf_bandwidth.lock);
+ }
+#endif
update_rq_clock(rq);
on_rq = p->se.on_rq;
running = task_current(rq, p);
@@ -6285,6 +6474,8 @@ recheck:

oldprio = p->prio;
__setscheduler(rq, p, policy, param->sched_priority);
+ if (edf_policy(policy))
+ __setscheduler_ex(rq, p, policy, param_ex);

if (running)
p->sched_class->set_curr_task(rq);
@@ -6312,10 +6503,17 @@ recheck:
int sched_setscheduler(struct task_struct *p, int policy,
struct sched_param *param)
{
- return __sched_setscheduler(p, policy, param, true);
+ return __sched_setscheduler(p, policy, param, NULL, true);
}
EXPORT_SYMBOL_GPL(sched_setscheduler);

+int sched_setscheduler_ex(struct task_struct *p, int policy,
+ struct sched_param *param,
+ struct sched_param_ex *param_ex)
+{
+ return __sched_setscheduler(p, policy, param, param_ex, true);
+}
+
/**
* sched_setscheduler_nocheck - change the scheduling policy and/or RT priority of a thread from kernelspace.
* @p: the task in question.
@@ -6330,7 +6528,7 @@ EXPORT_SYMBOL_GPL(sched_setscheduler);
int sched_setscheduler_nocheck(struct task_struct *p, int policy,
struct sched_param *param)
{
- return __sched_setscheduler(p, policy, param, false);
+ return __sched_setscheduler(p, policy, param, NULL, false);
}

static int
@@ -6355,6 +6553,33 @@ do_sched_setscheduler(pid_t pid, int policy, struct sched_param __user *param)
return retval;
}

+static int
+do_sched_setscheduler_ex(pid_t pid, int policy,
+ struct sched_param_ex __user *param_ex)
+{
+ struct sched_param lparam;
+ struct sched_param_ex lparam_ex;
+ struct task_struct *p;
+ int retval;
+
+ if (!param_ex || pid < 0)
+ return -EINVAL;
+ if (copy_from_user(&lparam_ex, param_ex,
+ sizeof(struct sched_param_ex)))
+ return -EFAULT;
+
+ rcu_read_lock();
+ retval = -ESRCH;
+ p = find_process_by_pid(pid);
+ if (p != NULL) {
+ lparam.sched_priority = lparam_ex.sched_priority;
+ retval = sched_setscheduler_ex(p, policy, &lparam, &lparam_ex);
+ }
+ rcu_read_unlock();
+
+ return retval;
+}
+
/**
* sys_sched_setscheduler - set/change the scheduler policy and RT priority
* @pid: the pid in question.
@@ -6372,6 +6597,22 @@ SYSCALL_DEFINE3(sched_setscheduler, pid_t, pid, int, policy,
}

/**
+ * sys_sched_setscheduler_ex - set/change the scheduler policy and EDF deadline
+ * @pid: the pid in question.
+ * @policy: new policy (should be SCHED_EDF).
+ * @param: structure containg the extended EDF parameters.
+ */
+SYSCALL_DEFINE3(sched_setscheduler_ex, pid_t, pid, int, policy,
+ struct sched_param_ex __user *, param_ex)
+{
+ if (policy < 0)
+ return -EINVAL;
+
+ return do_sched_setscheduler_ex(pid, policy, param_ex);
+}
+
+
+/**
* sys_sched_setparam - set/change the RT priority of a thread
* @pid: the pid in question.
* @param: structure containing the new RT priority.
@@ -6381,6 +6622,12 @@ SYSCALL_DEFINE2(sched_setparam, pid_t, pid, struct sched_param __user *, param)
return do_sched_setscheduler(pid, -1, param);
}

+SYSCALL_DEFINE2(sched_setparam_ex, pid_t, pid,
+ struct sched_param_ex __user *, param_ex)
+{
+ return do_sched_setscheduler_ex(pid, -1, param_ex);
+}
+
/**
* sys_sched_getscheduler - get the policy (scheduling class) of a thread
* @pid: the pid in question.
@@ -6445,6 +6692,11 @@ out_unlock:
return retval;
}

+SYSCALL_DEFINE2(sched_getparam_ex, pid_t, pid, struct sched_param_ex __user *, param)
+{
+ return -ENOSYS;
+}
+
long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
{
cpumask_var_t cpus_allowed, new_mask;
@@ -9210,6 +9462,14 @@ static void init_cfs_rq(struct cfs_rq *cfs_rq, struct rq *rq)
cfs_rq->min_vruntime = (u64)(-(1LL << 20));
}

+static void init_edf_rq(struct edf_rq *edf_rq, struct rq *rq)
+{
+ edf_rq->rb_root = RB_ROOT;
+#ifdef CONFIG_EDF_GROUP_SCHED
+ edf_rq->rq = rq;
+#endif
+}
+
static void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
{
struct rt_prio_array *array;
@@ -9275,6 +9535,22 @@ static void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,
}
#endif

+#ifdef CONFIG_EDF_GROUP_SCHED
+void init_tg_edf_entry(struct task_group *tg, struct edf_rq *edf_rq,
+ struct sched_edf_entity *edf_se, int cpu, int add,
+ struct sched_edf_entity *parent)
+{
+ struct rq *rq = cpu_rq(cpu);
+
+ tg->edf_rq[cpu] = &rq->edf;
+
+ spin_lock_init(&tg->edf_bandwidth.lock);
+ tg->edf_bandwidth.runtime_max = 0;
+ tg->edf_bandwidth.period = 0;
+ tg->edf_bandwidth.total_bw = 0;
+}
+#endif
+
#ifdef CONFIG_RT_GROUP_SCHED
static void init_tg_rt_entry(struct task_group *tg, struct rt_rq *rt_rq,
struct sched_rt_entity *rt_se, int cpu, int add,
@@ -9316,6 +9592,10 @@ void __init sched_init(void)
#ifdef CONFIG_RT_GROUP_SCHED
alloc_size += 2 * nr_cpu_ids * sizeof(void **);
#endif
+#ifdef CONFIG_EDF_GROUP_SCHED
+ alloc_size += 2 * nr_cpu_ids * sizeof(void **);
+#endif
+
#ifdef CONFIG_USER_SCHED
alloc_size *= 2;
#endif
@@ -9344,6 +9624,10 @@ void __init sched_init(void)
ptr += nr_cpu_ids * sizeof(void **);
#endif /* CONFIG_USER_SCHED */
#endif /* CONFIG_FAIR_GROUP_SCHED */
+#ifdef CONFIG_EDF_GROUP_SCHED
+ init_task_group.edf_rq = (struct edf_rq **)ptr;
+ ptr += nr_cpu_ids * sizeof(void **);
+#endif /* CONFIG_EDF_GROUP_SCHED */
#ifdef CONFIG_RT_GROUP_SCHED
init_task_group.rt_se = (struct sched_rt_entity **)ptr;
ptr += nr_cpu_ids * sizeof(void **);
@@ -9403,6 +9687,7 @@ void __init sched_init(void)
rq->calc_load_active = 0;
rq->calc_load_update = jiffies + LOAD_FREQ;
init_cfs_rq(&rq->cfs, rq);
+ init_edf_rq(&rq->edf, rq);
init_rt_rq(&rq->rt, rq);
#ifdef CONFIG_FAIR_GROUP_SCHED
init_task_group.shares = init_task_group_load;
@@ -9450,6 +9735,13 @@ void __init sched_init(void)
#endif
#endif /* CONFIG_FAIR_GROUP_SCHED */

+#ifdef CONFIG_EDF_GROUP_SCHED
+#ifdef CONFIG_CGROUP_SCHED
+ init_tg_edf_entry(&init_task_group, &rq->edf,
+ NULL, i, 1, NULL);
+#endif
+#endif
+
rq->rt.rt_runtime = def_rt_bandwidth.rt_runtime;
#ifdef CONFIG_RT_GROUP_SCHED
INIT_LIST_HEAD(&rq->leaf_rt_rq_list);
@@ -9760,6 +10052,68 @@ static inline void unregister_fair_sched_group(struct task_group *tg, int cpu)
}
#endif /* CONFIG_FAIR_GROUP_SCHED */

+#ifdef CONFIG_EDF_GROUP_SCHED
+static void free_edf_sched_group(struct task_group *tg)
+{
+ kfree(tg->edf_rq);
+}
+
+int alloc_edf_sched_group(struct task_group *tg, struct task_group *parent)
+{
+ struct rq *rq;
+ int i;
+
+ tg->edf_rq = kzalloc(sizeof(struct edf_rq*) * nr_cpu_ids, GFP_KERNEL);
+ if (!tg->edf_rq)
+ return 0;
+
+ for_each_possible_cpu(i) {
+ rq = cpu_rq(i);
+ init_tg_edf_entry(tg, &rq->edf, NULL, i, 0, NULL);
+ }
+
+ return 1;
+}
+
+int sched_edf_can_attach(struct cgroup *cgrp, struct task_struct *tsk)
+{
+ struct task_group *tg = container_of(cgroup_subsys_state(cgrp,
+ cpu_cgroup_subsys_id),
+ struct task_group, css);
+ u64 tg_bw = tg->edf_bandwidth.bw;
+ u64 tsk_bw = tsk->edf.bw;
+
+ if (!edf_task(tsk))
+ return 1;
+
+ /*
+ * Check if the requested group has enough free space
+ * for this particular task.
+ */
+ if (tg_bw < tsk_bw + tg->edf_bandwidth.total_bw)
+ return 0;
+
+ return 1;
+}
+#else
+static inline void free_edf_sched_group(struct task_group *tg)
+{
+}
+
+static inline
+int alloc_edf_sched_group(struct task_group *tg, struct task_group *parent)
+{
+ return 1;
+}
+#endif
+static inline void register_edf_sched_group(struct task_group *tg, int cpu)
+{
+}
+
+static inline void unregister_edf_sched_group(struct task_group *tg, int cpu)
+{
+}
+
#ifdef CONFIG_RT_GROUP_SCHED
static void free_rt_sched_group(struct task_group *tg)
{
@@ -9852,6 +10206,7 @@ static inline void unregister_rt_sched_group(struct task_group *tg, int cpu)
static void free_sched_group(struct task_group *tg)
{
free_fair_sched_group(tg);
+ free_edf_sched_group(tg);
free_rt_sched_group(tg);
kfree(tg);
}
@@ -9870,12 +10225,16 @@ struct task_group *sched_create_group(struct task_group *parent)
if (!alloc_fair_sched_group(tg, parent))
goto err;

+ if (!alloc_edf_sched_group(tg, parent))
+ goto err;
+
if (!alloc_rt_sched_group(tg, parent))
goto err;

spin_lock_irqsave(&task_group_lock, flags);
for_each_possible_cpu(i) {
register_fair_sched_group(tg, i);
+ register_edf_sched_group(tg, i);
register_rt_sched_group(tg, i);
}
list_add_rcu(&tg->list, &task_groups);
@@ -9906,12 +10265,31 @@ void sched_destroy_group(struct task_group *tg)
{
unsigned long flags;
int i;
+#ifdef CONFIG_EDF_GROUP_SCHED
+ struct task_group *parent = tg->parent;
+ u64 tg_bw = tg->edf_bandwidth.bw;

spin_lock_irqsave(&task_group_lock, flags);
+
+ /*
+ * If an EDF group goes away, its parent group
+ * (if any), ends up with some free bandwidth that
+ * it might use for other groups/tasks.
+ */
+ spin_lock(&parent->edf_bandwidth.lock);
+ if (tg_bw && parent)
+ parent->edf_bandwidth.total_bw -= tg_bw;
+ spin_unlock(&parent->edf_bandwidth.lock);
+#else
+ spin_lock_irqsave(&task_group_lock, flags);
+#endif
+
for_each_possible_cpu(i) {
unregister_fair_sched_group(tg, i);
+ unregister_edf_sched_group(tg, i);
unregister_rt_sched_group(tg, i);
}
+
list_del_rcu(&tg->list);
list_del_rcu(&tg->siblings);
spin_unlock_irqrestore(&task_group_lock, flags);
@@ -10057,14 +10435,6 @@ unsigned long sched_group_shares(struct task_group *tg)
*/
static DEFINE_MUTEX(rt_constraints_mutex);

-static unsigned long to_ratio(u64 period, u64 runtime)
-{
- if (runtime == RUNTIME_INF)
- return 1ULL << 20;
-
- return div64_u64(runtime << 20, period);
-}
-
/* Must be called with tasklist_lock held */
static inline int tg_has_rt_tasks(struct task_group *tg)
{
@@ -10368,9 +10738,15 @@ static int
cpu_cgroup_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
struct task_struct *tsk)
{
+#if defined(CONFIG_RT_GROUP_SCHED) || defined(CONFIG_EDF_GROUP_SCHED)
#ifdef CONFIG_RT_GROUP_SCHED
if (!sched_rt_can_attach(cgroup_tg(cgrp), tsk))
return -EINVAL;
+#endif
+#ifdef CONFIG_EDF_GROUP_SCHED
+ if (!sched_edf_can_attach(cgrp, tsk))
+ return -EINVAL;
+#endif
#else
/* We don't support RT-tasks being in separate groups */
if (tsk->sched_class != &fair_sched_class)
@@ -10384,6 +10760,30 @@ static void
cpu_cgroup_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
struct cgroup *old_cont, struct task_struct *tsk)
{
+#ifdef CONFIG_EDF_GROUP_SCHED
+ struct task_group *tg = container_of(cgroup_subsys_state(cgrp,
+ cpu_cgroup_subsys_id),
+ struct task_group, css);
+ struct task_group *old_tg = container_of(cgroup_subsys_state(old_cont,
+ cpu_cgroup_subsys_id),
+ struct task_group, css);
+ u64 tsk_bw = tsk->edf.bw;
+
+ /*
+ * An amount of bandwidth equal to the bandwidth of tsk
+ * is freed in the former group of tsk, and declared occupied
+ * in the new one.
+ */
+ spin_lock_irq(&tg->edf_bandwidth.lock);
+ tg->edf_bandwidth.total_bw += tsk_bw;
+ spin_unlock_irq(&tg->edf_bandwidth.lock);
+
+ if (old_tg) {
+ spin_lock_irq(&old_tg->edf_bandwidth.lock);
+ old_tg->edf_bandwidth.total_bw -= tsk_bw;
+ spin_unlock_irq(&old_tg->edf_bandwidth.lock);
+ }
+#endif
sched_move_task(tsk);
}

@@ -10426,6 +10826,100 @@ static u64 cpu_rt_period_read_uint(struct cgroup *cgrp, struct cftype *cft)
}
#endif /* CONFIG_RT_GROUP_SCHED */

+#ifdef CONFIG_EDF_GROUP_SCHED
+/*
+ * We allow runtime_max > period, since it makes sense to
+ * assign more than 100% bandwidth to a group, if on an SMP
+ * machine.
+ */
+static int __edf_schedulable(struct task_group *tg,
+ u64 runtime_max, u64 period)
+{
+ struct task_group *parent = tg->parent;
+ u64 old_bw = tg->edf_bandwidth.bw;
+ u64 parent_bw = parent ? parent->edf_bandwidth.bw : 0;
+ u64 bw = period > 0 ? to_ratio(period, runtime_max) : 0;
+ int ret = 0;
+
+ if (bw < tg->edf_bandwidth.total_bw)
+ return -EINVAL;
+
+ if (parent) {
+ /*
+ * All we need to do is check if the parent group is able
+ * to accomodate the new bandwidth of this group or not,
+ * given its actual allocated bandwidth, e.g., to other
+ * groups or tasks.
+ */
+ if (parent_bw < parent->edf_bandwidth.total_bw -
+ old_bw + bw)
+ return -EINVAL;
+
+ spin_lock_irq(&parent->edf_bandwidth.lock);
+ parent->edf_bandwidth.total_bw -= old_bw;
+ parent->edf_bandwidth.total_bw += bw;
+ spin_unlock_irq(&parent->edf_bandwidth.lock);
+ }
+
+ if (!ret) {
+ spin_lock_irq(&tg->edf_bandwidth.lock);
+ tg->edf_bandwidth.runtime_max = runtime_max;
+ tg->edf_bandwidth.period = period;
+ tg->edf_bandwidth.bw = bw;
+ spin_unlock_irq(&tg->edf_bandwidth.lock);
+ }
+
+ return ret;
+}
+
+static int cpu_edf_runtime_write_uint(struct cgroup *cgrp,
+ struct cftype *cftype,
+ u64 edf_runtime_us)
+{
+ struct task_group *tg = cgroup_tg(cgrp);
+
+ return __edf_schedulable(tg,
+ edf_runtime_us * NSEC_PER_USEC,
+ tg->edf_bandwidth.period);
+}
+
+static u64 cpu_edf_runtime_read_uint(struct cgroup *cgrp, struct cftype *cft)
+{
+ struct task_group *tg = cgroup_tg(cgrp);
+ u64 runtime;
+
+ spin_lock_irq(&tg->edf_bandwidth.lock);
+ runtime = tg->edf_bandwidth.runtime_max;
+ spin_unlock_irq(&tg->edf_bandwidth.lock);
+ do_div(runtime, NSEC_PER_USEC);
+
+ return runtime;
+}
+
+static int cpu_edf_period_write_uint(struct cgroup *cgrp, struct cftype *cftype,
+ u64 edf_period_us)
+{
+ struct task_group *tg = cgroup_tg(cgrp);
+
+ return __edf_schedulable(tg,
+ tg->edf_bandwidth.runtime_max,
+ edf_period_us * NSEC_PER_USEC);
+}
+
+static u64 cpu_edf_period_read_uint(struct cgroup *cgrp, struct cftype *cft)
+{
+ struct task_group *tg = cgroup_tg(cgrp);
+ u64 period;
+
+ spin_lock_irq(&tg->edf_bandwidth.lock);
+ period = tg->edf_bandwidth.period;
+ spin_unlock_irq(&tg->edf_bandwidth.lock);
+ do_div(period, NSEC_PER_USEC);
+
+ return period;
+}
+#endif
+
static struct cftype cpu_files[] = {
#ifdef CONFIG_FAIR_GROUP_SCHED
{
@@ -10446,6 +10940,19 @@ static struct cftype cpu_files[] = {
.write_u64 = cpu_rt_period_write_uint,
},
#endif
+#ifdef CONFIG_EDF_GROUP_SCHED
+ {
+ .name = "edf_runtime_us",
+ .read_u64 = cpu_edf_runtime_read_uint,
+ .write_u64 = cpu_edf_runtime_write_uint,
+ },
+ {
+ .name = "edf_period_us",
+ .read_u64 = cpu_edf_period_read_uint,
+ .write_u64 = cpu_edf_period_write_uint,
+ },
+#endif
+
};

static int cpu_cgroup_populate(struct cgroup_subsys *ss, struct cgroup *cont)
diff --git a/kernel/sched_debug.c b/kernel/sched_debug.c
index efb8440..1252a43 100644
--- a/kernel/sched_debug.c
+++ b/kernel/sched_debug.c
@@ -146,7 +146,8 @@ static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu)
}

#if defined(CONFIG_CGROUP_SCHED) && \
- (defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED))
+ (defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED) || \
+ defined(CONFIG_EDF_GROUP_SCHED))
static void task_group_path(struct task_group *tg, char *buf, int buflen)
{
/* may be NULL if the underlying cgroup isn't fully-created yet */
@@ -218,6 +219,38 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
#endif
}

+void print_edf_rq(struct seq_file *m, int cpu, struct edf_rq *edf_rq)
+{
+ s64 min_deadline = -1, max_deadline = -1;
+ struct rq *rq = &per_cpu(runqueues, cpu);
+ struct sched_edf_entity *last;
+ unsigned long flags;
+
+ SEQ_printf(m, "\nedf_rq[%d]:\n", cpu);
+
+ spin_lock_irqsave(&rq->lock, flags);
+ if (edf_rq->rb_leftmost)
+ min_deadline = (rb_entry(edf_rq->rb_leftmost,
+ struct sched_edf_entity,
+ rb_node))->deadline;
+ last = __pick_edf_last_entity(edf_rq);
+ if (last)
+ max_deadline = last->deadline;
+ spin_unlock_irqrestore(&rq->lock, flags);
+
+#define P(x) \
+ SEQ_printf(m, " .%-30s: %Ld\n", #x, (long long)(x))
+#define PN(x) \
+ SEQ_printf(m, " .%-30s: %Ld.%06ld\n", #x, SPLIT_NS(x))
+
+ P(edf_rq->edf_nr_running);
+ PN(min_deadline);
+ PN(max_deadline);
+
+#undef PN
+#undef P
+}
+
void print_rt_rq(struct seq_file *m, int cpu, struct rt_rq *rt_rq)
{
#if defined(CONFIG_CGROUP_SCHED) && defined(CONFIG_RT_GROUP_SCHED)
@@ -300,6 +333,7 @@ static void print_cpu(struct seq_file *m, int cpu)
#undef P
#endif
print_cfs_stats(m, cpu);
+ print_edf_stats(m, cpu);
print_rt_stats(m, cpu);

print_rq(m, rq, cpu);
diff --git a/kernel/sched_edf.c b/kernel/sched_edf.c
new file mode 100644
index 0000000..cdec433
--- /dev/null
+++ b/kernel/sched_edf.c
@@ -0,0 +1,550 @@
+/*
+ * Earliest Deadline Firsf (EDF) Scheduling Class (SCHED_EDF policy)
+ *
+ * Earliest Deadline First scheduling for periodic or sporadic tasks.
+ *
+ * An EDF task specifies:
+ * - a maximum runtime,
+ * - a period or minimum interarrival time.
+ * The scheduler, on its turn:
+ * - schedule the task according to its deadline d=t+period,
+ * - enforces the task to not overcame its bandwidth B=runtime_max/period.
+ *
+ * At the end of each instance of a periodic job (or sporadic activation),
+ * the task may inform the scheduler about such event, by calling
+ * sched_yield(), and then suspending until next activation time.
+ *
+ * The strategy used to confine each task inside its bandwidth reservation
+ * is the Constant Bandwidth Server (CBS) scheduling, a slight variation on
+ * EDF that makes this possible.
+ *
+ * Correct behavior, i.e., no EDF task misses its deadline, is only
+ * guaranteed if the parameters of all the EDF entities in the system are
+ * assigned so to result in a schedulable solution.
+ * However, thanks to bandwidth isolation, overruns and deadline misses
+ * remains local to a single task, and does not affect any other task in
+ * the system.
+ *
+ * Groups, if configured, have bandwidth as well, and it is enforced that
+ * the sum of the bandwidths of entities (tasks and groups) belonging to
+ * a group stays below its own bandwidth.
+ *
+ * Copyright (C) 2009 Dario Faggioli, Michael Trimarchi
+ */
+
+static const struct sched_class edf_sched_class;
+
+static inline struct task_struct *edf_task_of(struct sched_edf_entity *edf_se)
+{
+ return container_of(edf_se, struct task_struct, edf);
+}
+
+static inline struct rq *rq_of_edf_rq(struct edf_rq *edf_rq)
+{
+ return container_of(edf_rq, struct rq, edf);
+}
+
+static inline struct edf_rq *edf_rq_of_se(struct sched_edf_entity *edf_se)
+{
+ struct task_struct *p = edf_task_of(edf_se);
+ struct rq *rq = task_rq(p);
+
+ return &rq->edf;
+}
+
+static inline int edf_se_boosted(struct sched_edf_entity *edf_se)
+{
+ struct task_struct *p = edf_task_of(edf_se);
+
+ return p->prio != p->normal_prio;
+}
+
+static inline int on_edf_rq(struct sched_edf_entity *edf_se)
+{
+ return !RB_EMPTY_NODE(&edf_se->rb_node);
+}
+
+#define for_each_leaf_edf_rq(edf_rq, rq) \
+ for (edf_rq = &rq->edf; edf_rq; edf_rq = NULL)
+
+static inline int edf_time_before(u64 a, u64 b)
+{
+ return (s64)(a - b) < 0;
+}
+
+static inline u64 edf_max_deadline(u64 a, u64 b)
+{
+ s64 delta = (s64)(b - a);
+ if (delta > 0)
+ a = b;
+
+ return a;
+}
+
+static inline void __edf_new_to_running(struct sched_edf_entity *edf_se)
+{
+ struct edf_rq *edf_rq = edf_rq_of_se(edf_se);
+ struct rq *rq = rq_of_edf_rq(edf_rq);
+
+ edf_se->edf_flags &= ~EDF_NEW;
+ edf_se->edf_flags |= EDF_RUNNING;
+
+ edf_se->deadline = rq->clock + edf_se->period;
+}
+
+static inline void __edf_clear_ended(struct sched_edf_entity *edf_se)
+{
+ struct edf_rq *edf_rq = edf_rq_of_se(edf_se);
+ struct rq *rq = rq_of_edf_rq(edf_rq);
+
+ edf_se->edf_flags &= ~EDF_ENDED;
+
+ edf_se->runtime = edf_se->runtime > 0 ? 0 : edf_se->runtime;
+ if (edf_time_before(edf_se->deadline, rq->clock))
+ edf_se->deadline = rq->clock;
+}
+
+static inline void __edf_set_ended(struct sched_edf_entity *edf_se)
+{
+ edf_se->edf_flags |= EDF_ENDED;
+}
+
+static void enqueue_edf_entity(struct sched_edf_entity *edf_se);
+static void dequeue_edf_entity(struct sched_edf_entity *edf_se);
+
+static void update_edf_params(struct sched_edf_entity *edf_se)
+{
+ struct edf_rq *edf_rq = edf_rq_of_se(edf_se);
+ struct rq *rq = rq_of_edf_rq(edf_rq);
+ u64 left, right;
+
+ BUG_ON(on_edf_rq(edf_se));
+
+ if (edf_se->edf_flags & EDF_NEW) {
+ __edf_new_to_running(edf_se);
+ return;
+ }
+ if (edf_se->edf_flags & EDF_ENDED) {
+ __edf_clear_ended(edf_se);
+ goto update;
+ }
+
+ /*
+ * Update the deadline to the current time only if:
+ * - the budget has been completely exhausted;
+ * - using the ramaining budget, with the current deadline, would
+ * make the task exceed its bandwidth;
+ * - the deadline itself is in the past.
+ *
+ * For the second condition to hold, we check if:
+ * runtime / (deadline - rq->clock) >= runtime_max / period
+ *
+ * Which basically says if, in the time left before the current
+ * deadline, the tasks overcome its expected runtime by using the
+ * residual budget (left and right are the two sides of the equation,
+ * after a bit of shuffling to use multiplications instead of
+ * divisions).
+ */
+ left = edf_se->runtime > 0 ? edf_se->period * edf_se->runtime : 0;
+ right = (edf_se->deadline - rq->clock) * edf_se->runtime_max;
+
+ if (edf_se->runtime < 0 || edf_time_before(right, left) ||
+ edf_time_before(edf_se->deadline, rq->clock)) {
+ edf_se->deadline = rq->clock;
+update:
+ /*
+ * Keep moving the deadline and replenishing runtime by the
+ * proper amount at least until the runtime becomes positive.
+ */
+ while (edf_se->runtime <= 0) {
+ edf_se->deadline += edf_se->period;
+ if (edf_se->runtime + edf_se->runtime_max < edf_se->runtime_max)
+ edf_se->runtime += edf_se->runtime_max;
+ else
+ edf_se->runtime = edf_se->runtime_max;
+ }
+ }
+}
+
+static int start_edf_timer(struct sched_edf_entity *edf_se)
+{
+ struct edf_rq *edf_rq = edf_rq_of_se(edf_se);
+ struct rq *rq = rq_of_edf_rq(edf_rq);
+ ktime_t now, delta, act;
+ ktime_t soft, hard;
+ unsigned long range;
+
+ now = hrtimer_cb_get_time(&edf_se->edf_timer);
+ delta = ktime_sub_ns(now, rq->clock);
+ act = ns_to_ktime(edf_se->deadline);
+ act = ktime_add(act, delta);
+
+ hrtimer_set_expires(&edf_se->edf_timer, act);
+
+ soft = hrtimer_get_softexpires(&edf_se->edf_timer);
+ hard = hrtimer_get_expires(&edf_se->edf_timer);
+ range = ktime_to_ns(ktime_sub(hard, soft));
+ __hrtimer_start_range_ns(&edf_se->edf_timer, soft,
+ range, HRTIMER_MODE_ABS, 0);
+
+ return hrtimer_active(&edf_se->edf_timer);
+}
+
+static enum hrtimer_restart edf_timer(struct hrtimer *timer)
+{
+ struct sched_edf_entity *edf_se = container_of(timer,
+ struct sched_edf_entity,
+ edf_timer);
+ struct task_struct *p = edf_task_of(edf_se);
+ struct edf_rq *edf_rq = edf_rq_of_se(edf_se);
+ struct rq *rq = rq_of_edf_rq(edf_rq);
+
+ spin_lock(&rq->lock);
+ if (!edf_task(p))
+ goto unlock;
+
+ update_edf_params(edf_se);
+ if (p->se.on_rq && !on_edf_rq(edf_se)) {
+ enqueue_edf_entity(edf_se);
+ resched_task(rq->curr);
+ }
+unlock:
+ spin_unlock(&rq->lock);
+
+ return HRTIMER_NORESTART;
+}
+
+static void init_edf_timer(struct hrtimer *timer)
+{
+ if (hrtimer_active(timer)) {
+ hrtimer_try_to_cancel(timer);
+ return;
+ }
+
+ hrtimer_init(timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ timer->function = edf_timer;
+}
+
+static int edf_runtime_exceeded(struct sched_edf_entity *edf_se)
+{
+ if (edf_se->runtime > 0 || edf_se_boosted(edf_se))
+ return 0;
+
+ BUG_ON(hrtimer_active(&edf_se->edf_timer));
+
+ dequeue_edf_entity(edf_se);
+ if (!start_edf_timer(edf_se)) {
+ update_edf_params(edf_se);
+ enqueue_edf_entity(edf_se);
+ }
+
+ return 1;
+}
+
+static void update_curr_edf(struct rq *rq)
+{
+ struct task_struct *curr = rq->curr;
+ struct sched_edf_entity *edf_se = &curr->edf;
+ u64 delta_exec;
+
+ if (!edf_task(curr) || !on_edf_rq(edf_se))
+ return;
+
+ delta_exec = rq->clock - curr->se.exec_start;
+ if (unlikely((s64)delta_exec < 0))
+ delta_exec = 0;
+
+ schedstat_set(curr->se.exec_max, max(curr->se.exec_max, delta_exec));
+
+ curr->se.sum_exec_runtime += delta_exec;
+ account_group_exec_runtime(curr, delta_exec);
+
+ curr->se.exec_start = rq->clock;
+ cpuacct_charge(curr, delta_exec);
+
+ edf_se->runtime -= delta_exec;
+ /*
+ * If we exhausted our runtime it has to be replenished
+ * either immediately or after one (or more) CBS period(s).
+ */
+ if (edf_runtime_exceeded(edf_se))
+ resched_task(curr);
+}
+
+static void enqueue_edf_entity(struct sched_edf_entity *edf_se)
+{
+ struct edf_rq *edf_rq = edf_rq_of_se(edf_se);
+ struct rb_node **link = &edf_rq->rb_root.rb_node;
+ struct rb_node *parent = NULL;
+ struct sched_edf_entity *entry;
+ int leftmost = 1;
+
+ BUG_ON(!RB_EMPTY_NODE(&edf_se->rb_node));
+
+ while (*link) {
+ parent = *link;
+ entry = rb_entry(parent, struct sched_edf_entity, rb_node);
+ if (!edf_time_before(entry->deadline, edf_se->deadline))
+ link = &parent->rb_left;
+ else {
+ link = &parent->rb_right;
+ leftmost = 0;
+ }
+ }
+
+ if (leftmost)
+ edf_rq->rb_leftmost = &edf_se->rb_node;
+
+ rb_link_node(&edf_se->rb_node, parent, link);
+ rb_insert_color(&edf_se->rb_node, &edf_rq->rb_root);
+
+ edf_rq->edf_nr_running++;
+}
+
+static void dequeue_edf_entity(struct sched_edf_entity *edf_se)
+{
+ struct edf_rq *edf_rq = edf_rq_of_se(edf_se);
+
+ if (RB_EMPTY_NODE(&edf_se->rb_node))
+ return;
+
+ if (edf_rq->rb_leftmost == &edf_se->rb_node) {
+ struct rb_node *next_node;
+ struct sched_edf_entity *next;
+
+ next_node = rb_next(&edf_se->rb_node);
+ edf_rq->rb_leftmost = next_node;
+
+ if (next_node)
+ next = rb_entry(next_node, struct sched_edf_entity,
+ rb_node);
+ }
+
+ rb_erase(&edf_se->rb_node, &edf_rq->rb_root);
+ RB_CLEAR_NODE(&edf_se->rb_node);
+
+ edf_rq->edf_nr_running--;
+}
+
+static void check_preempt_curr_edf(struct rq *rq, struct task_struct *p,
+ int sync)
+{
+ if (unlikely(rt_prio(p->prio)))
+ goto resched;
+
+ if (unlikely(p->sched_class != &edf_sched_class))
+ return;
+
+ if ((s64)p->edf.deadline - rq->curr->edf.deadline < 0)
+resched:
+ resched_task(rq->curr);
+}
+
+static void
+enqueue_task_edf(struct rq *rq, struct task_struct *p, int wakeup)
+{
+ struct sched_edf_entity *edf_se = &p->edf;
+
+ BUG_ON(on_edf_rq(edf_se));
+
+ /*
+ * Only enqueue entities with some remaining runtime.
+ */
+ if (edf_se->runtime <= 0)
+ return;
+
+ update_edf_params(edf_se);
+ enqueue_edf_entity(edf_se);
+ check_preempt_curr_edf(rq, p, 0);
+}
+
+static void
+dequeue_task_edf(struct rq *rq, struct task_struct *p, int sleep)
+{
+ struct sched_edf_entity *edf_se = &p->edf;
+
+ if (!on_edf_rq(edf_se))
+ return;
+
+ update_curr_edf(rq);
+ dequeue_edf_entity(edf_se);
+}
+
+static void yield_task_edf(struct rq *rq)
+{
+ struct task_struct *p = rq->curr;
+ struct sched_edf_entity *edf_se = &p->edf;
+
+ __edf_set_ended(edf_se);
+}
+
+#ifdef CONFIG_SCHED_HRTICK
+static void start_hrtick_edf(struct rq *rq, struct task_struct *p)
+{
+ struct sched_edf_entity *edf_se = &p->edf;
+ s64 delta;
+
+ delta = edf_se->runtime_max - edf_se->runtime;
+
+ if (delta > 10000)
+ hrtick_start(rq, delta);
+}
+#else
+static void start_hrtick_edf(struct rq *rq, struct task_struct *p)
+{
+}
+#endif
+
+static struct sched_edf_entity *__pick_edf_last_entity(struct edf_rq *edf_rq)
+{
+ struct rb_node *last = rb_last(&edf_rq->rb_root);
+
+ if (!last)
+ return NULL;
+
+ return rb_entry(last, struct sched_edf_entity, rb_node);
+}
+
+static struct sched_edf_entity *pick_next_edf_entity(struct rq *rq,
+ struct edf_rq *edf_rq)
+{
+ struct rb_node *left = edf_rq->rb_leftmost;
+
+ if (!left)
+ return NULL;
+
+ return rb_entry(left, struct sched_edf_entity, rb_node);
+}
+
+struct task_struct *pick_next_task_edf(struct rq *rq)
+{
+ struct sched_edf_entity *edf_se;
+ struct edf_rq *edf_rq = &rq->edf;
+ struct task_struct *p;
+
+ BUG_ON(edf_rq->edf_nr_running < 0);
+
+ if (!edf_rq->edf_nr_running)
+ return NULL;
+
+ edf_se = pick_next_edf_entity(rq, edf_rq);
+ BUG_ON(!edf_se);
+
+ p = edf_task_of(edf_se);
+ p->se.exec_start = rq->clock;
+ if (hrtick_enabled(rq))
+ start_hrtick_edf(rq, p);
+
+ return p;
+}
+
+static void put_prev_task_edf(struct rq *rq, struct task_struct *p)
+{
+ update_curr_edf(rq);
+ p->se.exec_start = 0;
+}
+
+static void task_tick_edf(struct rq *rq, struct task_struct *p, int queued)
+{
+ update_curr_edf(rq);
+}
+
+static void set_curr_task_edf(struct rq *rq)
+{
+ struct task_struct *p = rq->curr;
+
+ p->se.exec_start = rq->clock;
+}
+
+static void prio_changed_edf(struct rq *rq, struct task_struct *p,
+ int oldprio, int running)
+{
+ if (running) {
+ if (p->prio > oldprio)
+ resched_task(rq->curr);
+ } else
+ check_preempt_curr_edf(rq, p, 0);
+}
+
+static void switched_to_edf(struct rq *rq, struct task_struct *p,
+ int running)
+{
+ if (!running)
+ check_preempt_curr_edf(rq, p, 0);
+}
+
+#ifdef CONFIG_SMP
+static int select_task_rq_edf(struct task_struct *p, int sd_flag, int flags)
+{
+ return task_cpu(p);
+}
+
+static unsigned long
+load_balance_edf(struct rq *this_rq, int this_cpu, struct rq *busiest,
+ unsigned long max_load_move,
+ struct sched_domain *sd, enum cpu_idle_type idle,
+ int *all_pinned, int *this_best_prio)
+{
+ /* for now, don't touch EDF tasks */
+ return 0;
+}
+
+static int
+move_one_task_edf(struct rq *this_rq, int this_cpu, struct rq *busiest,
+ struct sched_domain *sd, enum cpu_idle_type idle)
+{
+ return 0;
+}
+
+static void set_cpus_allowed_edf(struct task_struct *p,
+ const struct cpumask *new_mask)
+{
+ int weight = cpumask_weight(new_mask);
+
+ BUG_ON(!edf_task(p));
+
+ cpumask_copy(&p->cpus_allowed, new_mask);
+ p->edf.nr_cpus_allowed = weight;
+}
+#endif
+
+static const struct sched_class edf_sched_class = {
+ .next = &fair_sched_class,
+ .enqueue_task = enqueue_task_edf,
+ .dequeue_task = dequeue_task_edf,
+ .yield_task = yield_task_edf,
+
+ .check_preempt_curr = check_preempt_curr_edf,
+
+ .pick_next_task = pick_next_task_edf,
+ .put_prev_task = put_prev_task_edf,
+
+#ifdef CONFIG_SMP
+ .select_task_rq = select_task_rq_edf,
+
+ .load_balance = load_balance_edf,
+ .move_one_task = move_one_task_edf,
+ .set_cpus_allowed = set_cpus_allowed_edf,
+#endif
+
+ .set_curr_task = set_curr_task_edf,
+ .task_tick = task_tick_edf,
+
+ .prio_changed = prio_changed_edf,
+ .switched_to = switched_to_edf,
+};
+
+#ifdef CONFIG_SCHED_DEBUG
+void print_edf_rq(struct seq_file *m, int cpu, struct edf_rq *edf_rq);
+
+static void print_edf_stats(struct seq_file *m, int cpu)
+{
+ struct edf_rq *edf_rq = &cpu_rq(cpu)->edf;
+
+ rcu_read_lock();
+ for_each_leaf_edf_rq(edf_rq, cpu_rq(cpu))
+ print_edf_rq(m, cpu, edf_rq);
+ rcu_read_unlock();
+}
+#endif /* CONFIG_SCHED_DEBUG */
+
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index ecc637a..cae7fb8 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1571,10 +1571,8 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_

update_curr(cfs_rq);

- if (unlikely(rt_prio(p->prio))) {
+ if (rt_prio(p->prio) || task_has_edf_policy(p))
resched_task(curr);
- return;
- }

if (unlikely(p->sched_class != &fair_sched_class))
return;
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index a4d790c..4511fc9 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -1746,7 +1746,7 @@ unsigned int get_rr_interval_rt(struct task_struct *task)
}

static const struct sched_class rt_sched_class = {
- .next = &fair_sched_class,
+ .next = &edf_sched_class,
.enqueue_task = enqueue_task_rt,
.dequeue_task = dequeue_task_rt,
.yield_task = yield_task_rt,
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index 26185d7..d32aaf0 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -24,6 +24,7 @@ static int __read_mostly tracer_enabled;

static struct task_struct *wakeup_task;
static int wakeup_cpu;
+static int wakeup_edf;
static int wakeup_current_cpu;
static unsigned wakeup_prio = -1;
static int wakeup_rt;
@@ -214,6 +215,9 @@ probe_wakeup(struct rq *rq, struct task_struct *p, int success)
tracing_record_cmdline(p);
tracing_record_cmdline(current);

+ if (wakeup_edf && !edf_task(p))
+ return;
+
if ((wakeup_rt && !rt_task(p)) ||
p->prio >= wakeup_prio ||
p->prio >= current->prio)
@@ -340,12 +344,21 @@ static int __wakeup_tracer_init(struct trace_array *tr)

static int wakeup_tracer_init(struct trace_array *tr)
{
+ wakeup_edf = 0;
+ wakeup_rt = 0;
+ return __wakeup_tracer_init(tr);
+}
+
+static int wakeup_edf_tracer_init(struct trace_array *tr)
+{
+ wakeup_edf = 1;
wakeup_rt = 0;
return __wakeup_tracer_init(tr);
}

static int wakeup_rt_tracer_init(struct trace_array *tr)
{
+ wakeup_edf = 0;
wakeup_rt = 1;
return __wakeup_tracer_init(tr);
}
@@ -384,6 +397,20 @@ static struct tracer wakeup_tracer __read_mostly =
#endif
};

+static struct tracer wakeup_edf_tracer __read_mostly =
+{
+ .name = "wakeup_edf",
+ .init = wakeup_edf_tracer_init,
+ .reset = wakeup_tracer_reset,
+ .start = wakeup_tracer_start,
+ .stop = wakeup_tracer_stop,
+ .wait_pipe = poll_wait_pipe,
+ .print_max = 1,
+#ifdef CONFIG_FTRACE_SELFTEST
+ .selftest = trace_selftest_startup_wakeup,
+#endif
+};
+
static struct tracer wakeup_rt_tracer __read_mostly =
{
.name = "wakeup_rt",
@@ -406,6 +433,10 @@ __init static int init_wakeup_tracer(void)
if (ret)
return ret;

+ ret = register_tracer(&wakeup_edf_tracer);
+ if (ret)
+ return ret;
+
ret = register_tracer(&wakeup_rt_tracer);
if (ret)
return ret;


--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]


Attachments:
signature.asc (197.00 B)
This is a digitally signed message part

2009-09-23 14:08:15

by Linus Walleij

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

Hi Claudio, thanks for the detailed answer!

2009/9/23 Claudio Scordino <[email protected]>:

> With EDF, instead, the test is much easier: if the bandwidth is less than
> 100%, then you are sure that the deadlines will be met. In other words, you
> can fully exploit processor bandwidth up to 100% being sure that timing
> constraints of EDF tasks will be guaranteed.

This must be under the assumption that all code running in the system
is preemtible, is it not? Which means, negligible or close to negligible
non-preemptible code.

How much as we might like it to be, that is not (yet) the situation
with the Linux kernel, far off.

So we will still very much need the CONFIG_PREEMPT_RT patch
to get closer to that ideal situation, and we still need to get rid of
the BKL.

Well I guess you aldready know that, but take it as a note to
the bystanders of this discussion, which are plenty.

Now I ought to write fewer mails and look into solving that little
problem... You don't happen to know if we can have a EU FP7
project sponsored to rid out the BKL and switch and test drivers
en masse to use threaded interrupt handlers do you? ;-)

Linus Walleij

2009-09-23 14:43:16

by Daniel Walker

[permalink] [raw]
Subject: Re: checkpatch as a tool (was Re: [RFC][PATCH] SCHED_EDF scheduling class)

On Wed, 2009-09-23 at 14:22 +0200, Ingo Molnar wrote:
> He should consider not sending them at all. It's up to maintainers and
> the developers involved with that code whether the small details that
> checkpatch flags are important or not at a given point in the patch
> cycle.
>
> For example i use checkpatch all the time and i think it's a fantastic
> tool, still i dont want another nuisance layer on lkml interfering with
> the patch flow.
>
> If a patch gets so far in the patch cycle that i'm thinking about
> merging it, i might fix the checkpatch failures myself (often they are
> trivial), and i might warn frequent contributors about repeat patterns
> of small uncleanlinesses - or i might bounce the patch back to the
> contributor. I also ignore certain classes of checkpatch warnings.
>
> What Daniel is doing is basically a semi-mandatory checkpatch layer on
> lkml and that's both a distraction and harmful as well. We dont need a
> "checkpatch police" on lkml. We want an open, reasonable, human driven
> patch process with very little buerocracy and no buerocrats.

I think short term you might be right, that it is a nuisance to deal
with these issues.. However, these are real code comments which is what
this list is designed for.. Long term I don't think I will be sending
many of these emails at all, in fact I've only been doing this 3 weeks
and I can already see a drop off in the number of errors that I'm
finding.. It's like advertising, as soon as people start seeing a lot of
checkpatch related emails, they start to remember to use the tool.

Not to mention that automated code review (in mass) is useful .. Our
eyes can miss things, and having a massively used tool that checks for
all the common problems that we encounter is a good thing.. For
instance, checkpatch already found a locked semaphore, and a mutex type
semaphore in the "Target_Core_Mod ConfigFS infrastructure", which I'm
sure no one would want to enter the kernel, but had been missed. It also
found one real code defect,

http://lkml.indiana.edu/hypermail/linux/kernel/0909.1/00129.html

The more we use the tool the better the tool becomes, and the more real
problems can be caught prior to code inclusion ..

I could have a higher threshold for when these errors become note
worthy, and I've been struggling with that since I started doing this..
I don't think not commenting at all would be a good thing..

Daniel

2009-09-23 14:45:17

by Dario Faggioli

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Wed, 2009-09-23 at 16:08 +0200, Linus Walleij wrote:
> > With EDF, instead, the test is much easier: if the bandwidth is less than
> > 100%, then you are sure that the deadlines will be met. In other words, you
> > can fully exploit processor bandwidth up to 100% being sure that timing
> > constraints of EDF tasks will be guaranteed.
>
> This must be under the assumption that all code running in the system
> is preemtible, is it not? Which means, negligible or close to negligible
> non-preemptible code.
>
Right. Moreover, this also happen only if tasks are independent, do not
share resources, etc. which all are things that do not apply to Linux...
This is more than true!

However, this holds for any scheduler you may decide to consider,
especially if a real-time one.
A scheduling test is always based on a task model and, at least, with a
simple --completely preemptible, independent tasks-- model, EDF is
simpler than other solutions.

Obviously, if the model gets more complicated, more complicated analysis
has to be used, and again, this happen to all the schedulers you will
try to analyze.

> How much as we might like it to be, that is not (yet) the situation
> with the Linux kernel, far off.
>
> So we will still very much need the CONFIG_PREEMPT_RT patch
> to get closer to that ideal situation, and we still need to get rid of
> the BKL.
>
Independently from the above, I agree with you here, and in fact a
SCHED_EDF patch on top of PREEMPT_RT will hopefully come shortly, we and
some other guys are already at work on this! ;-P

> Now I ought to write fewer mails and look into solving that little
> problem... You don't happen to know if we can have a EU FP7
> project sponsored to rid out the BKL and switch and test drivers
> en masse to use threaded interrupt handlers do you? ;-)
>
Well, it's an interesting (applied) research topic, and I think there
still is a call for FP7 EU fundings! :-)

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]


Attachments:
signature.asc (197.00 B)
This is a digitally signed message part

2009-09-23 14:51:06

by Daniel Walker

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Wed, 2009-09-23 at 14:25 +0200, Ingo Molnar wrote:
> * Avi Kivity <[email protected]> wrote:
>
> >> discouraging contributions is more something that happens when you
> >> get the responses I got earlier in this thread..
> >
> > That's probably intentional. Whitespace fixes have their place but
> > not at this stage in a patch's lifecycle.
>
> Exactly. What might make sense is to scan linux-next for new commits
> that show serious cleanliness trouble - and send fix patches to the
> parties involved. That's a real effort and brings the code forward.

Often times when a patch is at youngest that when you want to catch
these issues .. This EDF patch will likely get submitted more than
twice. If you catch all the minor problems first you will not be dealing
with them later when it comes time to include the code.

In this case the author is not totally aware of how to submit this
code.. I don't think it's at all inappropriate to comment on that. His
next submission will likely be much cleaner and nicer. It may even speed
up the inclusion process since he'll be more easily able to submit the
code (with practice and comments from us).

Daniel


2009-09-23 14:59:30

by Avi Kivity

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On 09/23/2009 05:50 PM, Daniel Walker wrote:
> On Wed, 2009-09-23 at 14:25 +0200, Ingo Molnar wrote:
>
>> * Avi Kivity<[email protected]> wrote:
>>
>>
>>>> discouraging contributions is more something that happens when you
>>>> get the responses I got earlier in this thread..
>>>>
>>> That's probably intentional. Whitespace fixes have their place but
>>> not at this stage in a patch's lifecycle.
>>>
>> Exactly. What might make sense is to scan linux-next for new commits
>> that show serious cleanliness trouble - and send fix patches to the
>> parties involved. That's a real effort and brings the code forward.
>>
> Often times when a patch is at youngest that when you want to catch
> these issues .. This EDF patch will likely get submitted more than
> twice. If you catch all the minor problems first you will not be dealing
> with them later when it comes time to include the code.
>

Not true, you want to address the major issues first. What's the point
of fixing whitespace if the whole approach is rejected? if it has to
undergo a rewrite? (not an opinion on EDF btw, just as an example)

> In this case the author is not totally aware of how to submit this
> code.. I don't think it's at all inappropriate to comment on that. His
> next submission will likely be much cleaner and nicer. It may even speed
> up the inclusion process since he'll be more easily able to submit the
> code (with practice and comments from us).
>

Give people some credit.

--
error compiling committee.c: too many arguments to function

2009-09-23 15:08:59

by Daniel Walker

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Wed, 2009-09-23 at 17:58 +0300, Avi Kivity wrote:
> On 09/23/2009 05:50 PM, Daniel Walker wrote:
> > On Wed, 2009-09-23 at 14:25 +0200, Ingo Molnar wrote:
> >
> >> * Avi Kivity<[email protected]> wrote:
> >>
> >>
> >>>> discouraging contributions is more something that happens when you
> >>>> get the responses I got earlier in this thread..
> >>>>
> >>> That's probably intentional. Whitespace fixes have their place but
> >>> not at this stage in a patch's lifecycle.
> >>>
> >> Exactly. What might make sense is to scan linux-next for new commits
> >> that show serious cleanliness trouble - and send fix patches to the
> >> parties involved. That's a real effort and brings the code forward.
> >>
> > Often times when a patch is at youngest that when you want to catch
> > these issues .. This EDF patch will likely get submitted more than
> > twice. If you catch all the minor problems first you will not be dealing
> > with them later when it comes time to include the code.
> >
>
> Not true, you want to address the major issues first. What's the point
> of fixing whitespace if the whole approach is rejected? if it has to
> undergo a rewrite? (not an opinion on EDF btw, just as an example)

I'm not sure why your fixated on whitespace , but thinking about it more
I don't think it matters .. If you fix whitespace or major issues first,
it doesn't matter .. All the issues have to eventually get fixed .. Not
to mentioned that LKML is not something you could remotely control in
that way.

> > In this case the author is not totally aware of how to submit this
> > code.. I don't think it's at all inappropriate to comment on that. His
> > next submission will likely be much cleaner and nicer. It may even speed
> > up the inclusion process since he'll be more easily able to submit the
> > code (with practice and comments from us).
> >
>
> Give people some credit.

What do you mean?

Daniel

2009-09-23 15:12:59

by Avi Kivity

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On 09/23/2009 06:08 PM, Daniel Walker wrote:
>
>> Not true, you want to address the major issues first. What's the point
>> of fixing whitespace if the whole approach is rejected? if it has to
>> undergo a rewrite? (not an opinion on EDF btw, just as an example)
>>
> I'm not sure why your fixated on whitespace , but thinking about it more
> I don't think it matters .. If you fix whitespace or major issues first,
> it doesn't matter .. All the issues have to eventually get fixed .. Not
> to mentioned that LKML is not something you could remotely control in
> that way.
>

A technical issue is that if you rewrite the code the whitespace fix
becomes irrelevant. But more important is that it's a distraction when
people are thinking about requirements and design.

>>> In this case the author is not totally aware of how to submit this
>>> code.. I don't think it's at all inappropriate to comment on that. His
>>> next submission will likely be much cleaner and nicer. It may even speed
>>> up the inclusion process since he'll be more easily able to submit the
>>> code (with practice and comments from us).
>>>
>>>
>> Give people some credit.
>>
> What do you mean?
>
>

If he's able to write a scheduling class, he'll pick up the coding style
when it becomes relevant.

--
error compiling committee.c: too many arguments to function

2009-09-23 15:24:09

by Daniel Walker

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Wed, 2009-09-23 at 18:12 +0300, Avi Kivity wrote:
> On 09/23/2009 06:08 PM, Daniel Walker wrote:
> >
> >> Not true, you want to address the major issues first. What's the point
> >> of fixing whitespace if the whole approach is rejected? if it has to
> >> undergo a rewrite? (not an opinion on EDF btw, just as an example)
> >>
> > I'm not sure why your fixated on whitespace , but thinking about it more
> > I don't think it matters .. If you fix whitespace or major issues first,
> > it doesn't matter .. All the issues have to eventually get fixed .. Not
> > to mentioned that LKML is not something you could remotely control in
> > that way.
> >
>
> A technical issue is that if you rewrite the code the whitespace fix
> becomes irrelevant. But more important is that it's a distraction when
> people are thinking about requirements and design.

It's not irrelevant, since a person doing that rewrite will be
conscience of whitespace during the re-write ..

The same with general coding style, if someone does a rewrite who has
been alerted to checkpatch problems they will likely use it themselves
leaving no need for someone else to comment on it.

> >>> In this case the author is not totally aware of how to submit this
> >>> code.. I don't think it's at all inappropriate to comment on that. His
> >>> next submission will likely be much cleaner and nicer. It may even speed
> >>> up the inclusion process since he'll be more easily able to submit the
> >>> code (with practice and comments from us).
> >>>
> >>>
> >> Give people some credit.
> >>
> > What do you mean?
> >
> >
>
> If he's able to write a scheduling class, he'll pick up the coding style
> when it becomes relevant.

There's plenty of large projects that never get off the ground cause
people don't follow the coding style, or don't write clean code.. Take a
look at the staging tree there's plenty of large dirty projects in
there.

Daniel

2009-09-23 19:24:14

by Andy Isaacson

[permalink] [raw]
Subject: Re: checkpatch as a tool (was Re: [RFC][PATCH] SCHED_EDF scheduling class)

On Tue, Sep 22, 2009 at 06:11:39PM -0700, Daniel Walker wrote:
> On Tue, 2009-09-22 at 18:01 -0700, Joe Perches wrote:
> > I think it'd be more helpful if instead of merely sending
> > a "hah! checkpatch failure!" email you send the submitter
> > a gentler nudge and a fixed patch.
>
> I don't say "hah!" like I caught you or something .. I could try to

It does come across that way sometimes.

> soften the emails further so that's not the feeling .. As far as sending
> fixed patches to people it's not really feasible due to the number of
> patches I would have to send out..

I'd rather see 10% coverage with useful patches than 100% coverage of
"there is some problem in this patch but I can't be bothered to say
what".

> The other issues is that if I send a corrected patch I could become
> responsible for that persons patch instead of them ..

Not very likely unless you take that burden on yourself.

> I could send out delta patches, however, I don't want people to rely
> on me to fix these issues.. Ideally people would correct them prior to
> sending the patches out..

Sure, and checkpatch is useful for that -- and more helper scripts to
automate the process would be helpful. For example, extending "git
send-email" to have an optional "warn about checkpatch failures" would
be a useful addition, especially if it's not too annoying. (I find the
current checkpatch output somewhat annoying, so just including that
wouldn't be great IMO.)

I would strongly recommend that you redistribute the effort you're
putting in, and only send messages where you have non-trivial input to
the development process. For example, saying "This patch has checkpatch
errors. Please fix the indentation and the EXPORT_SYMBOL errors, and
move function declarations to foo.h" would be much much more useful than
"it looks like you have a few style issues".

Let's go through a few of your recent checkpatch messages:

Message-Id: <1252936556.28368.215.camel@desktop>
> Could you run this through checkpatch also, it looks like you have a few
> style issues..

Patch <[email protected]> is a
strict improvement on the existing code, and changing the comma spacing
on just these two lines to silence checkpatch would make the file
inconsistent. Changing all 12 functionlike #defines in this file would
be outside the purview of this patch. You could have done the robotlike
patch and fixed the issue in this file in the time it would have taken
to do this analysis, and then you would have seen that the file has far
worse issues than the trivial ones checkpatch flagged in Felipe's patch.

Verdict: useless.

Message-Id: <1252465099.14793.49.camel@desktop>
> All the lines like the above are all producing checkpatch errors.. It
> looks like the open brace needs to be up with the equals ..

Joe is correct in his response, in this case the "error" from checkpatch
is ignorable. Your responses in <1252467842.14793.66.camel@desktop> and
<1252466482.14793.60.camel@desktop> are dismissive of exactly the
concerns people are raising here, and you badly misstate your case when
you say
> I would if this was code in the kernel already, but it's not.
without apologizing when your mistake was pointed out.

Verdict: useless, and abrasive in your response to correction.

Message-Id: <1253553472.9654.236.camel@desktop>
One right (and reasonable to say in text although a patch would have
been equally reasonable, and hardly any more work to generate), one
wrong. I strongly wouldn't have bothered to send this message.

Verdict: mediocre.

Message-Id: <1253504435.9654.221.camel@desktop>
AFAICS you didn't even read the checkpatch output, just saw a lot of
output and reflexively sent email. Obviously the 0xA0 is due to some
sort of editor/mailer issue not a style issue.

Verdict: useless.

Message-Id: <1253326790.6699.38.camel@desktop>
Reasonable, although you'd do better to lead with the reasonable comment
and then follow with a mention of chckpatch rather than the other way
around.

Verdict: useful.

Message-Id: <1253293100.7060.46.camel@desktop>
Whitespace errors in a delta (as opposed to a new submission, especially
from a new contributor) are rarely worth whingeing about, and in this
case it's merely duplicating the error on the line above the diff.

Verdict: mediocre.

Message-Id: <1253291944.7060.42.camel@desktop>
Saying "we use checkpatch.pl" to someone who's been contributing since
before Git existed is condescending in the extreme -- it'd be bad enough
to say it to a first-time contributor, but this is just beyond the pale.
The checkpatch output is fairly useful and clearly Lennart did find the
reminder useful, but you could have been less abrasive by putting in a
little more work.

Verdict: useful.

Message-Id: <1253092251.11643.569.camel@desktop>
davem already gave useful commentary 2 hours before your mail, so I
wouldn't have bothered to send your mail. If you do send this, it would
be more useful to say "You seem to use spaces for indentation
everywhere, could you fix that and run checkpatch before re-submitting".

Verdict: technically useful but socially abrasive.


In summary, of 8 messages reviewed that's 3 completely useless messages,
2 of middling use, and 3 useful ones. Of that, two or three times you
were strongly abrasive IMO.

That's about the proportion I've anecdotally observed in your checkpatch
mails (a bit more positive than I would have guessed actually). If
you'd sent only three of those eight messages, I think you'd be getting
much less negative feedback, and you'd have had 166% more time to spend
on each message.

FWIW, I think you have a valuable contribution but it's being drowned
out by your errors and non-useful messages. Please be more careful to
not send erroneous messages, contribute patches to address (even just
some of!) the errors you flag, graciously acknowledge your errors when
you're called on them (and learn from the corrections!), and perhaps
people will lay off on you.

-andy

2009-09-23 21:39:36

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class


On Tue, 22 Sep 2009, Raistlin wrote:

> I see. Problem is that last time I sent a patch, I inlined it and it
> worked without any issue... It seems something has changed either in me
> or my MUA... Too bad. :-(
>
> > I realize
> > this is an RFC , so the next time you submit this try to make it utf8 ..
> >
> You're definitely right and, now that you pointed this out to me, I'm
> not able to produce a clean version, even if I set utf8 encoding...
>
> So, I'm sorry, but, for now, the only thing I can do is attaching the
> patch, hoping it works. The next one will be stop using Evolution! :-/
>

Hmm, I use evolution with no issues with patches. I just set it to
Preformatted (from Normal), and then hit "Insert->Text File ..." and it
inlines quite nicely.

/me says this while sending via pine ;-)

-- Steve

2009-09-24 00:34:55

by Geunsik Lim

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Tue, Sep 22, 2009 at 7:30 PM, Raistlin <[email protected]> wrote:
>
> The project is hosted at the following URL:
>
> http://www.evidence.eu.com/sched_edf.html
>
Hi Raistlin,

Thank you for challenge to support EDF on Linux 2.6 in private.

You have to modify git address of schedtool-edf at
http://www.evidence.eu.com/sched_edf.html correctly.

before) git clone git://feanor.sssup.it/sched-tool-edf.git
after ) git clone git://feanor.sssup.it/schedtool-edf.git

----- http://www.evidence.eu.com/sched_edf.html
---------------------------------
Usage with schedtool

To easily test SCHED_EDF you can use our modified version of
schedtool as explained below.
For instance, to run a simple yes program with period 100000us and
budget 25000us, just type the following commands:

git clone git://feanor.sssup.it/sched-tool-edf.git * <-- Here, change
to schedtool-edf.git
cd schedtool-edf
make
./schedtool -E -d 100000 -b 25000 -a 0 -e yes > /dev/null &



--
Regards,
GeunSik Lim ( Samsung Electronics )
Blog : http://blog.naver.com/invain/
e-Mail: [email protected]
[email protected] , [email protected]

2009-09-24 00:58:20

by Geunsik Lim

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Thu, Sep 24, 2009 at 6:39 AM, Steven Rostedt <[email protected]> wrote:
>> So, I'm sorry, but, for now, the only thing I can do is attaching the
>> patch, hoping it works. The next one will be stop using Evolution! :-/
>>
>
> Hmm, I use evolution with no issues with patches. I just set it to
> Preformatted (from Normal), and then hit "Insert->Text File ..." and it
> inlines quite nicely.
>
> /me says this while sending via pine ;-)
> -- Steve
Me too.
I also evolution(ver.2.8.0) with no issues like Steven.
For example,
1) select "Preformat" select form in Compse Message window.
2) and then, hit "Insert - Test File ... "
That's all.


> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>



--
Regards,
GeunSik Lim ( Samsung Electronics )
Blog : http://blog.naver.com/invain/
e-Mail: [email protected]
[email protected] , [email protected]

2009-09-24 06:08:29

by Dario Faggioli

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Thu, 2009-09-24 at 09:34 +0900, GeunSik Lim wrote:
> Hi Raistlin,
>
Hi,

> You have to modify git address of schedtool-edf at
> http://www.evidence.eu.com/sched_edf.html correctly.
>
> before) git clone git://feanor.sssup.it/sched-tool-edf.git
> after ) git clone git://feanor.sssup.it/schedtool-edf.git
>
Oh, I see... Many thanks, we'll fix this! :-)

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]


Attachments:
signature.asc (197.00 B)
This is a digitally signed message part

2009-09-24 09:11:12

by Claudio Scordino

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

GeunSik Lim ha scritto:
> On Tue, Sep 22, 2009 at 7:30 PM, Raistlin <[email protected]> wrote:
>
>> The project is hosted at the following URL:
>>
>> http://www.evidence.eu.com/sched_edf.html
>>
>>
> Hi Raistlin,
>
> Thank you for challenge to support EDF on Linux 2.6 in private.
>
> You have to modify git address of schedtool-edf at
> http://www.evidence.eu.com/sched_edf.html correctly.
>
> before) git clone git://feanor.sssup.it/sched-tool-edf.git
> after ) git clone git://feanor.sssup.it/schedtool-edf.git
>
> ----- http://www.evidence.eu.com/sched_edf.html
> ---------------------------------
> Usage with schedtool
>
> To easily test SCHED_EDF you can use our modified version of
> schedtool as explained below.
> For instance, to run a simple yes program with period 100000us and
> budget 25000us, just type the following commands:
>
> git clone git://feanor.sssup.it/sched-tool-edf.git * <-- Here, change
>
Fixed.

Many thanks,

Claudio

2009-09-24 14:58:31

by Daniel Walker

[permalink] [raw]
Subject: Re: checkpatch as a tool (was Re: [RFC][PATCH] SCHED_EDF scheduling class)

On Wed, 2009-09-23 at 12:24 -0700, Andy Isaacson wrote:
> On Tue, Sep 22, 2009 at 06:11:39PM -0700, Daniel Walker wrote:
> > On Tue, 2009-09-22 at 18:01 -0700, Joe Perches wrote:
> > > I think it'd be more helpful if instead of merely sending
> > > a "hah! checkpatch failure!" email you send the submitter
> > > a gentler nudge and a fixed patch.
> >
> > I don't say "hah!" like I caught you or something .. I could try to
>
> It does come across that way sometimes.
>
> > soften the emails further so that's not the feeling .. As far as sending
> > fixed patches to people it's not really feasible due to the number of
> > patches I would have to send out..
>
> I'd rather see 10% coverage with useful patches than 100% coverage of
> "there is some problem in this patch but I can't be bothered to say
> what".

Your kind of putting words in my mouth here.. I can be bothered to say
what the errors are, but most people have the tool already. The errors
are also fairly trivial. I'd rather people ran the tool themselves.

> > I could send out delta patches, however, I don't want people to rely
> > on me to fix these issues.. Ideally people would correct them prior to
> > sending the patches out..
>
> Sure, and checkpatch is useful for that -- and more helper scripts to
> automate the process would be helpful. For example, extending "git
> send-email" to have an optional "warn about checkpatch failures" would
> be a useful addition, especially if it's not too annoying. (I find the
> current checkpatch output somewhat annoying, so just including that
> wouldn't be great IMO.)

Yeah, git integration would be nice if it doesn't already exist. I've
never seen that feature tho. git does have a warn on trailing whitespace
tho ..

> I would strongly recommend that you redistribute the effort you're
> putting in, and only send messages where you have non-trivial input to
> the development process. For example, saying "This patch has checkpatch
> errors. Please fix the indentation and the EXPORT_SYMBOL errors, and
> move function declarations to foo.h" would be much much more useful than
> "it looks like you have a few style issues".

I could do that.. However, my main annoyance is the style issues. Many
of the patches that I have commented on I've reviewed to find other
issues that could be solved, but often times other than the style issues
the patches don't look all that bad..

> Let's go through a few of your recent checkpatch messages:
>
> Message-Id: <1252936556.28368.215.camel@desktop>
> > Could you run this through checkpatch also, it looks like you have a few
> > style issues..
>
> Patch <[email protected]> is a
> strict improvement on the existing code, and changing the comma spacing
> on just these two lines to silence checkpatch would make the file
> inconsistent. Changing all 12 functionlike #defines in this file would
> be outside the purview of this patch. You could have done the robotlike
> patch and fixed the issue in this file in the time it would have taken
> to do this analysis, and then you would have seen that the file has far
> worse issues than the trivial ones checkpatch flagged in Felipe's patch.
>
> Verdict: useless.

At the time, I didn't think people should use the style consistent with
the file. For instance, if you touch a line you should be responsible
for the clean up on that line.. However, I've since stopped asking
people to do that since it could be overwhelming, and there's many times
when people do automatic regex changes which have nothing to do with any
sort of clean up on the lines they have changed ..

> Message-Id: <1252465099.14793.49.camel@desktop>
> > All the lines like the above are all producing checkpatch errors.. It
> > looks like the open brace needs to be up with the equals ..
>
> Joe is correct in his response, in this case the "error" from checkpatch
> is ignorable. Your responses in <1252467842.14793.66.camel@desktop> and
> <1252466482.14793.60.camel@desktop> are dismissive of exactly the
> concerns people are raising here, and you badly misstate your case when
> you say
> > I would if this was code in the kernel already, but it's not.
> without apologizing when your mistake was pointed out.
>
> Verdict: useless, and abrasive in your response to correction.

I was dismissive here, however I still do think those lines deserved the
clean up.. The lines where being altered enough to expose a checkpatch
bug due to the attribute (I later submitted a fix for checkpatch) ..
Also realize that the patch touched several architectures, and it was
attached to a larger patch set which I think would have been the perfect
deliver for that clean up.

> Message-Id: <1253553472.9654.236.camel@desktop>
> One right (and reasonable to say in text although a patch would have
> been equally reasonable, and hardly any more work to generate), one
> wrong. I strongly wouldn't have bothered to send this message.
>
> Verdict: mediocre.

I hesitated to send it .. It's often difficult to justify sometimes, but
it's also annoying that people don't catch these trivial things.

> Message-Id: <1253504435.9654.221.camel@desktop>
> AFAICS you didn't even read the checkpatch output, just saw a lot of
> output and reflexively sent email. Obviously the 0xA0 is due to some
> sort of editor/mailer issue not a style issue.
>
> Verdict: useless.

I did read the errors, not all of them however.. Indentation issues can
cause some different types of checkpatch output, which to me didn't
initially point to a mailer problem .. This patch was totally worthless
how it was, so it need attention one way or another.

> Message-Id: <1253326790.6699.38.camel@desktop>
> Reasonable, although you'd do better to lead with the reasonable comment
> and then follow with a mention of chckpatch rather than the other way
> around.
>
> Verdict: useful.
>
> Message-Id: <1253293100.7060.46.camel@desktop>
> Whitespace errors in a delta (as opposed to a new submission, especially
> from a new contributor) are rarely worth whingeing about, and in this
> case it's merely duplicating the error on the line above the diff.
>
> Verdict: mediocre.

Are we talking about the same email from Paul? I had already sent him
some other checkpatch output in another thread, and he thanked for
catching it. So in this case I assumed he wouldn't mine. So I don't feel
too bad about having sent it.

> Message-Id: <1253291944.7060.42.camel@desktop>
> Saying "we use checkpatch.pl" to someone who's been contributing since
> before Git existed is condescending in the extreme -- it'd be bad enough
> to say it to a first-time contributor, but this is just beyond the pale.
> The checkpatch output is fairly useful and clearly Lennart did find the
> reminder useful, but you could have been less abrasive by putting in a
> little more work.
>
> Verdict: useful.

I usually go by the number of errors I find when I start saying "We" ..
I don't know Lennart at all, and there was no disrespect intended .. I
still think going by the number of errors I find is appropriate, however
I could stop using "We" in terms of putting people on the outside.

> Message-Id: <1253092251.11643.569.camel@desktop>
> davem already gave useful commentary 2 hours before your mail, so I
> wouldn't have bothered to send your mail. If you do send this, it would
> be more useful to say "You seem to use spaces for indentation
> everywhere, could you fix that and run checkpatch before re-submitting".
>
> Verdict: technically useful but socially abrasive.

I don't see anything abrasive in that email .. Infact I think your
re-write above is more abrasive than what I wrote .. I did however,
overlook Dave's earlier comments which isn't exactly unlikely to happen
given daily LKML traffic .. I'll try to be mindful of that in the
future.

>
> In summary, of 8 messages reviewed that's 3 completely useless messages,
> 2 of middling use, and 3 useful ones. Of that, two or three times you
> were strongly abrasive IMO.

I think maybe your overstating things a bit .. For instance, "strongly
abrasive" for someone that was not intending to be abrasive at all seems
like a stretch ..

The one repeating Dave's comments could have been avoided, and the first
one your brought up to Felipe maybe shouldn't have been sent at all, but
all did do some good since Felipe was happy to send additional cleanups.

> That's about the proportion I've anecdotally observed in your checkpatch
> mails (a bit more positive than I would have guessed actually). If
> you'd sent only three of those eight messages, I think you'd be getting
> much less negative feedback, and you'd have had 166% more time to spend
> on each message.

Having sent fewer emails certainly would draw less feedback positive or
negative.. I don't think the negative percentage would have changed tho,
since I feel that's the nature of the list.

> FWIW, I think you have a valuable contribution but it's being drowned
> out by your errors and non-useful messages. Please be more careful to
> not send erroneous messages, contribute patches to address (even just
> some of!) the errors you flag, graciously acknowledge your errors when
> you're called on them (and learn from the corrections!), and perhaps
> people will lay off on you.

I appreciate the comments, and it was helpful .. BTW, I've contributed
tons of pure checkpatch clean ups in the past .. So it's not like I'm
just asking for people to clean up stuff without having done these types
of clean ups myself.

Daniel

2009-09-24 16:08:10

by Claudio Scordino

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

Peter Zijlstra ha scritto:
> On Tue, 2009-09-22 at 13:58 +0200, Claudio Scordino wrote:
>>> This implementation is part of a FP7 European project called ACTORS,
>>> and financially supported by the European commission
>>> (see http://www.actors-project.eu).
>>>
>>> The involved partners (which include Ericsson Research, Evidence Srl,
>>> AKAtech) strongly believe that a general-purpose operating system like
>>> Linux should provide a standard real-time scheduling policy (like the
>>> one implemented in this patch) still allowing to schedule non-real-time
>>> tasks in the usual way.
>> Hi Peter, hi all,
>>
>> note that the request of having an EDF scheduling class integrated
>> in the mainline kernel does not come from academy, but comes from industry.
>
> Yeah, I know, there's lots of interest in having a deadline scheduler.
>
> One of the things we'll have to look at is removing all the EDF
> assumptions from it.
>
> I think we'll be wanting to do something like s/EDF/DEADLINE/
> s/edf/deadline/ etc. on the code base to begin with.

This is easy: we can give it whatever name you want :)

Are you suggesting to use "SCHED_DEADLINE" ?

Cheers,

Claudio

2009-09-27 07:14:44

by Henrik Austad

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Tuesday 22. September 2009 20.36.43 Peter Zijlstra wrote:
> On Tue, 2009-09-22 at 14:51 +0200, Raistlin wrote:

>[...]
> > > > * In case a SCHED_EDF tasks forks, parent's budget is split among
> > > > parent and child.
> > >
> > > Right, I guess there's really nothing else you can do here...
> >
> > Well, me too... But during tests I run into a poor EDF shell that, after
> > each `ls' or `cat', lost half of its bandwidth up to be no longer
> > capable of running at all! :(
> >
> > We may avoid this having the son giving back its bandwidth to the father
> > when dieing (what a sad story! :( ) but this would need distinguishing
> > between fork-ed and setschedul-ed EDF tasks. Moreover, e.g., what if the
> > son changed its EDF bandwidth in the meanwhile? Or worse if it changed
> > its scheduling policy?
> >
> > At the current time, I'm just splitting the bandwidth, and nothing more.
> > Actually, I also think the solution is the right one, but I would really
> > like to discuss the issues it raises.
>
> Ooh, good point,.. yes we can put some exit hooks in there folding the
> runtime back.
>
> An alternative is starting the child out with 0 runtime, and have the
> parent run sched_setscheduler() on it giving us a clear point to run
> admission on.

Why not start it as sched_fair/sched_rt and let the child apply for
resources the same way the parent did? That would be fairly
straightforward and lead to predictable behaviour, and also make a nice,
simple hook into the acceptance-tests.

--
henrik


Attachments:
(No filename) (1.50 kB)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments

2009-09-29 16:10:35

by Dario Faggioli

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Sun, 2009-09-27 at 08:55 +0200, Henrik Austad wrote:
> > An alternative is starting the child out with 0 runtime, and have the
> > parent run sched_setscheduler() on it giving us a clear point to run
> > admission on.
>
> Why not start it as sched_fair/sched_rt and let the child apply for
> resources the same way the parent did? That would be fairly
> straightforward and lead to predictable behaviour, and also make a nice,
> simple hook into the acceptance-tests.
>
Yeah, that's an option as well... It maybe overlap a little bit with
reset_on_fork, but I like tha fact that it allows the task itself to ask
for EDF bandwidth without having to rely on its parent... Thoughts about
that?

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]


Attachments:
signature.asc (197.00 B)
This is a digitally signed message part

2009-09-29 17:36:27

by Chris Friesen

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On 09/29/2009 10:10 AM, Raistlin wrote:
> On Sun, 2009-09-27 at 08:55 +0200, Henrik Austad wrote:
>>> An alternative is starting the child out with 0 runtime, and have the
>>> parent run sched_setscheduler() on it giving us a clear point to run
>>> admission on.
>>
>> Why not start it as sched_fair/sched_rt and let the child apply for
>> resources the same way the parent did? That would be fairly
>> straightforward and lead to predictable behaviour, and also make a nice,
>> simple hook into the acceptance-tests.
>>
> Yeah, that's an option as well... It maybe overlap a little bit with
> reset_on_fork, but I like tha fact that it allows the task itself to ask
> for EDF bandwidth without having to rely on its parent... Thoughts about
> that?

Basically it boils down to a policy decision...if a task with guaranteed
cpu allocation fork()s, should the child automatically get a guaranteed
allocation or not? If so, should that allocation cause the parent's
allocation to be reduced or not?

Personally I don't like the idea of fork() resulting in permanently
reduced allocation for the parent. I think the logical choices are either

a) The child should get an identical bandwidth guarantee as the parent
and if that can't be guaranteed then the fork() should fail, maybe with
an errno of EBUSY.

b) The child should start out with no guarantees (SCHED_OTHER nice 0
maybe?) and should have to request a bandwidth guarantee. This could
complicate things in some circumstances because if it can't get the
guarantee then it needs to inform the parent somehow.

Given that any serious users of EDF are likely pretty specialized,
either one would probably work. Which is the best policy is a different
question, and one that I don't have enough experience with to offer an
opinion.

Chris

2009-09-29 18:15:36

by Roel Kluin

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

> ?static int __sched_setscheduler(struct task_struct *p, int policy,
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? struct sched_param *param, bool user)
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? struct sched_param *param,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? struct sched_param_ex *param_ex,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? bool user)
...

> @@ -6200,6 +6364,13 @@ recheck:
> ? ? ? ? ? ?(p->mm && param->sched_priority > MAX_USER_RT_PRIO-1) ||
> ? ? ? ? ? ?(!p->mm && param->sched_priority > MAX_RT_PRIO-1))
> ? ? ? ? ? ? ? ?return -EINVAL;
> + ? ? ? if (edf_policy(policy) && param_ex->sched_priority != 0)
> + ? ? ? ? ? ? ? return -EINVAL;
> + ? ? ? if (edf_policy(policy) && (param_ex == NULL ||
> + ? ? ? ? ? timespec_to_ns(&param_ex->sched_period) == 0 ||
> + ? ? ? ? ? timespec_to_ns(&param_ex->sched_period) <
> + ? ? ? ? ? timespec_to_ns(&param_ex->sched_runtime)))
> + ? ? ? ? ? ? ? return -EINVAL;

shouldn't the NULL test be moved upwards, to prevent a dereference of
a NULL pointer? Also I notice that `timespec_to_ns(&param_ex->sched_period)'
is called twice, maybe gcc does this but can't we do something like

if (edf_policy(policy)) {
if (param_ex == NULL || param_ex->sched_priority != 0)
return -EINVAL;
s64 psp = timespec_to_ns(&param_ex->sched_period);
if (psp == 0 || psp < timespec_to_ns(&param_ex->sched_runtime))
return -EINVAL;
}

Roel

2009-09-30 12:04:44

by Pavel Machek

[permalink] [raw]
Subject: Re: checkpatch as a tool (was Re: [RFC][PATCH] SCHED_EDF scheduling class)

On Tue 2009-09-22 17:51:43, Daniel Walker wrote:
> On Tue, 2009-09-22 at 21:11 +0200, Ingo Molnar wrote:
>
> > Now at patch patches/sched_edf-scheduling-class.patch
> >
> > Daniel is a kind of self-elected joker-bot on lkml, running checkpatch
> > out of .procmailrc and annoying people with trivial comments,
> > distracting from the real discussion. Raistlin, feel free to ignore him
> > in the future.

Good, I was afraid Ingo would actually like that behaviour.

> You want to talk about it constructively or not? I'll stop sending those
> emails if it's actually negative , but I don't think it is..

Getting you 'have misplaced parenthesis' feedback on a bugfix, on a
file thatt has 1001 checkpatch problems, and when the patch does not
actually introduce the problem... certainly is unwelcome.

What about simply avoiding the mails when the file being changed
already has checkpatch problems?
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-09-30 12:06:11

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Wed 2009-09-23 07:50:59, Daniel Walker wrote:
> On Wed, 2009-09-23 at 14:25 +0200, Ingo Molnar wrote:
> > * Avi Kivity <[email protected]> wrote:
> >
> > >> discouraging contributions is more something that happens when you
> > >> get the responses I got earlier in this thread..
> > >
> > > That's probably intentional. Whitespace fixes have their place but
> > > not at this stage in a patch's lifecycle.
> >
> > Exactly. What might make sense is to scan linux-next for new commits
> > that show serious cleanliness trouble - and send fix patches to the
> > parties involved. That's a real effort and brings the code forward.
>
> Often times when a patch is at youngest that when you want to catch
> these issues .. This EDF patch will likely get submitted more than
> twice. If you catch all the minor problems first you will not be dealing
> with them later when it comes time to include the code.

You want to deal with them later, because many patches end up in
trashcan...

Now, Ingo's idea of scanning -next (and fixing it) sounds sane...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-09-30 12:06:42

by Pavel Machek

[permalink] [raw]
Subject: Re: checkpatch as a tool (was Re: [RFC][PATCH] SCHED_EDF scheduling class)

On Tue 2009-09-22 18:11:39, Daniel Walker wrote:
> On Tue, 2009-09-22 at 18:01 -0700, Joe Perches wrote:
> > On Tue, 2009-09-22 at 17:51 -0700, Daniel Walker wrote:
> > > I'll stop sending those
> > > emails if it's actually negative , but I don't think it is..
> >
> > Hi Daniel.
> >
> > I think it'd be more helpful if instead of merely sending
> > a "hah! checkpatch failure!" email you send the submitter
> > a gentler nudge and a fixed patch.
>
> I don't say "hah!" like I caught you or something .. I could try to
> soften the emails further so that's not the feeling .. As far as sending
> fixed patches to people it's not really feasible due to the number of
> patches I would have to send out.. The other issues is that if I send a

Well, so pick the worst offenders, only. That will also nicely reduce
ammount of distraction on the list.
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-09-30 15:58:25

by Dario Faggioli

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Tue, 2009-09-29 at 11:34 -0600, Chris Friesen wrote:
> Basically it boils down to a policy decision...
Yep. I know that, and I know that policies have to be avoided as much as
possible in this lands... :-(

> Personally I don't like the idea of fork() resulting in permanently
> reduced allocation for the parent.
Yeah, me neither.

> a) The child should get an identical bandwidth guarantee as the parent
> and if that can't be guaranteed then the fork() should fail, maybe with
> an errno of EBUSY.
>
Again, this could be done, pretty easily actually. :-)

> b) The child should start out with no guarantees (SCHED_OTHER nice 0
> maybe?) and should have to request a bandwidth guarantee. This could
> complicate things in some circumstances because if it can't get the
> guarantee then it needs to inform the parent somehow.
>
Ok, I see and agree, again, to many extents.

> Given that any serious users of EDF are likely pretty specialized,
> either one would probably work. Which is the best policy is a different
> question, and one that I don't have enough experience with to offer an
> opinion.
>
Yeah... Maybe, since I'm adding (in the next patch I'm going to send
soon) a flag field in the sched_param_ex structure, we can also use some
of the bits for deciding how the fork will behave... The main problem
would be the code will get more complicated, and we thus would have to
decide if it is worth...

Again, thanks for finding some time to comment.

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]


Attachments:
signature.asc (197.00 B)
This is a digitally signed message part

2009-09-30 15:58:57

by Dario Faggioli

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On Tue, 2009-09-29 at 20:15 +0200, roel kluin wrote:
> shouldn't the NULL test be moved upwards, to prevent a dereference of
> a NULL pointer?
I definitely think you _do_ are right, many thanks! :-P

> Also I notice that `timespec_to_ns(&param_ex->sched_period)'
> is called twice, maybe gcc does this but can't we do something like
>
> if (edf_policy(policy)) {
> if (param_ex == NULL || param_ex->sched_priority != 0)
> return -EINVAL;
> s64 psp = timespec_to_ns(&param_ex->sched_period);
> if (psp == 0 || psp < timespec_to_ns(&param_ex->sched_runtime))
> return -EINVAL;
> }
>
Well, don't know... I guess the compiler do _something_ (also since
timespec_to_ns is 'static inline') but, to be sincere, I've not looked
at the assembly yet... But I may check it out, at least for x86, and
then consider this. :)

Thanks very much for your comments and suggestions,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / [email protected] /
[email protected]


Attachments:
signature.asc (197.00 B)
This is a digitally signed message part

2009-09-30 17:37:12

by Chris Friesen

[permalink] [raw]
Subject: Re: [RFC][PATCH] SCHED_EDF scheduling class

On 09/30/2009 09:58 AM, Raistlin wrote:
> On Tue, 2009-09-29 at 11:34 -0600, Chris Friesen wrote:

>> a) The child should get an identical bandwidth guarantee as the parent
>> and if that can't be guaranteed then the fork() should fail, maybe with
>> an errno of EBUSY.
>>
> Again, this could be done, pretty easily actually. :-)
>
>> b) The child should start out with no guarantees (SCHED_OTHER nice 0
>> maybe?) and should have to request a bandwidth guarantee. This could
>> complicate things in some circumstances because if it can't get the
>> guarantee then it needs to inform the parent somehow.
>>
> Ok, I see and agree, again, to many extents.

> Maybe, since I'm adding (in the next patch I'm going to send
> soon) a flag field in the sched_param_ex structure, we can also use some
> of the bits for deciding how the fork will behave... The main problem
> would be the code will get more complicated, and we thus would have to
> decide if it is worth...

For now it might be best to keep it simple...it can always be extended
later on. Personally I prefer option "a" above as it makes applications
easier to code.

The only problem that I see is that it will refuse to fork() a task that
has a bandwidth of more than 50% of the system. I wouldn't expect this
to be a common occurrence, but I could be wrong.

Chris