2020-08-07 09:54:15

by Juri Lelli

[permalink] [raw]
Subject: [RFC PATCH v2 0/6] SCHED_DEADLINE server infrastructure

Hi,

This is RFC v2 of Peter's SCHED_DEADLINE server infrastructure
implementation [1].

SCHED_DEADLINE servers can help fixing starvation issues of low priority
tasks (e.g., SCHED_OTHER) when higher priority tasks monopolize CPU
cycles. Today we have RT Throttling; DEADLINE servers should be able to
replace and improve that.

I rebased Peter's patches (adding changelogs where needed) on
tip/sched/core as of today and incorporated fixes to issues discussed
during RFC v1. Current set seems to even boot on real HW! :-)

While playing with RFC v1 set (and discussing it further offline with
Daniel) it has emerged the need to slightly change the behavior. Patch
6/6 is a (cumbersome?) attempt to show what's probably needed.
The problem with "original" implementation is that FIFO tasks might
suffer preemption from NORMAL even when spare CPU cycles are available.
In fact, fair deadline server is enqueued right away when NORMAL tasks
wake up and they are first scheduled by the server, thus potentially
preempting a well behaving FIFO task. This is of course not ideal.
So, in patch 6/6 I propose to use some kind of starvation monitor/
watchdog that delays enqueuing of deadline servers to the point when
fair tasks might start to actually suffer from starvation (just randomly
picked HZ/2 for now). One problem I already see with the current
implementation is that it adds overhead to fair paths, so I'm pretty
sure there are better ways to implement the idea (e.g., Daniel already
suggested using a starvation monitor kthread sort of thing).

Receiving comments and suggestions is the sole purpose of this posting
at this stage. Hopefully we can further discuss the idea at Plumbers in
a few weeks. So, please don't focus too much into actual implementation
(which I plan to revise anyway after I'm back from pto :), but try to
see if this might actually fly. The feature seems to be very much needed.

Thanks!

Juri

1 - https://lore.kernel.org/lkml/[email protected]/

Juri Lelli (1):
sched/fair: Implement starvation monitor

Peter Zijlstra (5):
sched: Unify runtime accounting across classes
sched/deadline: Collect sched_dl_entity initialization
sched/deadline: Move bandwidth accounting into {en,de}queue_dl_entity
sched/deadline: Introduce deadline servers
sched/fair: Add trivial fair server

include/linux/sched.h | 28 ++-
kernel/sched/core.c | 23 +-
kernel/sched/deadline.c | 483 ++++++++++++++++++++++++---------------
kernel/sched/fair.c | 136 ++++++++++-
kernel/sched/rt.c | 17 +-
kernel/sched/sched.h | 50 +++-
kernel/sched/stop_task.c | 16 +-
7 files changed, 522 insertions(+), 231 deletions(-)

--
2.26.2


2020-08-07 09:54:50

by Juri Lelli

[permalink] [raw]
Subject: [RFC PATCH v2 5/6] sched/fair: Add trivial fair server

From: Peter Zijlstra <[email protected]>

Use deadline servers to service fair tasks.

This patch adds a fair_server deadline entity which acts as a container
for fair entities and can be used to fix starvation when higher priority
(wrt fair) tasks are monopolizing CPU(s).

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
kernel/sched/core.c | 1 +
kernel/sched/fair.c | 29 +++++++++++++++++++++++++++++
kernel/sched/sched.h | 4 ++++
3 files changed, 34 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7c471961fd0b8..6537637139c63 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7170,6 +7170,7 @@ void __init sched_init(void)
#endif /* CONFIG_SMP */
hrtick_rq_init(rq);
atomic_set(&rq->nr_iowait, 0);
+ fair_server_init(rq);
}

set_load_weight(&init_task, false);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5130239c0e1e5..6a97ee2a4e26d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5514,6 +5514,9 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
*/
util_est_enqueue(&rq->cfs, p);

+ if (!rq->cfs.h_nr_running)
+ dl_server_start(&rq->fair_server);
+
/*
* If in_iowait is set, the code below may not trigger any cpufreq
* utilization updates, so do it here explicitly with the IOWAIT flag
@@ -5666,6 +5669,9 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
rq->next_balance = jiffies;

dequeue_throttle:
+ if (!rq->cfs.h_nr_running)
+ dl_server_stop(&rq->fair_server);
+
util_est_dequeue(&rq->cfs, p, task_sleep);
hrtick_update(rq);
}
@@ -7151,6 +7157,29 @@ static struct task_struct *__pick_next_task_fair(struct rq *rq)
return pick_next_task_fair(rq, NULL, NULL);
}

+static bool fair_server_has_tasks(struct sched_dl_entity *dl_se)
+{
+ return !!dl_se->rq->cfs.nr_running;
+}
+
+static struct task_struct *fair_server_pick(struct sched_dl_entity *dl_se)
+{
+ return pick_next_task_fair(dl_se->rq, NULL, NULL);
+}
+
+void fair_server_init(struct rq *rq)
+{
+ struct sched_dl_entity *dl_se = &rq->fair_server;
+
+ init_dl_entity(dl_se);
+
+ dl_se->dl_runtime = TICK_NSEC;
+ dl_se->dl_deadline = 20 * TICK_NSEC;
+ dl_se->dl_period = 20 * TICK_NSEC;
+
+ dl_server_init(dl_se, rq, fair_server_has_tasks, fair_server_pick);
+}
+
/*
* Account for a descheduled task:
*/
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f035cd8ccd224..bf8c9c07705c9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -375,6 +375,8 @@ extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
dl_server_has_tasks_f has_tasks,
dl_server_pick_f pick);

+extern void fair_server_init(struct rq *);
+
#ifdef CONFIG_CGROUP_SCHED

#include <linux/cgroup.h>
@@ -959,6 +961,8 @@ struct rq {
struct rt_rq rt;
struct dl_rq dl;

+ struct sched_dl_entity fair_server;
+
#ifdef CONFIG_FAIR_GROUP_SCHED
/* list of leaf cfs_rq on this CPU: */
struct list_head leaf_cfs_rq_list;
--
2.26.2

2020-08-07 09:55:25

by Juri Lelli

[permalink] [raw]
Subject: [RFC PATCH v2 3/6] sched/deadline: Move bandwidth accounting into {en,de}queue_dl_entity

From: Peter Zijlstra <[email protected]>

In preparation of introducing !task sched_dl_entity; move the
bandwidth accounting into {en.de}queue_dl_entity().

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
kernel/sched/deadline.c | 128 ++++++++++++++++++++++------------------
kernel/sched/sched.h | 6 ++
2 files changed, 77 insertions(+), 57 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 8d909bdb9a119..d4007d1461522 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -275,12 +275,12 @@ static void __dl_clear_params(struct sched_dl_entity *dl_se);
* up, and checks if the task is still in the "ACTIVE non contending"
* state or not (in the second case, it updates running_bw).
*/
-static void task_non_contending(struct task_struct *p)
+static void task_non_contending(struct sched_dl_entity *dl_se)
{
- struct sched_dl_entity *dl_se = &p->dl;
struct hrtimer *timer = &dl_se->inactive_timer;
struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
struct rq *rq = rq_of_dl_rq(dl_rq);
+ struct task_struct *p = dl_task_of(dl_se);
s64 zerolag_time;

/*
@@ -312,13 +312,14 @@ static void task_non_contending(struct task_struct *p)
if ((zerolag_time < 0) || hrtimer_active(&dl_se->inactive_timer)) {
if (dl_task(p))
sub_running_bw(dl_se, dl_rq);
+
if (!dl_task(p) || p->state == TASK_DEAD) {
struct dl_bw *dl_b = dl_bw_of(task_cpu(p));

if (p->state == TASK_DEAD)
- sub_rq_bw(&p->dl, &rq->dl);
+ sub_rq_bw(dl_se, &rq->dl);
raw_spin_lock(&dl_b->lock);
- __dl_sub(dl_b, p->dl.dl_bw, dl_bw_cpus(task_cpu(p)));
+ __dl_sub(dl_b, dl_se->dl_bw, dl_bw_cpus(task_cpu(p)));
__dl_clear_params(dl_se);
raw_spin_unlock(&dl_b->lock);
}
@@ -1477,6 +1478,41 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se,
{
BUG_ON(on_dl_rq(dl_se));

+ /*
+ * Check if a constrained deadline task was activated
+ * after the deadline but before the next period.
+ * If that is the case, the task will be throttled and
+ * the replenishment timer will be set to the next period.
+ */
+ if (!dl_se->dl_throttled && !dl_is_implicit(dl_se))
+ dl_check_constrained_dl(dl_se);
+
+ if (flags & (ENQUEUE_RESTORE|ENQUEUE_MIGRATING)) {
+ struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
+
+ add_rq_bw(dl_se, dl_rq);
+ add_running_bw(dl_se, dl_rq);
+ }
+
+ /*
+ * If p is throttled, we do not enqueue it. In fact, if it exhausted
+ * its budget it needs a replenishment and, since it now is on
+ * its rq, the bandwidth timer callback (which clearly has not
+ * run yet) will take care of this.
+ * However, the active utilization does not depend on the fact
+ * that the task is on the runqueue or not (but depends on the
+ * task's state - in GRUB parlance, "inactive" vs "active contending").
+ * In other words, even if a task is throttled its utilization must
+ * be counted in the active utilization; hence, we need to call
+ * add_running_bw().
+ */
+ if (dl_se->dl_throttled && !(flags & ENQUEUE_REPLENISH)) {
+ if (flags & ENQUEUE_WAKEUP)
+ task_contending(dl_se, flags);
+
+ return;
+ }
+
/*
* If this is a wakeup or a new instance, the scheduling
* parameters of the task might need updating. Otherwise,
@@ -1496,9 +1532,28 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se,
__enqueue_dl_entity(dl_se);
}

-static void dequeue_dl_entity(struct sched_dl_entity *dl_se)
+static void dequeue_dl_entity(struct sched_dl_entity *dl_se, int flags)
{
__dequeue_dl_entity(dl_se);
+
+ if (flags & (DEQUEUE_SAVE|DEQUEUE_MIGRATING)) {
+ struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
+
+ sub_running_bw(dl_se, dl_rq);
+ sub_rq_bw(dl_se, dl_rq);
+ }
+
+ /*
+ * This check allows to start the inactive timer (or to immediately
+ * decrease the active utilization, if needed) in two cases:
+ * when the task blocks and when it is terminating
+ * (p->state == TASK_DEAD). We can handle the two cases in the same
+ * way, because from GRUB's point of view the same thing is happening
+ * (the task moves from "active contending" to "active non contending"
+ * or "inactive")
+ */
+ if (flags & DEQUEUE_SLEEP)
+ task_non_contending(dl_se);
}

static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
@@ -1528,72 +1583,31 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
return;
}

- /*
- * Check if a constrained deadline task was activated
- * after the deadline but before the next period.
- * If that is the case, the task will be throttled and
- * the replenishment timer will be set to the next period.
- */
- if (!p->dl.dl_throttled && !dl_is_implicit(&p->dl))
- dl_check_constrained_dl(&p->dl);
-
- if (p->on_rq == TASK_ON_RQ_MIGRATING || flags & ENQUEUE_RESTORE) {
- add_rq_bw(&p->dl, &rq->dl);
- add_running_bw(&p->dl, &rq->dl);
- }
-
- /*
- * If p is throttled, we do not enqueue it. In fact, if it exhausted
- * its budget it needs a replenishment and, since it now is on
- * its rq, the bandwidth timer callback (which clearly has not
- * run yet) will take care of this.
- * However, the active utilization does not depend on the fact
- * that the task is on the runqueue or not (but depends on the
- * task's state - in GRUB parlance, "inactive" vs "active contending").
- * In other words, even if a task is throttled its utilization must
- * be counted in the active utilization; hence, we need to call
- * add_running_bw().
- */
- if (p->dl.dl_throttled && !(flags & ENQUEUE_REPLENISH)) {
- if (flags & ENQUEUE_WAKEUP)
- task_contending(&p->dl, flags);
-
- return;
- }
+ if (p->on_rq == TASK_ON_RQ_MIGRATING)
+ flags |= ENQUEUE_MIGRATING;

enqueue_dl_entity(&p->dl, pi_se, flags);

- if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
+ if (!task_current(rq, p) && !p->dl.dl_throttled && p->nr_cpus_allowed > 1)
enqueue_pushable_dl_task(rq, p);
}

static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
{
- dequeue_dl_entity(&p->dl);
- dequeue_pushable_dl_task(rq, p);
+ dequeue_dl_entity(&p->dl, flags);
+
+ if (!p->dl.dl_throttled)
+ dequeue_pushable_dl_task(rq, p);
}

static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
{
update_curr_dl(rq);
- __dequeue_task_dl(rq, p, flags);

- if (p->on_rq == TASK_ON_RQ_MIGRATING || flags & DEQUEUE_SAVE) {
- sub_running_bw(&p->dl, &rq->dl);
- sub_rq_bw(&p->dl, &rq->dl);
- }
+ if (p->on_rq == TASK_ON_RQ_MIGRATING)
+ flags |= DEQUEUE_MIGRATING;

- /*
- * This check allows to start the inactive timer (or to immediately
- * decrease the active utilization, if needed) in two cases:
- * when the task blocks and when it is terminating
- * (p->state == TASK_DEAD). We can handle the two cases in the same
- * way, because from GRUB's point of view the same thing is happening
- * (the task moves from "active contending" to "active non contending"
- * or "inactive")
- */
- if (flags & DEQUEUE_SLEEP)
- task_non_contending(p);
+ __dequeue_task_dl(rq, p, flags);
}

/*
@@ -2373,7 +2387,7 @@ static void switched_from_dl(struct rq *rq, struct task_struct *p)
* will reset the task parameters.
*/
if (task_on_rq_queued(p) && p->dl.dl_runtime)
- task_non_contending(p);
+ task_non_contending(&p->dl);

if (!task_on_rq_queued(p)) {
/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 62304d4de99cc..d3db8c7d8b641 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1741,6 +1741,10 @@ extern const u32 sched_prio_to_wmult[40];
* MOVE - paired with SAVE/RESTORE, explicitly does not preserve the location
* in the runqueue.
*
+ * NOCLOCK - skip the update_rq_clock() (avoids double updates)
+ *
+ * MIGRATION - p->on_rq == TASK_ON_RQ_MIGRATING (used for DEADLINE)
+ *
* ENQUEUE_HEAD - place at front of runqueue (tail if not specified)
* ENQUEUE_REPLENISH - CBS (replenish runtime and postpone deadline)
* ENQUEUE_MIGRATED - the task was migrated during wakeup
@@ -1751,6 +1755,7 @@ extern const u32 sched_prio_to_wmult[40];
#define DEQUEUE_SAVE 0x02 /* Matches ENQUEUE_RESTORE */
#define DEQUEUE_MOVE 0x04 /* Matches ENQUEUE_MOVE */
#define DEQUEUE_NOCLOCK 0x08 /* Matches ENQUEUE_NOCLOCK */
+#define DEQUEUE_MIGRATING 0x80 /* Matches ENQUEUE_MIGRATING */

#define ENQUEUE_WAKEUP 0x01
#define ENQUEUE_RESTORE 0x02
@@ -1764,6 +1769,7 @@ extern const u32 sched_prio_to_wmult[40];
#else
#define ENQUEUE_MIGRATED 0x00
#endif
+#define ENQUEUE_MIGRATING 0x80

#define RETRY_TASK ((void *)-1UL)

--
2.26.2

2020-08-07 09:57:09

by Juri Lelli

[permalink] [raw]
Subject: [RFC PATCH v2 6/6] sched/fair: Implement starvation monitor

Starting deadline server for lower priority classes right away when
first task is enqueued might break guarantees, as tasks belonging to
intermediate priority classes could be uselessly preempted. E.g., a well
behaving (non hog) FIFO task can be preempted by NORMAL tasks even if
there are still CPU cycles available for NORMAL tasks to run, as they'll
be running inside the fair deadline server for some period of time.

To prevent this issue, implement a starvation monitor mechanism that
starts the deadline server only if a (fair in this case) task hasn't
been scheduled for some interval of time after it has been enqueued.
Use pick/put functions to manage starvation monitor status.

Signed-off-by: Juri Lelli <[email protected]>
---
kernel/sched/fair.c | 57 ++++++++++++++++++++++++++++++++++++++++++--
kernel/sched/sched.h | 4 ++++
2 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6a97ee2a4e26d..5cdf76e508074 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5494,6 +5494,53 @@ static int sched_idle_cpu(int cpu)
}
#endif

+
+static void fair_server_watchdog(struct timer_list *list)
+{
+ struct rq *rq = container_of(list, struct rq, fair_server_wd);
+ struct rq_flags rf;
+
+ rq_lock_irqsave(rq, &rf);
+ rq->fair_server_wd_running = 0;
+
+ if (!rq->cfs.h_nr_running)
+ goto out;
+
+ update_rq_clock(rq);
+ dl_server_start(&rq->fair_server);
+ rq->fair_server_active = 1;
+ resched_curr(rq);
+
+out:
+ rq_unlock_irqrestore(rq, &rf);
+}
+
+static inline void fair_server_watchdog_start(struct rq *rq)
+{
+ if (rq->fair_server_wd_running || rq->fair_server_active)
+ return;
+
+ timer_setup(&rq->fair_server_wd, fair_server_watchdog, 0);
+ rq->fair_server_wd.expires = jiffies + FAIR_SERVER_WATCHDOG_INTERVAL;
+ add_timer_on(&rq->fair_server_wd, cpu_of(rq));
+ rq->fair_server_active = 0;
+ rq->fair_server_wd_running = 1;
+}
+
+static inline void fair_server_watchdog_stop(struct rq *rq, bool stop_server)
+{
+ if (!rq->fair_server_wd_running && !stop_server)
+ return;
+
+ del_timer(&rq->fair_server_wd);
+ rq->fair_server_wd_running = 0;
+
+ if (stop_server && rq->fair_server_active) {
+ dl_server_stop(&rq->fair_server);
+ rq->fair_server_active = 0;
+ }
+}
+
/*
* The enqueue_task method is called before nr_running is
* increased. Here we update the fair scheduling stats and
@@ -5515,7 +5562,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
util_est_enqueue(&rq->cfs, p);

if (!rq->cfs.h_nr_running)
- dl_server_start(&rq->fair_server);
+ fair_server_watchdog_start(rq);

/*
* If in_iowait is set, the code below may not trigger any cpufreq
@@ -5670,7 +5717,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)

dequeue_throttle:
if (!rq->cfs.h_nr_running)
- dl_server_stop(&rq->fair_server);
+ fair_server_watchdog_stop(rq, true);

util_est_dequeue(&rq->cfs, p, task_sleep);
hrtick_update(rq);
@@ -7123,6 +7170,7 @@ done: __maybe_unused;
hrtick_start_fair(rq, p);

update_misfit_status(p, rq);
+ fair_server_watchdog_stop(rq, false);

return p;

@@ -7178,6 +7226,8 @@ void fair_server_init(struct rq *rq)
dl_se->dl_period = 20 * TICK_NSEC;

dl_server_init(dl_se, rq, fair_server_has_tasks, fair_server_pick);
+
+ rq->fair_server_wd_running = 0;
}

/*
@@ -7192,6 +7242,9 @@ static void put_prev_task_fair(struct rq *rq, struct task_struct *prev)
cfs_rq = cfs_rq_of(se);
put_prev_entity(cfs_rq, se);
}
+
+ if (rq->cfs.h_nr_running)
+ fair_server_watchdog_start(rq);
}

/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index bf8c9c07705c9..1e1a5436be725 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -375,6 +375,7 @@ extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
dl_server_has_tasks_f has_tasks,
dl_server_pick_f pick);

+#define FAIR_SERVER_WATCHDOG_INTERVAL (HZ >> 1)
extern void fair_server_init(struct rq *);

#ifdef CONFIG_CGROUP_SCHED
@@ -962,6 +963,9 @@ struct rq {
struct dl_rq dl;

struct sched_dl_entity fair_server;
+ int fair_server_active;
+ struct timer_list fair_server_wd;
+ int fair_server_wd_running;

#ifdef CONFIG_FAIR_GROUP_SCHED
/* list of leaf cfs_rq on this CPU: */
--
2.26.2

2020-08-07 10:49:06

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH v2 6/6] sched/fair: Implement starvation monitor

On Fri, Aug 07, 2020 at 11:56:04AM +0200, Juri Lelli wrote:
> Starting deadline server for lower priority classes right away when
> first task is enqueued might break guarantees, as tasks belonging to
> intermediate priority classes could be uselessly preempted. E.g., a well
> behaving (non hog) FIFO task can be preempted by NORMAL tasks even if
> there are still CPU cycles available for NORMAL tasks to run, as they'll
> be running inside the fair deadline server for some period of time.
>
> To prevent this issue, implement a starvation monitor mechanism that
> starts the deadline server only if a (fair in this case) task hasn't
> been scheduled for some interval of time after it has been enqueued.
> Use pick/put functions to manage starvation monitor status.

One thing I considerd was scheduling this as a least-laxity entity --
such that it runs late, not early -- and start the server when
rq->nr_running != rq->cfs.h_nr_running, IOW when there's !fair tasks
around.

Not saying we should do it like that, but that's perhaps more
deterministic than this.

Subject: Re: [RFC PATCH v2 6/6] sched/fair: Implement starvation monitor

On 8/7/20 12:46 PM, [email protected] wrote:
> On Fri, Aug 07, 2020 at 11:56:04AM +0200, Juri Lelli wrote:
>> Starting deadline server for lower priority classes right away when
>> first task is enqueued might break guarantees, as tasks belonging to
>> intermediate priority classes could be uselessly preempted. E.g., a well
>> behaving (non hog) FIFO task can be preempted by NORMAL tasks even if
>> there are still CPU cycles available for NORMAL tasks to run, as they'll
>> be running inside the fair deadline server for some period of time.
>>
>> To prevent this issue, implement a starvation monitor mechanism that
>> starts the deadline server only if a (fair in this case) task hasn't
>> been scheduled for some interval of time after it has been enqueued.
>> Use pick/put functions to manage starvation monitor status.
> One thing I considerd was scheduling this as a least-laxity entity --
> such that it runs late, not early -- and start the server when
> rq->nr_running != rq->cfs.h_nr_running, IOW when there's !fair tasks
> around.
>
> Not saying we should do it like that, but that's perhaps more
> deterministic than this.
>

I agree, what we want here is something that schedules the server if it still
retains some runtime when the laxity is 0. But this is easier said than done, as
this would require another scheduler (other pros and cons and analysis (and
hours of work)...).

But, for the starvation monitor purpose, the goal is not (necessarily) to
provide a deterministic guarantee for the starving task, but to avoid system
issues while minimizing the damage to the "real" real-time workload. With that
in mind, we could relax our ambitions...

Thoughts?

-- Daniel

2020-08-07 12:53:37

by Juri Lelli

[permalink] [raw]
Subject: Re: [RFC PATCH v2 6/6] sched/fair: Implement starvation monitor

On 07/08/20 13:30, Daniel Bristot de Oliveira wrote:
> On 8/7/20 12:46 PM, [email protected] wrote:
> > On Fri, Aug 07, 2020 at 11:56:04AM +0200, Juri Lelli wrote:
> >> Starting deadline server for lower priority classes right away when
> >> first task is enqueued might break guarantees, as tasks belonging to
> >> intermediate priority classes could be uselessly preempted. E.g., a well
> >> behaving (non hog) FIFO task can be preempted by NORMAL tasks even if
> >> there are still CPU cycles available for NORMAL tasks to run, as they'll
> >> be running inside the fair deadline server for some period of time.
> >>
> >> To prevent this issue, implement a starvation monitor mechanism that
> >> starts the deadline server only if a (fair in this case) task hasn't
> >> been scheduled for some interval of time after it has been enqueued.
> >> Use pick/put functions to manage starvation monitor status.
> > One thing I considerd was scheduling this as a least-laxity entity --
> > such that it runs late, not early -- and start the server when
> > rq->nr_running != rq->cfs.h_nr_running, IOW when there's !fair tasks
> > around.

IIUC, this would still require programming a timer to fire when laxity
is 0, but doing that only when there are !fair tasks around (so when
enqueuing the first !fair or if there are !fair already when first fair
is enqueued) would probably save us some overhead, I agree (as no timer
and no enqueue of deadline server would be needed in the common "only
fair" case).

> >
> > Not saying we should do it like that, but that's perhaps more
> > deterministic than this.
> >
>
> I agree, what we want here is something that schedules the server if it still
> retains some runtime when the laxity is 0. But this is easier said than done, as
> this would require another scheduler (other pros and cons and analysis (and
> hours of work)...).
>
> But, for the starvation monitor purpose, the goal is not (necessarily) to
> provide a deterministic guarantee for the starving task, but to avoid system
> issues while minimizing the damage to the "real" real-time workload. With that
> in mind, we could relax our ambitions...
>
> Thoughts?

I agree that we don't probably want to develop an additional scheduler/
policy for this, but I'll think a bit more about Peter's idea. Maybe
it's already a viable optimization w/o changing EDF/CBS.

2020-08-07 13:30:12

by luca abeni

[permalink] [raw]
Subject: Re: [RFC PATCH v2 6/6] sched/fair: Implement starvation monitor

Hi Juri,

On Fri, 7 Aug 2020 11:56:04 +0200
Juri Lelli <[email protected]> wrote:

> Starting deadline server for lower priority classes right away when
> first task is enqueued might break guarantees

Which guarantees are you thinking about, here? Response times of fixed
priority tasks?

If fixed priority tasks are also scheduled through deadline servers,
then you can provide response-time guarantees to them even when
lower-priority/non-real-time tasks are scheduled through deadline
servers.


Thanks,
Luca

> as tasks belonging to
> intermediate priority classes could be uselessly preempted. E.g., a
> well behaving (non hog) FIFO task can be preempted by NORMAL tasks
> even if there are still CPU cycles available for NORMAL tasks to run,
> as they'll be running inside the fair deadline server for some period
> of time.
>
> To prevent this issue, implement a starvation monitor mechanism that
> starts the deadline server only if a (fair in this case) task hasn't
> been scheduled for some interval of time after it has been enqueued.
> Use pick/put functions to manage starvation monitor status.
>
> Signed-off-by: Juri Lelli <[email protected]>
> ---
> kernel/sched/fair.c | 57
> ++++++++++++++++++++++++++++++++++++++++++-- kernel/sched/sched.h |
> 4 ++++ 2 files changed, 59 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6a97ee2a4e26d..5cdf76e508074 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5494,6 +5494,53 @@ static int sched_idle_cpu(int cpu)
> }
> #endif
>
> +
> +static void fair_server_watchdog(struct timer_list *list)
> +{
> + struct rq *rq = container_of(list, struct rq,
> fair_server_wd);
> + struct rq_flags rf;
> +
> + rq_lock_irqsave(rq, &rf);
> + rq->fair_server_wd_running = 0;
> +
> + if (!rq->cfs.h_nr_running)
> + goto out;
> +
> + update_rq_clock(rq);
> + dl_server_start(&rq->fair_server);
> + rq->fair_server_active = 1;
> + resched_curr(rq);
> +
> +out:
> + rq_unlock_irqrestore(rq, &rf);
> +}
> +
> +static inline void fair_server_watchdog_start(struct rq *rq)
> +{
> + if (rq->fair_server_wd_running || rq->fair_server_active)
> + return;
> +
> + timer_setup(&rq->fair_server_wd, fair_server_watchdog, 0);
> + rq->fair_server_wd.expires = jiffies +
> FAIR_SERVER_WATCHDOG_INTERVAL;
> + add_timer_on(&rq->fair_server_wd, cpu_of(rq));
> + rq->fair_server_active = 0;
> + rq->fair_server_wd_running = 1;
> +}
> +
> +static inline void fair_server_watchdog_stop(struct rq *rq, bool
> stop_server) +{
> + if (!rq->fair_server_wd_running && !stop_server)
> + return;
> +
> + del_timer(&rq->fair_server_wd);
> + rq->fair_server_wd_running = 0;
> +
> + if (stop_server && rq->fair_server_active) {
> + dl_server_stop(&rq->fair_server);
> + rq->fair_server_active = 0;
> + }
> +}
> +
> /*
> * The enqueue_task method is called before nr_running is
> * increased. Here we update the fair scheduling stats and
> @@ -5515,7 +5562,7 @@ enqueue_task_fair(struct rq *rq, struct
> task_struct *p, int flags) util_est_enqueue(&rq->cfs, p);
>
> if (!rq->cfs.h_nr_running)
> - dl_server_start(&rq->fair_server);
> + fair_server_watchdog_start(rq);
>
> /*
> * If in_iowait is set, the code below may not trigger any
> cpufreq @@ -5670,7 +5717,7 @@ static void dequeue_task_fair(struct rq
> *rq, struct task_struct *p, int flags)
> dequeue_throttle:
> if (!rq->cfs.h_nr_running)
> - dl_server_stop(&rq->fair_server);
> + fair_server_watchdog_stop(rq, true);
>
> util_est_dequeue(&rq->cfs, p, task_sleep);
> hrtick_update(rq);
> @@ -7123,6 +7170,7 @@ done: __maybe_unused;
> hrtick_start_fair(rq, p);
>
> update_misfit_status(p, rq);
> + fair_server_watchdog_stop(rq, false);
>
> return p;
>
> @@ -7178,6 +7226,8 @@ void fair_server_init(struct rq *rq)
> dl_se->dl_period = 20 * TICK_NSEC;
>
> dl_server_init(dl_se, rq, fair_server_has_tasks,
> fair_server_pick); +
> + rq->fair_server_wd_running = 0;
> }
>
> /*
> @@ -7192,6 +7242,9 @@ static void put_prev_task_fair(struct rq *rq,
> struct task_struct *prev) cfs_rq = cfs_rq_of(se);
> put_prev_entity(cfs_rq, se);
> }
> +
> + if (rq->cfs.h_nr_running)
> + fair_server_watchdog_start(rq);
> }
>
> /*
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index bf8c9c07705c9..1e1a5436be725 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -375,6 +375,7 @@ extern void dl_server_init(struct sched_dl_entity
> *dl_se, struct rq *rq, dl_server_has_tasks_f has_tasks,
> dl_server_pick_f pick);
>
> +#define FAIR_SERVER_WATCHDOG_INTERVAL (HZ >> 1)
> extern void fair_server_init(struct rq *);
>
> #ifdef CONFIG_CGROUP_SCHED
> @@ -962,6 +963,9 @@ struct rq {
> struct dl_rq dl;
>
> struct sched_dl_entity fair_server;
> + int fair_server_active;
> + struct timer_list fair_server_wd;
> + int fair_server_wd_running;
>
> #ifdef CONFIG_FAIR_GROUP_SCHED
> /* list of leaf cfs_rq on this CPU: */

2020-08-07 13:33:47

by Juri Lelli

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/6] SCHED_DEADLINE server infrastructure

Hi Luca,

On 07/08/20 15:16, luca abeni wrote:
> Hi Juri,
>
> thanks for sharing the v2 patchset!
>
> In the next days I'll have a look at it, and try some tests...

Thanks!

> In the meanwhile, I have some questions/comments after a first quick
> look.
>
> If I understand well, the patchset does not apply deadline servers to
> FIFO and RR tasks, right? How does this patchset interact with RT
> throttling?

Well, it's something like the dual of it, in that RT Throttling directly
throttles RT tasks to make spare CPU cycles available to fair tasks
while this patchset introduces deadline servers to schedule fair tasks,
thus still reserving CPU time for those (when needed).

> If I understand well, patch 6/6 does something like "use deadline
> servers for SCHED_OTHER only if FIFO/RR tasks risk to starve
> SCHED_OTHER tasks"... Right?

That's the basic idea, yes.

> I understand this is because you do not
> want to delay RT tasks if they are not starving other tasks. But then,
> maybe what you want is not deadline-based scheduling. Maybe a
> reservation-based scheduler based on fixed priorities is what you want?
> (with SCHED_DEADLINE, you could provide exact performance guarantees to
> SCHED_OTHER tasks, but I suspect patch 6/6 breaks these guarantees?)

I agree that we are not interested in exact guarantees in this case, but
why not using something that it's already there and would give us the
functionality we need (fix starvation for fair)? It would also work well
in presence of "real" deadline tasks I think, in that you could account
for these fair servers while performing admission control.

Best,

Juri

2020-08-07 13:45:45

by luca abeni

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/6] SCHED_DEADLINE server infrastructure

Hi Juri,

On Fri, 7 Aug 2020 15:30:41 +0200
Juri Lelli <[email protected]> wrote:
[...]
> > In the meanwhile, I have some questions/comments after a first quick
> > look.
> >
> > If I understand well, the patchset does not apply deadline servers
> > to FIFO and RR tasks, right? How does this patchset interact with RT
> > throttling?
>
> Well, it's something like the dual of it, in that RT Throttling
> directly throttles RT tasks to make spare CPU cycles available to
> fair tasks while this patchset introduces deadline servers to
> schedule fair tasks, thus still reserving CPU time for those (when
> needed).

Ah, OK... I was thinking about using deadline servers for both RT and
non-RT tasks. And to use them not only to throttle, but also to provide
some kind of performance guarantees (to all the scheduling classes).
Think about what can be done when combining this mechanism with
cgroups/containers :)

[...]
> > I understand this is because you do not
> > want to delay RT tasks if they are not starving other tasks. But
> > then, maybe what you want is not deadline-based scheduling. Maybe a
> > reservation-based scheduler based on fixed priorities is what you
> > want? (with SCHED_DEADLINE, you could provide exact performance
> > guarantees to SCHED_OTHER tasks, but I suspect patch 6/6 breaks
> > these guarantees?)
>
> I agree that we are not interested in exact guarantees in this case,
> but why not using something that it's already there and would give us
> the functionality we need (fix starvation for fair)?

Ok, if performance guarantees to non-RT tasks are not the goal, then I
agree. I was thinking that since the patchset provides a mechanism to
schedule various classes of tasks through deadline servers, then
using these servers to provide some kinds of guarantees could be
interesting ;-)



Thanks,
Luca

> It would also
> work well in presence of "real" deadline tasks I think, in that you
> could account for these fair servers while performing admission
> control.
>
> Best,
>
> Juri
>

2020-08-07 13:47:41

by Juri Lelli

[permalink] [raw]
Subject: Re: [RFC PATCH v2 6/6] sched/fair: Implement starvation monitor

On 07/08/20 15:28, luca abeni wrote:
> Hi Juri,
>
> On Fri, 7 Aug 2020 11:56:04 +0200
> Juri Lelli <[email protected]> wrote:
>
> > Starting deadline server for lower priority classes right away when
> > first task is enqueued might break guarantees
>
> Which guarantees are you thinking about, here? Response times of fixed
> priority tasks?

Response time, but also wakeup latency (which, for better or worse, is
another important metric).

> If fixed priority tasks are also scheduled through deadline servers,
> then you can provide response-time guarantees to them even when
> lower-priority/non-real-time tasks are scheduled through deadline
> servers.

Right, but I fear we won't be able to keep current behavior for wakeups:
RT with highest prio always gets scheduled right away?

2020-08-07 13:52:39

by luca abeni

[permalink] [raw]
Subject: Re: [RFC PATCH v2 6/6] sched/fair: Implement starvation monitor

Hi Peter,

On Fri, 7 Aug 2020 12:46:18 +0200
[email protected] wrote:

> On Fri, Aug 07, 2020 at 11:56:04AM +0200, Juri Lelli wrote:
> > Starting deadline server for lower priority classes right away when
> > first task is enqueued might break guarantees, as tasks belonging to
> > intermediate priority classes could be uselessly preempted. E.g., a
> > well behaving (non hog) FIFO task can be preempted by NORMAL tasks
> > even if there are still CPU cycles available for NORMAL tasks to
> > run, as they'll be running inside the fair deadline server for some
> > period of time.
> >
> > To prevent this issue, implement a starvation monitor mechanism that
> > starts the deadline server only if a (fair in this case) task hasn't
> > been scheduled for some interval of time after it has been enqueued.
> > Use pick/put functions to manage starvation monitor status.
>
> One thing I considerd was scheduling this as a least-laxity entity --
> such that it runs late, not early

Are you thinking about scheduling both RT and non-RT tasks through
deadline servers? If yes, then I think that using something like
laxity-based scheduling for the SCHED_OTHER server can be a good idea
(but then we need to understand how to combine deadline-based
scheduling with laxity-based scheduling, etc...)

Or are you thinking about keeping the SCHED_OTHER server throttled
until its laxity is 0 (or until its laxity is lower than some small
value)? In this second case, the approach would work even if RT tasks
are not scheduled through a server (but I do not know which kind of
performance guarantee we could provide).


> -- and start the server when
> rq->nr_running != rq->cfs.h_nr_running, IOW when there's !fair tasks
> around.

Yes, this could be a good optimization.



Luca
>
> Not saying we should do it like that, but that's perhaps more
> deterministic than this.

2020-08-07 13:56:29

by luca abeni

[permalink] [raw]
Subject: Re: [RFC PATCH v2 6/6] sched/fair: Implement starvation monitor

On Fri, 7 Aug 2020 15:43:53 +0200
Juri Lelli <[email protected]> wrote:

> On 07/08/20 15:28, luca abeni wrote:
> > Hi Juri,
> >
> > On Fri, 7 Aug 2020 11:56:04 +0200
> > Juri Lelli <[email protected]> wrote:
> >
> > > Starting deadline server for lower priority classes right away
> > > when first task is enqueued might break guarantees
> >
> > Which guarantees are you thinking about, here? Response times of
> > fixed priority tasks?
>
> Response time, but also wakeup latency (which, for better or worse, is
> another important metric).
>
> > If fixed priority tasks are also scheduled through deadline servers,
> > then you can provide response-time guarantees to them even when
> > lower-priority/non-real-time tasks are scheduled through deadline
> > servers.
>
> Right, but I fear we won't be able to keep current behavior for
> wakeups: RT with highest prio always gets scheduled right away?

Uhm... I think this depends on how the servers' parameters are
designed: assigning "wrong" (or "bad") parameters to the server used to
schedule RT tasks, this property is broken.

(however, notice that even with the current patchset the highest
priority task might be scheduled with some delay --- if the SCHED_OTHER
deadline server is active because SCHED_OTHER tasks are being starved).



Luca

2020-08-07 14:07:59

by Juri Lelli

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/6] SCHED_DEADLINE server infrastructure

On 07/08/20 15:41, luca abeni wrote:
> Hi Juri,
>
> On Fri, 7 Aug 2020 15:30:41 +0200
> Juri Lelli <[email protected]> wrote:
> [...]
> > > In the meanwhile, I have some questions/comments after a first quick
> > > look.
> > >
> > > If I understand well, the patchset does not apply deadline servers
> > > to FIFO and RR tasks, right? How does this patchset interact with RT
> > > throttling?
> >
> > Well, it's something like the dual of it, in that RT Throttling
> > directly throttles RT tasks to make spare CPU cycles available to
> > fair tasks while this patchset introduces deadline servers to
> > schedule fair tasks, thus still reserving CPU time for those (when
> > needed).
>
> Ah, OK... I was thinking about using deadline servers for both RT and
> non-RT tasks. And to use them not only to throttle, but also to provide
> some kind of performance guarantees (to all the scheduling classes).
> Think about what can be done when combining this mechanism with
> cgroups/containers :)
>
> [...]
> > > I understand this is because you do not
> > > want to delay RT tasks if they are not starving other tasks. But
> > > then, maybe what you want is not deadline-based scheduling. Maybe a
> > > reservation-based scheduler based on fixed priorities is what you
> > > want? (with SCHED_DEADLINE, you could provide exact performance
> > > guarantees to SCHED_OTHER tasks, but I suspect patch 6/6 breaks
> > > these guarantees?)
> >
> > I agree that we are not interested in exact guarantees in this case,
> > but why not using something that it's already there and would give us
> > the functionality we need (fix starvation for fair)?
>
> Ok, if performance guarantees to non-RT tasks are not the goal, then I
> agree. I was thinking that since the patchset provides a mechanism to
> schedule various classes of tasks through deadline servers, then
> using these servers to provide some kinds of guarantees could be
> interesting ;-)

Not saying that the wider scope approach is not worth pursuing, you know
I would be very much interested into that as well :-), but I'd leave it
for a later time. This proposal looks reasonably achieaveble in somewhat
short times and it should provide provable benefits to production today.
Plus, you are again right, foundations would be there already when we'll
be ready for the wider solution.

2020-08-07 14:14:16

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH v2 6/6] sched/fair: Implement starvation monitor

On Fri, Aug 07, 2020 at 03:49:41PM +0200, luca abeni wrote:
> Hi Peter,
>
> [email protected] wrote:

> > One thing I considerd was scheduling this as a least-laxity entity --
> > such that it runs late, not early
>
> Are you thinking about scheduling both RT and non-RT tasks through
> deadline servers? If yes,

Maybe, I initially considered this for mixed criticality, where the
'soft' class would run EDF and the 'hard' class would run LLF (or the
other way around, I can't quite remember how I figured it).

If you restrict the hard class to single CPU assignment (IOW the UP
case) and ensure that u_llf + U_gedf/N < 1, it should just work out.

But I shelved all that after I heard about that other balancer idea
Danial was suppose to be working on ;-)))

> then I think that using something like
> laxity-based scheduling for the SCHED_OTHER server can be a good idea
> (but then we need to understand how to combine deadline-based
> scheduling with laxity-based scheduling, etc...)

/me consults notes, EDZL is I think the closest thing there.

> Or are you thinking about keeping the SCHED_OTHER server throttled
> until its laxity is 0 (or until its laxity is lower than some small
> value)? In this second case, the approach would work even if RT tasks
> are not scheduled through a server (but I do not know which kind of
> performance guarantee we could provide).

That would certainly be sufficient for OTHER servers I think.

2020-08-07 14:15:04

by Juri Lelli

[permalink] [raw]
Subject: Re: [RFC PATCH v2 6/6] sched/fair: Implement starvation monitor

On 07/08/20 15:55, luca abeni wrote:
> On Fri, 7 Aug 2020 15:43:53 +0200
> Juri Lelli <[email protected]> wrote:
>
> > On 07/08/20 15:28, luca abeni wrote:
> > > Hi Juri,
> > >
> > > On Fri, 7 Aug 2020 11:56:04 +0200
> > > Juri Lelli <[email protected]> wrote:
> > >
> > > > Starting deadline server for lower priority classes right away
> > > > when first task is enqueued might break guarantees
> > >
> > > Which guarantees are you thinking about, here? Response times of
> > > fixed priority tasks?
> >
> > Response time, but also wakeup latency (which, for better or worse, is
> > another important metric).
> >
> > > If fixed priority tasks are also scheduled through deadline servers,
> > > then you can provide response-time guarantees to them even when
> > > lower-priority/non-real-time tasks are scheduled through deadline
> > > servers.
> >
> > Right, but I fear we won't be able to keep current behavior for
> > wakeups: RT with highest prio always gets scheduled right away?
>
> Uhm... I think this depends on how the servers' parameters are
> designed: assigning "wrong" (or "bad") parameters to the server used to
> schedule RT tasks, this property is broken.
>
> (however, notice that even with the current patchset the highest
> priority task might be scheduled with some delay --- if the SCHED_OTHER
> deadline server is active because SCHED_OTHER tasks are being starved).

But that's OK I think, because if the server is active it means that
OTHER didn't get a chance to run for some time and if it continues to
hung than worse problems than breaking FIFO assumptions will happen. :-/

2020-08-07 14:15:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH v2 6/6] sched/fair: Implement starvation monitor

On Fri, Aug 07, 2020 at 03:43:53PM +0200, Juri Lelli wrote:

> Right, but I fear we won't be able to keep current behavior for wakeups:
> RT with highest prio always gets scheduled right away?

If you consider RT throttling, that's already not the case. We can
consider this fair server to be just another way to implement that.

At some point, we'll have have to preempt higher priority tasks, that's
the entire point of the thing after all.

2020-08-07 14:17:44

by luca abeni

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/6] SCHED_DEADLINE server infrastructure

Hi Juri,

thanks for sharing the v2 patchset!

In the next days I'll have a look at it, and try some tests...

In the meanwhile, I have some questions/comments after a first quick
look.

If I understand well, the patchset does not apply deadline servers to
FIFO and RR tasks, right? How does this patchset interact with RT
throttling?

If I understand well, patch 6/6 does something like "use deadline
servers for SCHED_OTHER only if FIFO/RR tasks risk to starve
SCHED_OTHER tasks"... Right? I understand this is because you do not
want to delay RT tasks if they are not starving other tasks. But then,
maybe what you want is not deadline-based scheduling. Maybe a
reservation-based scheduler based on fixed priorities is what you want?
(with SCHED_DEADLINE, you could provide exact performance guarantees to
SCHED_OTHER tasks, but I suspect patch 6/6 breaks these guarantees?)


Thanks,
Luca

On Fri, 7 Aug 2020 11:50:45 +0200
Juri Lelli <[email protected]> wrote:

> Hi,
>
> This is RFC v2 of Peter's SCHED_DEADLINE server infrastructure
> implementation [1].
>
> SCHED_DEADLINE servers can help fixing starvation issues of low
> priority tasks (e.g., SCHED_OTHER) when higher priority tasks
> monopolize CPU cycles. Today we have RT Throttling; DEADLINE servers
> should be able to replace and improve that.
>
> I rebased Peter's patches (adding changelogs where needed) on
> tip/sched/core as of today and incorporated fixes to issues discussed
> during RFC v1. Current set seems to even boot on real HW! :-)
>
> While playing with RFC v1 set (and discussing it further offline with
> Daniel) it has emerged the need to slightly change the behavior. Patch
> 6/6 is a (cumbersome?) attempt to show what's probably needed.
> The problem with "original" implementation is that FIFO tasks might
> suffer preemption from NORMAL even when spare CPU cycles are
> available. In fact, fair deadline server is enqueued right away when
> NORMAL tasks wake up and they are first scheduled by the server, thus
> potentially preempting a well behaving FIFO task. This is of course
> not ideal. So, in patch 6/6 I propose to use some kind of starvation
> monitor/ watchdog that delays enqueuing of deadline servers to the
> point when fair tasks might start to actually suffer from starvation
> (just randomly picked HZ/2 for now). One problem I already see with
> the current implementation is that it adds overhead to fair paths, so
> I'm pretty sure there are better ways to implement the idea (e.g.,
> Daniel already suggested using a starvation monitor kthread sort of
> thing).
>
> Receiving comments and suggestions is the sole purpose of this posting
> at this stage. Hopefully we can further discuss the idea at Plumbers
> in a few weeks. So, please don't focus too much into actual
> implementation (which I plan to revise anyway after I'm back from pto
> :), but try to see if this might actually fly. The feature seems to
> be very much needed.
>
> Thanks!
>
> Juri
>
> 1 -
> https://lore.kernel.org/lkml/[email protected]/
>
> Juri Lelli (1):
> sched/fair: Implement starvation monitor
>
> Peter Zijlstra (5):
> sched: Unify runtime accounting across classes
> sched/deadline: Collect sched_dl_entity initialization
> sched/deadline: Move bandwidth accounting into
> {en,de}queue_dl_entity sched/deadline: Introduce deadline servers
> sched/fair: Add trivial fair server
>
> include/linux/sched.h | 28 ++-
> kernel/sched/core.c | 23 +-
> kernel/sched/deadline.c | 483
> ++++++++++++++++++++++++--------------- kernel/sched/fair.c |
> 136 ++++++++++- kernel/sched/rt.c | 17 +-
> kernel/sched/sched.h | 50 +++-
> kernel/sched/stop_task.c | 16 +-
> 7 files changed, 522 insertions(+), 231 deletions(-)
>

2020-08-07 14:18:01

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/6] SCHED_DEADLINE server infrastructure

On Fri, Aug 07, 2020 at 03:16:32PM +0200, luca abeni wrote:

> If I understand well, the patchset does not apply deadline servers to
> FIFO and RR tasks, right? How does this patchset interact with RT
> throttling?

ideally it will replace it ;-)

But of course, there's the whole cgroup trainwreck waiting there :/

2020-08-07 15:10:45

by Juri Lelli

[permalink] [raw]
Subject: Re: [RFC PATCH v2 6/6] sched/fair: Implement starvation monitor

On 07/08/20 16:13, [email protected] wrote:
> On Fri, Aug 07, 2020 at 03:43:53PM +0200, Juri Lelli wrote:
>
> > Right, but I fear we won't be able to keep current behavior for wakeups:
> > RT with highest prio always gets scheduled right away?
>
> If you consider RT throttling, that's already not the case. We can
> consider this fair server to be just another way to implement that.
>
> At some point, we'll have have to preempt higher priority tasks, that's
> the entire point of the thing after all.
>

Ah, indeed. But I was thinking we might try to do a little "better" wrt
to RT Throttling while we are at it. :-)

This proposal has the potential to be more flexible and to kick in only
when really needed.

Subject: Re: [RFC PATCH v2 6/6] sched/fair: Implement starvation monitor

On 8/7/20 4:11 PM, [email protected] wrote:
> But I shelved all that after I heard about that other balancer idea
> Danial was suppose to be working on ;-)))

The PhD bureaucracy (and behind the scenes) were blocking me... but I am free
man now and will catch up on that ;-).

[ also because I made enough progress on this:

https://github.com/bristot/rtsl/

and can start other things. ]

-- Daniel

2020-09-08 22:24:54

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/6] SCHED_DEADLINE server infrastructure

Hi!

> This is RFC v2 of Peter's SCHED_DEADLINE server infrastructure
> implementation [1].
>
> SCHED_DEADLINE servers can help fixing starvation issues of low priority tasks (e.g.,
> SCHED_OTHER) when higher priority tasks monopolize CPU cycles. Today we have RT
> Throttling; DEADLINE servers should be able to replace and improve that.

It would be worth noting what "server" is in this context.

It is not white box with CPU inside, it is not even an userland process, afaict.

Subject is quite confusing.

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2020-09-09 05:53:59

by Juri Lelli

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/6] SCHED_DEADLINE server infrastructure

Hi Pavel,

On 09/09/20 00:22, Pavel Machek wrote:
> Hi!
>
> > This is RFC v2 of Peter's SCHED_DEADLINE server infrastructure
> > implementation [1].
> >
> > SCHED_DEADLINE servers can help fixing starvation issues of low priority tasks (e.g.,
> > SCHED_OTHER) when higher priority tasks monopolize CPU cycles. Today we have RT
> > Throttling; DEADLINE servers should be able to replace and improve that.
>
> It would be worth noting what "server" is in this context.

It comes from Constant Bandwidth Server (CBS), that SCHED_DEADLINE is
implementing [1].

>
> It is not white box with CPU inside, it is not even an userland process, afaict.
>
> Subject is quite confusing.

Best,
Juri

1 - https://elixir.bootlin.com/linux/latest/source/Documentation/scheduler/sched-deadline.rst#L42