2023-05-30 14:21:27

by Vineeth Remanan Pillai

[permalink] [raw]
Subject: [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB

According to the GRUB[1] rule, the runtime is depreciated as:
"dq = -max{u, (1 - Uinact - Uextra)} dt" (1)

To guarantee that deadline tasks doesn't starve lower class tasks,
we do not allocate the full bandwidth of the cpu to deadline tasks.
Maximum bandwidth usable by deadline tasks is denoted by "Umax".
Considering Umax, equation (1) becomes:
"dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2)

Current implementation has a minor bug in equation (2), which this
patch fixes.

The reclamation logic is verified by a sample program which creates
multiple deadline threads and observing their utilization. The tests
were run on an isolated cpu(isolcpus=3) on a 4 cpu system.

Tests on 6.3.0
==============

RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35

RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69

RUN 3: 2 tasks
Task 1: runtime=1ms, deadline=period=10ms
Task 2: runtime=1ms, deadline=period=100ms
TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67
TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37
TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38
TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23

As seen above, the reclamation doesn't reclaim the maximum allowed
bandwidth and as the bandwidth of tasks gets smaller, the reclaimed
bandwidth also comes down.

Tests with this patch applied
=============================

RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19
TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16

RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27
TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21

RUN 3: 2 tasks
Task 1: runtime=1ms, deadline=period=10ms
Task 2: runtime=1ms, deadline=period=100ms
TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64
TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66
TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45
TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73

Running tasks on all cpus allowing for migration also showed that
the utilization is reclaimed to the maximum. Running 10 tasks on
3 cpus SCHED_FLAG_RECLAIM - top shows:
%Cpu0 : 94.6 us, 0.0 sy, 0.0 ni, 5.4 id, 0.0 wa
%Cpu1 : 95.2 us, 0.0 sy, 0.0 ni, 4.8 id, 0.0 wa
%Cpu2 : 95.8 us, 0.0 sy, 0.0 ni, 4.2 id, 0.0 wa

[1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng.
(2015). Parallel and sequential reclaiming in multicore
real-time global scheduling.

Signed-off-by: Vineeth Pillai (Google) <[email protected]>
---
kernel/sched/deadline.c | 50 +++++++++++++++++++----------------------
kernel/sched/sched.h | 6 +++++
2 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 71b24371a6f7..dfb59a363560 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1260,43 +1260,39 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
}

/*
- * This function implements the GRUB accounting rule:
- * according to the GRUB reclaiming algorithm, the runtime is
- * not decreased as "dq = -dt", but as
- * "dq = -max{u / Umax, (1 - Uinact - Uextra)} dt",
+ * This function implements the GRUB accounting rule. According to the
+ * GRUB reclaiming algorithm, the runtime is not decreased as "dq = -dt",
+ * but as "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt",
* where u is the utilization of the task, Umax is the maximum reclaimable
* utilization, Uinact is the (per-runqueue) inactive utilization, computed
* as the difference between the "total runqueue utilization" and the
- * runqueue active utilization, and Uextra is the (per runqueue) extra
+ * "runqueue active utilization", and Uextra is the (per runqueue) extra
* reclaimable utilization.
- * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
- * multiplied by 2^BW_SHIFT, the result has to be shifted right by
- * BW_SHIFT.
- * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT,
- * dl_bw is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
- * Since delta is a 64 bit variable, to have an overflow its value
- * should be larger than 2^(64 - 20 - 8), which is more than 64 seconds.
- * So, overflow is not an issue here.
+ * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations multiplied
+ * by 2^BW_SHIFT, the result has to be shifted right by BW_SHIFT.
+ * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT, dl_bw
+ * is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
+ * Since delta is a 64 bit variable, to have an overflow its value should be
+ * larger than 2^(64 - 20 - 8), which is more than 64 seconds. So, overflow is
+ * not an issue here.
*/
static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
{
- u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
u64 u_act;
- u64 u_act_min = (dl_se->dl_bw * rq->dl.bw_ratio) >> RATIO_SHIFT;
+ u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */

/*
- * Instead of computing max{u * bw_ratio, (1 - u_inact - u_extra)},
- * we compare u_inact + rq->dl.extra_bw with
- * 1 - (u * rq->dl.bw_ratio >> RATIO_SHIFT), because
- * u_inact + rq->dl.extra_bw can be larger than
- * 1 * (so, 1 - u_inact - rq->dl.extra_bw would be negative
- * leading to wrong results)
+ * Instead of computing max{u, (u_max - u_inact - u_extra)}, we
+ * compare u_inact + u_extra with u_max - u, because u_inact + u_extra
+ * can be larger than u_max. So, u_max - u_inact - u_extra would be
+ * negative leading to wrong results.
*/
- if (u_inact + rq->dl.extra_bw > BW_UNIT - u_act_min)
- u_act = u_act_min;
+ if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
+ u_act = dl_se->dl_bw;
else
- u_act = BW_UNIT - u_inact - rq->dl.extra_bw;
+ u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;

+ u_act = (u_act * rq->dl.bw_ratio) >> RATIO_SHIFT;
return (delta * u_act) >> BW_SHIFT;
}

@@ -2784,12 +2780,12 @@ static void init_dl_rq_bw_ratio(struct dl_rq *dl_rq)
{
if (global_rt_runtime() == RUNTIME_INF) {
dl_rq->bw_ratio = 1 << RATIO_SHIFT;
- dl_rq->extra_bw = 1 << BW_SHIFT;
+ dl_rq->max_bw = dl_rq->extra_bw = 1 << BW_SHIFT;
} else {
dl_rq->bw_ratio = to_ratio(global_rt_runtime(),
global_rt_period()) >> (BW_SHIFT - RATIO_SHIFT);
- dl_rq->extra_bw = to_ratio(global_rt_period(),
- global_rt_runtime());
+ dl_rq->max_bw = dl_rq->extra_bw =
+ to_ratio(global_rt_period(), global_rt_runtime());
}
}

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3e8df6d31c1e..73027c2806dc 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -753,6 +753,12 @@ struct dl_rq {
u64 this_bw;
u64 extra_bw;

+ /*
+ * Maximum available bandwidth for reclaiming by SCHED_FLAG_RECLAIM
+ * tasks of this rq. Used in calculation of reclaimable bandwidth(GRUB).
+ */
+ u64 max_bw;
+
/*
* Inverse of the fraction of CPU utilization that can be reclaimed
* by the GRUB algorithm.
--
2.40.1



2023-05-30 14:21:37

by luca abeni

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB

I think this patch is OK


Thanks,
Luca

On Tue, 30 May 2023 09:55:25 -0400
Vineeth Pillai <[email protected]> wrote:

> According to the GRUB[1] rule, the runtime is depreciated as:
> "dq = -max{u, (1 - Uinact - Uextra)} dt" (1)
>
> To guarantee that deadline tasks doesn't starve lower class tasks,
> we do not allocate the full bandwidth of the cpu to deadline tasks.
> Maximum bandwidth usable by deadline tasks is denoted by "Umax".
> Considering Umax, equation (1) becomes:
> "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2)
>
> Current implementation has a minor bug in equation (2), which this
> patch fixes.
>
> The reclamation logic is verified by a sample program which creates
> multiple deadline threads and observing their utilization. The tests
> were run on an isolated cpu(isolcpus=3) on a 4 cpu system.
>
> Tests on 6.3.0
> ==============
>
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35
>
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
>
> RUN 3: 2 tasks
> Task 1: runtime=1ms, deadline=period=10ms
> Task 2: runtime=1ms, deadline=period=100ms
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23
>
> As seen above, the reclamation doesn't reclaim the maximum allowed
> bandwidth and as the bandwidth of tasks gets smaller, the reclaimed
> bandwidth also comes down.
>
> Tests with this patch applied
> =============================
>
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16
>
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21
>
> RUN 3: 2 tasks
> Task 1: runtime=1ms, deadline=period=10ms
> Task 2: runtime=1ms, deadline=period=100ms
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73
>
> Running tasks on all cpus allowing for migration also showed that
> the utilization is reclaimed to the maximum. Running 10 tasks on
> 3 cpus SCHED_FLAG_RECLAIM - top shows:
> %Cpu0 : 94.6 us, 0.0 sy, 0.0 ni, 5.4 id, 0.0 wa
> %Cpu1 : 95.2 us, 0.0 sy, 0.0 ni, 4.8 id, 0.0 wa
> %Cpu2 : 95.8 us, 0.0 sy, 0.0 ni, 4.2 id, 0.0 wa
>
> [1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng.
> (2015). Parallel and sequential reclaiming in multicore
> real-time global scheduling.
>
> Signed-off-by: Vineeth Pillai (Google) <[email protected]>
> ---
> kernel/sched/deadline.c | 50
> +++++++++++++++++++---------------------- kernel/sched/sched.h |
> 6 +++++ 2 files changed, 29 insertions(+), 27 deletions(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 71b24371a6f7..dfb59a363560 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1260,43 +1260,39 @@ int dl_runtime_exceeded(struct
> sched_dl_entity *dl_se) }
>
> /*
> - * This function implements the GRUB accounting rule:
> - * according to the GRUB reclaiming algorithm, the runtime is
> - * not decreased as "dq = -dt", but as
> - * "dq = -max{u / Umax, (1 - Uinact - Uextra)} dt",
> + * This function implements the GRUB accounting rule. According to
> the
> + * GRUB reclaiming algorithm, the runtime is not decreased as "dq =
> -dt",
> + * but as "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt",
> * where u is the utilization of the task, Umax is the maximum
> reclaimable
> * utilization, Uinact is the (per-runqueue) inactive utilization,
> computed
> * as the difference between the "total runqueue utilization" and the
> - * runqueue active utilization, and Uextra is the (per runqueue)
> extra
> + * "runqueue active utilization", and Uextra is the (per runqueue)
> extra
> * reclaimable utilization.
> - * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
> - * multiplied by 2^BW_SHIFT, the result has to be shifted right by
> - * BW_SHIFT.
> - * Since rq->dl.bw_ratio contains 1 / Umax multiplied by
> 2^RATIO_SHIFT,
> - * dl_bw is multiped by rq->dl.bw_ratio and shifted right by
> RATIO_SHIFT.
> - * Since delta is a 64 bit variable, to have an overflow its value
> - * should be larger than 2^(64 - 20 - 8), which is more than 64
> seconds.
> - * So, overflow is not an issue here.
> + * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
> multiplied
> + * by 2^BW_SHIFT, the result has to be shifted right by BW_SHIFT.
> + * Since rq->dl.bw_ratio contains 1 / Umax multiplied by
> 2^RATIO_SHIFT, dl_bw
> + * is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
> + * Since delta is a 64 bit variable, to have an overflow its value
> should be
> + * larger than 2^(64 - 20 - 8), which is more than 64 seconds. So,
> overflow is
> + * not an issue here.
> */
> static u64 grub_reclaim(u64 delta, struct rq *rq, struct
> sched_dl_entity *dl_se) {
> - u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot -
> Uact */ u64 u_act;
> - u64 u_act_min = (dl_se->dl_bw * rq->dl.bw_ratio) >>
> RATIO_SHIFT;
> + u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot -
> Uact */
> /*
> - * Instead of computing max{u * bw_ratio, (1 - u_inact -
> u_extra)},
> - * we compare u_inact + rq->dl.extra_bw with
> - * 1 - (u * rq->dl.bw_ratio >> RATIO_SHIFT), because
> - * u_inact + rq->dl.extra_bw can be larger than
> - * 1 * (so, 1 - u_inact - rq->dl.extra_bw would be negative
> - * leading to wrong results)
> + * Instead of computing max{u, (u_max - u_inact - u_extra)},
> we
> + * compare u_inact + u_extra with u_max - u, because u_inact
> + u_extra
> + * can be larger than u_max. So, u_max - u_inact - u_extra
> would be
> + * negative leading to wrong results.
> */
> - if (u_inact + rq->dl.extra_bw > BW_UNIT - u_act_min)
> - u_act = u_act_min;
> + if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
> + u_act = dl_se->dl_bw;
> else
> - u_act = BW_UNIT - u_inact - rq->dl.extra_bw;
> + u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;
>
> + u_act = (u_act * rq->dl.bw_ratio) >> RATIO_SHIFT;
> return (delta * u_act) >> BW_SHIFT;
> }
>
> @@ -2784,12 +2780,12 @@ static void init_dl_rq_bw_ratio(struct dl_rq
> *dl_rq) {
> if (global_rt_runtime() == RUNTIME_INF) {
> dl_rq->bw_ratio = 1 << RATIO_SHIFT;
> - dl_rq->extra_bw = 1 << BW_SHIFT;
> + dl_rq->max_bw = dl_rq->extra_bw = 1 << BW_SHIFT;
> } else {
> dl_rq->bw_ratio = to_ratio(global_rt_runtime(),
> global_rt_period()) >> (BW_SHIFT -
> RATIO_SHIFT);
> - dl_rq->extra_bw = to_ratio(global_rt_period(),
> -
> global_rt_runtime());
> + dl_rq->max_bw = dl_rq->extra_bw =
> + to_ratio(global_rt_period(),
> global_rt_runtime()); }
> }
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3e8df6d31c1e..73027c2806dc 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -753,6 +753,12 @@ struct dl_rq {
> u64 this_bw;
> u64 extra_bw;
>
> + /*
> + * Maximum available bandwidth for reclaiming by
> SCHED_FLAG_RECLAIM
> + * tasks of this rq. Used in calculation of reclaimable
> bandwidth(GRUB).
> + */
> + u64 max_bw;
> +
> /*
> * Inverse of the fraction of CPU utilization that can be
> reclaimed
> * by the GRUB algorithm.


2023-05-30 14:22:17

by Vineeth Remanan Pillai

[permalink] [raw]
Subject: [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation

Update the details of GRUB to reflect the updated logic.

Signed-off-by: Vineeth Pillai (Google) <[email protected]>
---
Documentation/scheduler/sched-deadline.rst | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/scheduler/sched-deadline.rst b/Documentation/scheduler/sched-deadline.rst
index 9d9be52f221a..9fe4846079bb 100644
--- a/Documentation/scheduler/sched-deadline.rst
+++ b/Documentation/scheduler/sched-deadline.rst
@@ -203,12 +203,15 @@ Deadline Task Scheduling
- Total bandwidth (this_bw): this is the sum of all tasks "belonging" to the
runqueue, including the tasks in Inactive state.

+ - Maximum usable bandwidth (max_bw): This is the maximum bandwidth usable by
+ deadline tasks and is currently set to the RT capacity.
+

The algorithm reclaims the bandwidth of the tasks in Inactive state.
It does so by decrementing the runtime of the executing task Ti at a pace equal
to

- dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt
+ dq = -(max{ Ui, (Umax - Uinact - Uextra) } / Umax) dt

where:

--
2.40.1


Subject: Re: [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB

On 5/30/23 15:55, Vineeth Pillai wrote:
> According to the GRUB[1] rule, the runtime is depreciated as:
> "dq = -max{u, (1 - Uinact - Uextra)} dt" (1)
>
> To guarantee that deadline tasks doesn't starve lower class tasks,
> we do not allocate the full bandwidth of the cpu to deadline tasks.
> Maximum bandwidth usable by deadline tasks is denoted by "Umax".
> Considering Umax, equation (1) becomes:
> "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2)
>
> Current implementation has a minor bug in equation (2), which this
> patch fixes.
>
> The reclamation logic is verified by a sample program which creates
> multiple deadline threads and observing their utilization. The tests
> were run on an isolated cpu(isolcpus=3) on a 4 cpu system.
>
> Tests on 6.3.0
> ==============
>
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35
>
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
>
> RUN 3: 2 tasks
> Task 1: runtime=1ms, deadline=period=10ms
> Task 2: runtime=1ms, deadline=period=100ms
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23
>
> As seen above, the reclamation doesn't reclaim the maximum allowed
> bandwidth and as the bandwidth of tasks gets smaller, the reclaimed
> bandwidth also comes down.
>
> Tests with this patch applied
> =============================
>
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16
>
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21
>
> RUN 3: 2 tasks
> Task 1: runtime=1ms, deadline=period=10ms
> Task 2: runtime=1ms, deadline=period=100ms
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73
>
> Running tasks on all cpus allowing for migration also showed that
> the utilization is reclaimed to the maximum. Running 10 tasks on
> 3 cpus SCHED_FLAG_RECLAIM - top shows:
> %Cpu0 : 94.6 us, 0.0 sy, 0.0 ni, 5.4 id, 0.0 wa
> %Cpu1 : 95.2 us, 0.0 sy, 0.0 ni, 4.8 id, 0.0 wa
> %Cpu2 : 95.8 us, 0.0 sy, 0.0 ni, 4.2 id, 0.0 wa
>
> [1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng.
> (2015). Parallel and sequential reclaiming in multicore
> real-time global scheduling.

So, I did some testing, mainly to validate the "one task cannot run on
two CPUs," that is, a case in which a higher utilization task would
always have its % of CPU available, even in the presence of low utilization
trying to reclaim all the CPU time. E.g.,

Task 1: runtime=1ms, deadline=period=10ms with reclaim
Task 2: runtime=1ms, deadline=period=10ms with reclaim
Task 3: runtime 8ms, deadline=period=10ms without reclaim

On two CPUs task 3 always have 80% available... the other tasks
do not get 95% because of migrations.

With 3+ cpus, the tasks can run up to 95% because there are CPUs to
everyone.

I played with migrate disable to force timelines... and yet, I failed
to break things so... we are good :-).

Reviewed-by: Daniel Bristot de Oliveira <[email protected]>

Thanks!
-- Daniel


Subject: Re: [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation

On 5/30/23 15:55, Vineeth Pillai wrote:
> Update the details of GRUB to reflect the updated logic.
>
> Signed-off-by: Vineeth Pillai (Google) <[email protected]>

Reviewed-by: Daniel Bristot de Oliveira <[email protected]>

Thanks!
-- Daniel


2023-06-01 12:11:06

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB

On 30/05/2023 15:55, Vineeth Pillai wrote:

[...]

> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 71b24371a6f7..dfb59a363560 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1260,43 +1260,39 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
> }
>
> /*
> - * This function implements the GRUB accounting rule:
> - * according to the GRUB reclaiming algorithm, the runtime is
> - * not decreased as "dq = -dt", but as
> - * "dq = -max{u / Umax, (1 - Uinact - Uextra)} dt",
> + * This function implements the GRUB accounting rule. According to the
> + * GRUB reclaiming algorithm, the runtime is not decreased as "dq = -dt",
> + * but as "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt",
> * where u is the utilization of the task, Umax is the maximum reclaimable
> * utilization, Uinact is the (per-runqueue) inactive utilization, computed
> * as the difference between the "total runqueue utilization" and the
> - * runqueue active utilization, and Uextra is the (per runqueue) extra
> + * "runqueue active utilization", and Uextra is the (per runqueue) extra
> * reclaimable utilization.
> - * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
> - * multiplied by 2^BW_SHIFT, the result has to be shifted right by
> - * BW_SHIFT.
> - * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT,
> - * dl_bw is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
> - * Since delta is a 64 bit variable, to have an overflow its value
> - * should be larger than 2^(64 - 20 - 8), which is more than 64 seconds.
> - * So, overflow is not an issue here.
> + * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations multiplied
> + * by 2^BW_SHIFT, the result has to be shifted right by BW_SHIFT.
> + * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT, dl_bw
> + * is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.

nit-pick:

s/multiped/multiplied

> + * Since delta is a 64 bit variable, to have an overflow its value should be
> + * larger than 2^(64 - 20 - 8), which is more than 64 seconds. So, overflow is
> + * not an issue here.
> */
> static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
> {
> - u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
> u64 u_act;
> - u64 u_act_min = (dl_se->dl_bw * rq->dl.bw_ratio) >> RATIO_SHIFT;
> + u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
>
> /*
> - * Instead of computing max{u * bw_ratio, (1 - u_inact - u_extra)},
> - * we compare u_inact + rq->dl.extra_bw with
> - * 1 - (u * rq->dl.bw_ratio >> RATIO_SHIFT), because
> - * u_inact + rq->dl.extra_bw can be larger than
> - * 1 * (so, 1 - u_inact - rq->dl.extra_bw would be negative
> - * leading to wrong results)
> + * Instead of computing max{u, (u_max - u_inact - u_extra)}, we
> + * compare u_inact + u_extra with u_max - u, because u_inact + u_extra
> + * can be larger than u_max. So, u_max - u_inact - u_extra would be
> + * negative leading to wrong results.
> */
> - if (u_inact + rq->dl.extra_bw > BW_UNIT - u_act_min)
> - u_act = u_act_min;
> + if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
> + u_act = dl_se->dl_bw;
> else
> - u_act = BW_UNIT - u_inact - rq->dl.extra_bw;
> + u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;
>
> + u_act = (u_act * rq->dl.bw_ratio) >> RATIO_SHIFT;
> return (delta * u_act) >> BW_SHIFT;
> }
>
> @@ -2784,12 +2780,12 @@ static void init_dl_rq_bw_ratio(struct dl_rq *dl_rq)
> {
> if (global_rt_runtime() == RUNTIME_INF) {
> dl_rq->bw_ratio = 1 << RATIO_SHIFT;
> - dl_rq->extra_bw = 1 << BW_SHIFT;
> + dl_rq->max_bw = dl_rq->extra_bw = 1 << BW_SHIFT;
> } else {
> dl_rq->bw_ratio = to_ratio(global_rt_runtime(),
> global_rt_period()) >> (BW_SHIFT - RATIO_SHIFT);
> - dl_rq->extra_bw = to_ratio(global_rt_period(),
> - global_rt_runtime());
> + dl_rq->max_bw = dl_rq->extra_bw =
> + to_ratio(global_rt_period(), global_rt_runtime());
> }
> }
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3e8df6d31c1e..73027c2806dc 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -753,6 +753,12 @@ struct dl_rq {
> u64 this_bw;
> u64 extra_bw;
>
> + /*
> + * Maximum available bandwidth for reclaiming by SCHED_FLAG_RECLAIM
> + * tasks of this rq. Used in calculation of reclaimable bandwidth(GRUB).
> + */
> + u64 max_bw;
> +
> /*
> * Inverse of the fraction of CPU utilization that can be reclaimed
> * by the GRUB algorithm.

Not related to this patch directly but I still can't see how `GRUB-PA`
with schedutil CPUfreq governor should work together with GRUB reclaiming.

CPU frequency is influenced by Uact (rq->dl.running_bw):

sugov_get_util() -> effective_cpu_util() -> cpu_bw_dl() ->

return rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT

But the extra GRUB reclaim runtime is calculated based on rq->dl.max_bw
and AFAICS, Uact is not adjusted by scaled_delta_exec?


2023-06-01 13:15:14

by Juri Lelli

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB

Hi!

On 30/05/23 09:55, Vineeth Pillai wrote:
> According to the GRUB[1] rule, the runtime is depreciated as:
> "dq = -max{u, (1 - Uinact - Uextra)} dt" (1)
>
> To guarantee that deadline tasks doesn't starve lower class tasks,
> we do not allocate the full bandwidth of the cpu to deadline tasks.
> Maximum bandwidth usable by deadline tasks is denoted by "Umax".
> Considering Umax, equation (1) becomes:
> "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2)
>
> Current implementation has a minor bug in equation (2), which this
> patch fixes.
>
> The reclamation logic is verified by a sample program which creates
> multiple deadline threads and observing their utilization. The tests
> were run on an isolated cpu(isolcpus=3) on a 4 cpu system.
>
> Tests on 6.3.0
> ==============
>
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35
>
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
>
> RUN 3: 2 tasks
> Task 1: runtime=1ms, deadline=period=10ms
> Task 2: runtime=1ms, deadline=period=100ms
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37
> TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38
> TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23
>
> As seen above, the reclamation doesn't reclaim the maximum allowed
> bandwidth and as the bandwidth of tasks gets smaller, the reclaimed
> bandwidth also comes down.
>
> Tests with this patch applied
> =============================
>
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19
> TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16
>
> RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27
> TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21
>
> RUN 3: 2 tasks
> Task 1: runtime=1ms, deadline=period=10ms
> Task 2: runtime=1ms, deadline=period=100ms
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66
> TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45
> TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73
>
> Running tasks on all cpus allowing for migration also showed that
> the utilization is reclaimed to the maximum. Running 10 tasks on
> 3 cpus SCHED_FLAG_RECLAIM - top shows:
> %Cpu0 : 94.6 us, 0.0 sy, 0.0 ni, 5.4 id, 0.0 wa
> %Cpu1 : 95.2 us, 0.0 sy, 0.0 ni, 4.8 id, 0.0 wa
> %Cpu2 : 95.8 us, 0.0 sy, 0.0 ni, 4.2 id, 0.0 wa
>
> [1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng.
> (2015). Parallel and sequential reclaiming in multicore
> real-time global scheduling.
>
> Signed-off-by: Vineeth Pillai (Google) <[email protected]>
> ---

This looks good to me too. Thanks a lot for working on this and of
course to Luca and Daniel who reviewed and played with it as well.

Acked-by: Juri Lelli <[email protected]>

Best,
Juri


2023-06-01 13:20:01

by Juri Lelli

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] sched/deadline: Update GRUB description in the documentation

On 30/05/23 09:55, Vineeth Pillai wrote:
> Update the details of GRUB to reflect the updated logic.
>
> Signed-off-by: Vineeth Pillai (Google) <[email protected]>
> ---

Acked-by: Juri Lelli <[email protected]>

Best,
Juri


2023-06-05 02:49:08

by Vineeth Remanan Pillai

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB

Hi Dietmar,

Sorry for my late response..

> > + * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT, dl_bw
> > + * is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
>
> nit-pick:
>
> s/multiped/multiplied
>
Sorry I missed this. I am working on couple more GRUB fixes and will fix
this along with those changes.

> Not related to this patch directly but I still can't see how `GRUB-PA`
> with schedutil CPUfreq governor should work together with GRUB reclaiming.
>
> CPU frequency is influenced by Uact (rq->dl.running_bw):
>
> sugov_get_util() -> effective_cpu_util() -> cpu_bw_dl() ->
>
> return rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT
>
> But the extra GRUB reclaim runtime is calculated based on rq->dl.max_bw
> and AFAICS, Uact is not adjusted by scaled_delta_exec?
>
I haven't had a chance to look at GRUB-PA till now and after reading
your mail, I had a quick read of GRUB-PA paper[1] :-). From what I
understood, the dea of GRUB-PA is not to reclaim to the maximum
allowable bandwidth, but to scale down the frequency to the required
utilization(running_bw). This way the tasks run longer(similar to
reclaiming) but using less power.

GRUB reclaim is calculated indirectly based on running_bw as well as
"Uinact = this_bw - running_bw". We factor in reserved and running bw
into the equation and then scale it down to max_bw. So if my
understanding is correct, GRUB-PA doesn't technically reclaim extra
processing capacity, but just allows the task to run longer at lower
frequency so as to save power. I might be wrong and not an expert
here. Luca, please correct me if I am wrong.

Thanks,
Vineeth

[1] C. Scordino, G. Lipari, A Resource Reservation Algorithm for Power-Aware
Scheduling of Periodic and Aperiodic Real-Time Tasks, IEEE Transactions
on Computers, December 2006.

Subject: [tip: sched/core] sched/deadline: Fix bandwidth reclaim equation in GRUB

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 6a9d623aad89539eca71eb264db6b9d538620ad5
Gitweb: https://git.kernel.org/tip/6a9d623aad89539eca71eb264db6b9d538620ad5
Author: Vineeth Pillai <[email protected]>
AuthorDate: Tue, 30 May 2023 09:55:25 -04:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 16 Jun 2023 22:08:11 +02:00

sched/deadline: Fix bandwidth reclaim equation in GRUB

According to the GRUB[1] rule, the runtime is depreciated as:
"dq = -max{u, (1 - Uinact - Uextra)} dt" (1)

To guarantee that deadline tasks doesn't starve lower class tasks,
we do not allocate the full bandwidth of the cpu to deadline tasks.
Maximum bandwidth usable by deadline tasks is denoted by "Umax".
Considering Umax, equation (1) becomes:
"dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2)

Current implementation has a minor bug in equation (2), which this
patch fixes.

The reclamation logic is verified by a sample program which creates
multiple deadline threads and observing their utilization. The tests
were run on an isolated cpu(isolcpus=3) on a 4 cpu system.

Tests on 6.3.0
==============

RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35

RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69

RUN 3: 2 tasks
Task 1: runtime=1ms, deadline=period=10ms
Task 2: runtime=1ms, deadline=period=100ms
TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67
TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37
TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38
TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23

As seen above, the reclamation doesn't reclaim the maximum allowed
bandwidth and as the bandwidth of tasks gets smaller, the reclaimed
bandwidth also comes down.

Tests with this patch applied
=============================

RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19
TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16

RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27
TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21

RUN 3: 2 tasks
Task 1: runtime=1ms, deadline=period=10ms
Task 2: runtime=1ms, deadline=period=100ms
TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64
TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66
TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45
TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73

Running tasks on all cpus allowing for migration also showed that
the utilization is reclaimed to the maximum. Running 10 tasks on
3 cpus SCHED_FLAG_RECLAIM - top shows:
%Cpu0 : 94.6 us, 0.0 sy, 0.0 ni, 5.4 id, 0.0 wa
%Cpu1 : 95.2 us, 0.0 sy, 0.0 ni, 4.8 id, 0.0 wa
%Cpu2 : 95.8 us, 0.0 sy, 0.0 ni, 4.2 id, 0.0 wa

[1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng.
(2015). Parallel and sequential reclaiming in multicore
real-time global scheduling.

Signed-off-by: Vineeth Pillai (Google) <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Daniel Bristot de Oliveira <[email protected]>
Acked-by: Juri Lelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/sched/deadline.c | 50 ++++++++++++++++++----------------------
kernel/sched/sched.h | 6 +++++-
2 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index f827067..e41a36b 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1253,43 +1253,39 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
}

/*
- * This function implements the GRUB accounting rule:
- * according to the GRUB reclaiming algorithm, the runtime is
- * not decreased as "dq = -dt", but as
- * "dq = -max{u / Umax, (1 - Uinact - Uextra)} dt",
+ * This function implements the GRUB accounting rule. According to the
+ * GRUB reclaiming algorithm, the runtime is not decreased as "dq = -dt",
+ * but as "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt",
* where u is the utilization of the task, Umax is the maximum reclaimable
* utilization, Uinact is the (per-runqueue) inactive utilization, computed
* as the difference between the "total runqueue utilization" and the
- * runqueue active utilization, and Uextra is the (per runqueue) extra
+ * "runqueue active utilization", and Uextra is the (per runqueue) extra
* reclaimable utilization.
- * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
- * multiplied by 2^BW_SHIFT, the result has to be shifted right by
- * BW_SHIFT.
- * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT,
- * dl_bw is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
- * Since delta is a 64 bit variable, to have an overflow its value
- * should be larger than 2^(64 - 20 - 8), which is more than 64 seconds.
- * So, overflow is not an issue here.
+ * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations multiplied
+ * by 2^BW_SHIFT, the result has to be shifted right by BW_SHIFT.
+ * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT, dl_bw
+ * is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
+ * Since delta is a 64 bit variable, to have an overflow its value should be
+ * larger than 2^(64 - 20 - 8), which is more than 64 seconds. So, overflow is
+ * not an issue here.
*/
static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
{
- u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
u64 u_act;
- u64 u_act_min = (dl_se->dl_bw * rq->dl.bw_ratio) >> RATIO_SHIFT;
+ u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */

/*
- * Instead of computing max{u * bw_ratio, (1 - u_inact - u_extra)},
- * we compare u_inact + rq->dl.extra_bw with
- * 1 - (u * rq->dl.bw_ratio >> RATIO_SHIFT), because
- * u_inact + rq->dl.extra_bw can be larger than
- * 1 * (so, 1 - u_inact - rq->dl.extra_bw would be negative
- * leading to wrong results)
+ * Instead of computing max{u, (u_max - u_inact - u_extra)}, we
+ * compare u_inact + u_extra with u_max - u, because u_inact + u_extra
+ * can be larger than u_max. So, u_max - u_inact - u_extra would be
+ * negative leading to wrong results.
*/
- if (u_inact + rq->dl.extra_bw > BW_UNIT - u_act_min)
- u_act = u_act_min;
+ if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
+ u_act = dl_se->dl_bw;
else
- u_act = BW_UNIT - u_inact - rq->dl.extra_bw;
+ u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;

+ u_act = (u_act * rq->dl.bw_ratio) >> RATIO_SHIFT;
return (delta * u_act) >> BW_SHIFT;
}

@@ -2788,12 +2784,12 @@ static void init_dl_rq_bw_ratio(struct dl_rq *dl_rq)
{
if (global_rt_runtime() == RUNTIME_INF) {
dl_rq->bw_ratio = 1 << RATIO_SHIFT;
- dl_rq->extra_bw = 1 << BW_SHIFT;
+ dl_rq->max_bw = dl_rq->extra_bw = 1 << BW_SHIFT;
} else {
dl_rq->bw_ratio = to_ratio(global_rt_runtime(),
global_rt_period()) >> (BW_SHIFT - RATIO_SHIFT);
- dl_rq->extra_bw = to_ratio(global_rt_period(),
- global_rt_runtime());
+ dl_rq->max_bw = dl_rq->extra_bw =
+ to_ratio(global_rt_period(), global_rt_runtime());
}
}

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 556496c..36e23e4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -748,6 +748,12 @@ struct dl_rq {
u64 extra_bw;

/*
+ * Maximum available bandwidth for reclaiming by SCHED_FLAG_RECLAIM
+ * tasks of this rq. Used in calculation of reclaimable bandwidth(GRUB).
+ */
+ u64 max_bw;
+
+ /*
* Inverse of the fraction of CPU utilization that can be reclaimed
* by the GRUB algorithm.
*/

Subject: [tip: sched/core] sched/deadline: Update GRUB description in the documentation

The following commit has been merged into the sched/core branch of tip:

Commit-ID: e20f204c88d595c04fc9197794bb68c0fbabd902
Gitweb: https://git.kernel.org/tip/e20f204c88d595c04fc9197794bb68c0fbabd902
Author: Vineeth Pillai <[email protected]>
AuthorDate: Tue, 30 May 2023 09:55:26 -04:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 16 Jun 2023 22:08:12 +02:00

sched/deadline: Update GRUB description in the documentation

Update the details of GRUB to reflect the updated logic.

Signed-off-by: Vineeth Pillai (Google) <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Daniel Bristot de Oliveira <[email protected]>
Acked-by: Juri Lelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
Documentation/scheduler/sched-deadline.rst | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/scheduler/sched-deadline.rst b/Documentation/scheduler/sched-deadline.rst
index 9d9be52..9fe4846 100644
--- a/Documentation/scheduler/sched-deadline.rst
+++ b/Documentation/scheduler/sched-deadline.rst
@@ -203,12 +203,15 @@ Deadline Task Scheduling
- Total bandwidth (this_bw): this is the sum of all tasks "belonging" to the
runqueue, including the tasks in Inactive state.

+ - Maximum usable bandwidth (max_bw): This is the maximum bandwidth usable by
+ deadline tasks and is currently set to the RT capacity.
+

The algorithm reclaims the bandwidth of the tasks in Inactive state.
It does so by decrementing the runtime of the executing task Ti at a pace equal
to

- dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt
+ dq = -(max{ Ui, (Umax - Uinact - Uextra) } / Umax) dt

where: