LinuxLists.cc - [PATCH v4 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB

2023-05-26 15:26:10

Subject: [PATCH v4 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB

According to the GRUB[1] rule, the runtime is depreciated as:
"dq = -max{u, (1 - Uinact - Uextra)} dt" (1)

To guarantee that deadline tasks don't starve lower class tasks,
we do not allocate the full bandwidth of the cpu to deadline tasks.
Maximum bandwidth usable by deadline tasks is denoted by "Umax".
Considering Umax, equation (1) becomes:
"dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2)

The current implementation gets equation (2) wrong: it computes
"max{u / Umax, (1 - Uinact - Uextra)}", scaling only the first term by
Umax. This patch fixes that.

The reclamation logic was verified with a sample program that creates
multiple deadline threads and observes their utilization. The tests
were run on an isolated cpu (isolcpus=3) on a 4 cpu system.

Tests on 6.3.0
==============

RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35

RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69

RUN 3: 2 tasks
Task 1: runtime=1ms, deadline=period=10ms
Task 2: runtime=1ms, deadline=period=100ms
TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67
TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37
TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38
TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23

As seen above, the reclamation doesn't reclaim the maximum allowed
bandwidth and as the bandwidth of tasks gets smaller, the reclaimed
bandwidth also comes down.

Tests with this patch applied
=============================

RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19
TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16

RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27
TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21

RUN 3: 2 tasks
Task 1: runtime=1ms, deadline=period=10ms
Task 2: runtime=1ms, deadline=period=100ms
TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64
TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66
TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45
TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73

Running tasks on all cpus with migration allowed also showed that
the utilization is reclaimed to the maximum. With 10 SCHED_FLAG_RECLAIM
tasks running on 3 cpus, top shows:
%Cpu0 : 94.6 us, 0.0 sy, 0.0 ni, 5.4 id, 0.0 wa
%Cpu1 : 95.2 us, 0.0 sy, 0.0 ni, 4.8 id, 0.0 wa
%Cpu2 : 95.8 us, 0.0 sy, 0.0 ni, 4.2 id, 0.0 wa

[1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng.
(2015). Parallel and sequential reclaiming in multicore
real-time global scheduling.

Signed-off-by: Vineeth Pillai (Google) <[email protected]>
---
kernel/sched/deadline.c | 56 +++++++++++++++++++----------------------
kernel/sched/sched.h | 6 +++++
2 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 71b24371a6f7..ea5db080f008 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1260,43 +1260,39 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
}

/*
- * This function implements the GRUB accounting rule:
- * according to the GRUB reclaiming algorithm, the runtime is
- * not decreased as "dq = -dt", but as
- * "dq = -max{u / Umax, (1 - Uinact - Uextra)} dt",
+ * This function implements the GRUB accounting rule. According to the
+ * GRUB reclaiming algorithm, the runtime is not decreased as "dq = -dt",
+ * but as "dq = -(max{u, (1 - Uinact - Uextra)} / Umax) dt",
* where u is the utilization of the task, Umax is the maximum reclaimable
- * utilization, Uinact is the (per-runqueue) inactive utilization, computed
- * as the difference between the "total runqueue utilization" and the
- * runqueue active utilization, and Uextra is the (per runqueue) extra
- * reclaimable utilization.
- * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
- * multiplied by 2^BW_SHIFT, the result has to be shifted right by
- * BW_SHIFT.
- * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT,
- * dl_bw is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
- * Since delta is a 64 bit variable, to have an overflow its value
- * should be larger than 2^(64 - 20 - 8), which is more than 64 seconds.
- * So, overflow is not an issue here.
+ * bandwidth, Uinact is the (per-runqueue) inactive utilization, computed as
+ * the difference between the "total runqueue utilization" and the runqueue
+ * active utilization, and Uextra is the (per runqueue) extra reclaimable
+ * utilization.
+ * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations multiplied
+ * by 2^BW_SHIFT, the result has to be shifted right by BW_SHIFT.
+ * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT, dl_bw
+ * is multiplied by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
+ * Since delta is a 64 bit variable, to have an overflow its value should be
+ * larger than 2^(64 - 20 - 8), which is more than 64 seconds. So, overflow is
+ * not an issue here.
*/
static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
{
- u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
u64 u_act;
- u64 u_act_min = (dl_se->dl_bw * rq->dl.bw_ratio) >> RATIO_SHIFT;
+ u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */

/*
- * Instead of computing max{u * bw_ratio, (1 - u_inact - u_extra)},
- * we compare u_inact + rq->dl.extra_bw with
- * 1 - (u * rq->dl.bw_ratio >> RATIO_SHIFT), because
- * u_inact + rq->dl.extra_bw can be larger than
- * 1 * (so, 1 - u_inact - rq->dl.extra_bw would be negative
- * leading to wrong results)
+ * Instead of computing max{u, (u_max - u_inact - u_extra)}, we
+ * compare u_inact + u_extra with u_max - u, because u_inact + u_extra
+ * can be larger than u_max. So, u_max - u_inact - u_extra would be
+ * negative leading to wrong results.
*/
- if (u_inact + rq->dl.extra_bw > BW_UNIT - u_act_min)
- u_act = u_act_min;
+ if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
+ u_act = dl_se->dl_bw;
else
- u_act = BW_UNIT - u_inact - rq->dl.extra_bw;
+ u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;

+ u_act = (u_act * rq->dl.bw_ratio) >> RATIO_SHIFT;
return (delta * u_act) >> BW_SHIFT;
}

@@ -2784,12 +2780,12 @@ static void init_dl_rq_bw_ratio(struct dl_rq *dl_rq)
{
if (global_rt_runtime() == RUNTIME_INF) {
dl_rq->bw_ratio = 1 << RATIO_SHIFT;
- dl_rq->extra_bw = 1 << BW_SHIFT;
+ dl_rq->max_bw = dl_rq->extra_bw = 1 << BW_SHIFT;
} else {
dl_rq->bw_ratio = to_ratio(global_rt_runtime(),
global_rt_period()) >> (BW_SHIFT - RATIO_SHIFT);
- dl_rq->extra_bw = to_ratio(global_rt_period(),
- global_rt_runtime());
+ dl_rq->max_bw = dl_rq->extra_bw =
+ to_ratio(global_rt_period(), global_rt_runtime());
}
}

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3e8df6d31c1e..73027c2806dc 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -753,6 +753,12 @@ struct dl_rq {
u64 this_bw;
u64 extra_bw;

+ /*
+ * Maximum available bandwidth for reclaiming by SCHED_FLAG_RECLAIM
+ * tasks of this rq. Used in calculation of reclaimable bandwidth(GRUB).
+ */
+ u64 max_bw;
+
/*
* Inverse of the fraction of CPU utilization that can be reclaimed
* by the GRUB algorithm.
--
2.40.1

2023-05-26 15:56:47

by luca abeni

Subject: Re: [PATCH v4 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB

Hi,

I think the code changes look good. I only see a small issue in the
comments.

On Fri, 26 May 2023 10:55:18 -0400
Vineeth Pillai <[email protected]> wrote:

> According to the GRUB[1] rule, the runtime is depreciated as:
> "dq = -max{u, (1 - Uinact - Uextra)} dt" (1)
>
> To guarantee that deadline tasks don't starve lower class tasks,
> we do not allocate the full bandwidth of the cpu to deadline tasks.
> Maximum bandwidth usable by deadline tasks is denoted by "Umax".
> Considering Umax, equation (1) becomes:
> "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2)

This is correct...

[...]
> /*
> - * This function implements the GRUB accounting rule:
> - * according to the GRUB reclaiming algorithm, the runtime is
> - * not decreased as "dq = -dt", but as
> - * "dq = -max{u / Umax, (1 - Uinact - Uextra)} dt",
> + * This function implements the GRUB accounting rule. According to the
> + * GRUB reclaiming algorithm, the runtime is not decreased as "dq = -dt",
> + * but as "dq = -(max{u, (1 - Uinact - Uextra)} / Umax) dt",

...But I think this is wrong (should be "Umax - ...", not "1 - ...").
I think patch 2/2 has the same issue.

[...]
> + if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
> + u_act = dl_se->dl_bw;
> else
> - u_act = BW_UNIT - u_inact - rq->dl.extra_bw;
> + u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;

This again is IMHO OK

Thanks,
Luca

2023-05-26 16:13:44

by Vineeth Remanan Pillai

Subject: Re: [PATCH v4 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB

On Fri, May 26, 2023 at 11:30 AM luca abeni <[email protected]> wrote:
> [...]
> > + * but as "dq = -(max{u, (1 - Uinact - Uextra)} / Umax) dt",
>
> ...But I think this is wrong (should be "Umax - ...", not "1 - ...").
> I think patch 2/2 has the same issue.
>
Oops sorry, I missed this. Will send the fixed patch as a
reply to the original v4.

Thanks,
Vineeth

2023-05-26 16:19:16

by Vineeth Remanan Pillai

Subject: Re: [PATCH v4 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB

Hi Luca,

>
> On Fri, May 26, 2023 at 11:30 AM luca abeni <[email protected]> wrote:
> > [...]
> > > + * but as "dq = -(max{u, (1 - Uinact - Uextra)} / Umax) dt",
> >
> > ...But I think this is wrong (should be "Umax - ...", not "1 - ...").
> > I think patch 2/2 has the same issue.
> >
> Oops sorry, I missed this. Will send the fixed patch as a
> reply to the original v4.
>
On looking again, I think the description is correct. That line
mentions the actual m-GRUB equation from the paper. And then the
comment explains why we use Umax(because of limiting the bandwidth
to RT capacity).

Does it seem correct to you?

Thanks,
Vineeth

2023-05-26 16:19:29

by Vineeth Remanan Pillai

Subject: [PATCH v4 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB

According to the GRUB[1] rule, the runtime is depreciated as:
"dq = -max{u, (1 - Uinact - Uextra)} dt" (1)

To guarantee that deadline tasks don't starve lower class tasks,
we do not allocate the full bandwidth of the cpu to deadline tasks.
Maximum bandwidth usable by deadline tasks is denoted by "Umax".
Considering Umax, equation (1) becomes:
"dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2)

The current implementation gets equation (2) wrong: it computes
"max{u / Umax, (1 - Uinact - Uextra)}", scaling only the first term by
Umax. This patch fixes that.

The reclamation logic was verified with a sample program that creates
multiple deadline threads and observes their utilization. The tests
were run on an isolated cpu (isolcpus=3) on a 4 cpu system.

Tests on 6.3.0
==============

RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35

RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69

RUN 3: 2 tasks
Task 1: runtime=1ms, deadline=period=10ms
Task 2: runtime=1ms, deadline=period=100ms
TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67
TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37
TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38
TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23

As seen above, the reclamation doesn't reclaim the maximum allowed
bandwidth and as the bandwidth of tasks gets smaller, the reclaimed
bandwidth also comes down.

Tests with this patch applied
=============================

RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19
TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16

RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27
TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21

RUN 3: 2 tasks
Task 1: runtime=1ms, deadline=period=10ms
Task 2: runtime=1ms, deadline=period=100ms
TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64
TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66
TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45
TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73

Running tasks on all cpus with migration allowed also showed that
the utilization is reclaimed to the maximum. With 10 SCHED_FLAG_RECLAIM
tasks running on 3 cpus, top shows:
%Cpu0 : 94.6 us, 0.0 sy, 0.0 ni, 5.4 id, 0.0 wa
%Cpu1 : 95.2 us, 0.0 sy, 0.0 ni, 4.8 id, 0.0 wa
%Cpu2 : 95.8 us, 0.0 sy, 0.0 ni, 4.2 id, 0.0 wa

[1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng.
(2015). Parallel and sequential reclaiming in multicore
real-time global scheduling.

Signed-off-by: Vineeth Pillai (Google) <[email protected]>
---
kernel/sched/deadline.c | 56 +++++++++++++++++++----------------------
kernel/sched/sched.h | 6 +++++
2 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 71b24371a6f7..ea5db080f008 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1260,43 +1260,39 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
}

/*
- * This function implements the GRUB accounting rule:
- * according to the GRUB reclaiming algorithm, the runtime is
- * not decreased as "dq = -dt", but as
- * "dq = -max{u / Umax, (1 - Uinact - Uextra)} dt",
+ * This function implements the GRUB accounting rule. According to the
+ * GRUB reclaiming algorithm, the runtime is not decreased as "dq = -dt",
+ * but as "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt",
* where u is the utilization of the task, Umax is the maximum reclaimable
- * utilization, Uinact is the (per-runqueue) inactive utilization, computed
- * as the difference between the "total runqueue utilization" and the
- * runqueue active utilization, and Uextra is the (per runqueue) extra
- * reclaimable utilization.
- * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
- * multiplied by 2^BW_SHIFT, the result has to be shifted right by
- * BW_SHIFT.
- * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT,
- * dl_bw is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
- * Since delta is a 64 bit variable, to have an overflow its value
- * should be larger than 2^(64 - 20 - 8), which is more than 64 seconds.
- * So, overflow is not an issue here.
+ * bandwidth, Uinact is the (per-runqueue) inactive utilization, computed as
+ * the difference between the "total runqueue utilization" and the runqueue
+ * active utilization, and Uextra is the (per runqueue) extra reclaimable
+ * utilization.
+ * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations multiplied
+ * by 2^BW_SHIFT, the result has to be shifted right by BW_SHIFT.
+ * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT, dl_bw
+ * is multiplied by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
+ * Since delta is a 64 bit variable, to have an overflow its value should be
+ * larger than 2^(64 - 20 - 8), which is more than 64 seconds. So, overflow is
+ * not an issue here.
*/
static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
{
- u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
u64 u_act;
- u64 u_act_min = (dl_se->dl_bw * rq->dl.bw_ratio) >> RATIO_SHIFT;
+ u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */

/*
- * Instead of computing max{u * bw_ratio, (1 - u_inact - u_extra)},
- * we compare u_inact + rq->dl.extra_bw with
- * 1 - (u * rq->dl.bw_ratio >> RATIO_SHIFT), because
- * u_inact + rq->dl.extra_bw can be larger than
- * 1 * (so, 1 - u_inact - rq->dl.extra_bw would be negative
- * leading to wrong results)
+ * Instead of computing max{u, (u_max - u_inact - u_extra)}, we
+ * compare u_inact + u_extra with u_max - u, because u_inact + u_extra
+ * can be larger than u_max. So, u_max - u_inact - u_extra would be
+ * negative leading to wrong results.
*/
- if (u_inact + rq->dl.extra_bw > BW_UNIT - u_act_min)
- u_act = u_act_min;
+ if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
+ u_act = dl_se->dl_bw;
else
- u_act = BW_UNIT - u_inact - rq->dl.extra_bw;
+ u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;

+ u_act = (u_act * rq->dl.bw_ratio) >> RATIO_SHIFT;
return (delta * u_act) >> BW_SHIFT;
}

@@ -2784,12 +2780,12 @@ static void init_dl_rq_bw_ratio(struct dl_rq *dl_rq)
{
if (global_rt_runtime() == RUNTIME_INF) {
dl_rq->bw_ratio = 1 << RATIO_SHIFT;
- dl_rq->extra_bw = 1 << BW_SHIFT;
+ dl_rq->max_bw = dl_rq->extra_bw = 1 << BW_SHIFT;
} else {
dl_rq->bw_ratio = to_ratio(global_rt_runtime(),
global_rt_period()) >> (BW_SHIFT - RATIO_SHIFT);
- dl_rq->extra_bw = to_ratio(global_rt_period(),
- global_rt_runtime());
+ dl_rq->max_bw = dl_rq->extra_bw =
+ to_ratio(global_rt_period(), global_rt_runtime());
}
}

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3e8df6d31c1e..73027c2806dc 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -753,6 +753,12 @@ struct dl_rq {
u64 this_bw;
u64 extra_bw;

+ /*
+ * Maximum available bandwidth for reclaiming by SCHED_FLAG_RECLAIM
+ * tasks of this rq. Used in calculation of reclaimable bandwidth(GRUB).
+ */
+ u64 max_bw;
+
/*
* Inverse of the fraction of CPU utilization that can be reclaimed
* by the GRUB algorithm.
--
2.40.1

2023-05-26 16:23:33

by Vineeth Remanan Pillai

Subject: Re: [PATCH v4 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB

On Fri, May 26, 2023 at 11:58 AM Vineeth Remanan Pillai
<[email protected]> wrote:
>
> > > ...But I think this is wrong (should be "Umax - ...", not "1 - ...").
> > > I think patch 2/2 has the same issue.
> > >
> > Oops sorry, I missed this. Will send the fixed patch as a
> > reply to the original v4.
> >
> On looking again, I think the description is correct. That line
> mentions the actual m-GRUB equation from the paper. And then the
> comment explains why we use Umax(because of limiting the bandwidth
> to RT capacity).
>
Ahh my bad again :-(, I was looking at the commit message. I see the
issue in the code comment now.

I have just sent the fix as a reply to the initial patch. I shall send a
v5 if needed. Please let me know.

Thanks,
Vineeth

2023-05-30 13:28:24

by Daniel Bristot de Oliveira

Subject: Re: [PATCH v4 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB

On 5/26/23 18:15, Vineeth Remanan Pillai wrote:
> I have just sent the fix as a reply to the initial patch. I shall send a
> v5 if needed. Please let me know.

Please, send a v5... so we avoid mixing up patches on testing.

-- Daniel

2023-05-30 13:52:29

by Vineeth Remanan Pillai

Subject: Re: [PATCH v4 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB

> On 5/26/23 18:15, Vineeth Remanan Pillai wrote:
> > I have just sent the fix as a reply to the initial patch. I shall send a
> > v5 if needed. Please let me know.
>
> Please, send a v5... so we avoid mixing up patches on testing.
>
Sure, will send it soon.

Thanks,
Vineeth

2023-06-02 19:27:23

by Joel Fernandes

Subject: Re: [PATCH v4 1/2] sched/deadline: Fix bandwidth reclaim equation in GRUB

On Fri, May 26, 2023 at 10:55 AM Vineeth Pillai <[email protected]> wrote:
>
> According to the GRUB[1] rule, the runtime is depreciated as:
> "dq = -max{u, (1 - Uinact - Uextra)} dt" (1)
>
> To guarantee that deadline tasks don't starve lower class tasks,
> we do not allocate the full bandwidth of the cpu to deadline tasks.
> Maximum bandwidth usable by deadline tasks is denoted by "Umax".
> Considering Umax, equation (1) becomes:
> "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2)

Makes sense, your patch fixes the issue where we need the depreciation
ratio to factor in the throttling limits as well.

The code looks sane to me too.

Acked-by: Joel Fernandes (Google) <[email protected]>

thanks,

- Joel