2021-08-20 01:06:13

by Josh Don

Subject: [PATCH v3 0/4] SCHED_IDLE extensions

This patch series contains improvements/extensions for SCHED_IDLE.

The first patch of the series is the previously mailed patch to add
cgroup support for SCHED_IDLE.

The second patch adds some additional idle accounting.

The third and fourth patches change some idle interactions.

Josh Don (4):
sched: cgroup SCHED_IDLE support
sched: account number of SCHED_IDLE entities on each cfs_rq
sched: reduce sched slice for SCHED_IDLE entities
sched: adjust sleeper credit for SCHED_IDLE entities

kernel/sched/core.c | 25 +++++
kernel/sched/debug.c | 7 ++
kernel/sched/fair.c | 256 +++++++++++++++++++++++++++++++++++++------
kernel/sched/sched.h | 10 ++
4 files changed, 267 insertions(+), 31 deletions(-)

--
2.33.0.rc2.250.ged5fa647cd-goog


2021-08-20 01:07:44

by Josh Don

Subject: [PATCH v3 4/4] sched: adjust sleeper credit for SCHED_IDLE entities

Give reduced sleeper credit to SCHED_IDLE entities. As a result, woken
SCHED_IDLE entities will take longer to preempt normal entities.

The benefit of this change is to make it less likely that a newly woken
SCHED_IDLE entity will preempt a short-running normal entity before it
blocks.

We still give a small sleeper credit to SCHED_IDLE entities, so that
idle<->idle competition retains some fairness.

Example: With HZ=1000, spawned four threads affined to one cpu, one of
which was set to SCHED_IDLE. Without this patch, wakeup latency for the
SCHED_IDLE thread was ~1-2ms, with the patch the wakeup latency was
~5ms.

Signed-off-by: Josh Don <[email protected]>
---
kernel/sched/fair.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 31f40aa005b9..aa9c046d2aab 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4230,7 +4230,12 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)

/* sleeps up to a single latency don't count. */
if (!initial) {
- unsigned long thresh = sysctl_sched_latency;
+ unsigned long thresh;
+
+ if (se_is_idle(se))
+ thresh = sysctl_sched_min_granularity;
+ else
+ thresh = sysctl_sched_latency;

/*
* Halve their sleep time's effect, to allow
--
2.33.0.rc2.250.ged5fa647cd-goog
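
For illustration only, and not part of the patch: the place_entity() hunk above changes how much sleeper credit a waking entity receives. The following is a minimal userspace sketch of that wakeup branch. It assumes the unscaled defaults of 6 ms for sysctl_sched_latency and 0.75 ms for sysctl_sched_min_granularity, and it replaces the kernel's wraparound-safe vruntime comparison with a plain maximum. The name place_woken() and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed unscaled defaults, in nanoseconds. */
#define SCHED_LATENCY_NS         6000000ULL /* sysctl_sched_latency */
#define SCHED_MIN_GRANULARITY_NS  750000ULL /* sysctl_sched_min_granularity */

/*
 * Hypothetical model of the !initial branch of place_entity().
 * Returns the vruntime at which a woken entity would be placed.
 */
static uint64_t place_woken(uint64_t min_vruntime, uint64_t se_vruntime,
                            bool se_idle, bool gentle_fair_sleepers)
{
	/* SCHED_IDLE entities get the smaller credit, as in the patch. */
	uint64_t thresh = se_idle ? SCHED_MIN_GRANULARITY_NS : SCHED_LATENCY_NS;
	uint64_t placed;

	/* "Halve their sleep time's effect" when the feature is enabled. */
	if (gentle_fair_sleepers)
		thresh >>= 1;

	/* Model-only guard against unsigned underflow. */
	placed = min_vruntime > thresh ? min_vruntime - thresh : 0;

	/* An entity never gains time by being placed backwards. */
	return se_vruntime > placed ? se_vruntime : placed;
}
```

Under these assumed values, with min_vruntime at 100 ms and the halving enabled, a normal entity would be placed 3 ms behind min_vruntime, while a SCHED_IDLE entity would be placed only 0.375 ms behind it.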

2021-08-20 01:09:23

by Josh Don

Subject: [PATCH v3 3/4] sched: reduce sched slice for SCHED_IDLE entities

Use a small, non-scaled min granularity for SCHED_IDLE entities, when
competing with normal entities. This reduces the latency of getting
a normal entity back on cpu, at the expense of increased context
switch frequency of SCHED_IDLE entities.

The benefit of this change is to reduce the round-robin latency for
normal entities when competing with a SCHED_IDLE entity.

Example: on a machine with HZ=1000, spawned two threads, one of which is
SCHED_IDLE, and affined to one cpu. Without this patch, the SCHED_IDLE
thread runs for 4ms then waits for 1.4s. With this patch, it runs for
1ms and waits 340ms (as it round-robins with the other thread).

Signed-off-by: Josh Don <[email protected]>
---
kernel/sched/debug.c | 2 ++
kernel/sched/fair.c | 29 ++++++++++++++++++++++++-----
kernel/sched/sched.h | 1 +
3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 33538579db9a..317ef560aa63 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -305,6 +305,7 @@ static __init int sched_init_debug(void)

debugfs_create_u32("latency_ns", 0644, debugfs_sched, &sysctl_sched_latency);
debugfs_create_u32("min_granularity_ns", 0644, debugfs_sched, &sysctl_sched_min_granularity);
+ debugfs_create_u32("idle_min_granularity_ns", 0644, debugfs_sched, &sysctl_sched_idle_min_granularity);
debugfs_create_u32("wakeup_granularity_ns", 0644, debugfs_sched, &sysctl_sched_wakeup_granularity);

debugfs_create_u32("latency_warn_ms", 0644, debugfs_sched, &sysctl_resched_latency_warn_ms);
@@ -806,6 +807,7 @@ static void sched_debug_header(struct seq_file *m)
SEQ_printf(m, " .%-40s: %Ld.%06ld\n", #x, SPLIT_NS(x))
PN(sysctl_sched_latency);
PN(sysctl_sched_min_granularity);
+ PN(sysctl_sched_idle_min_granularity);
PN(sysctl_sched_wakeup_granularity);
P(sysctl_sched_child_runs_first);
P(sysctl_sched_features);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 19a9244c140f..31f40aa005b9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -59,6 +59,14 @@ unsigned int sysctl_sched_tunable_scaling = SCHED_TUNABLESCALING_LOG;
unsigned int sysctl_sched_min_granularity = 750000ULL;
static unsigned int normalized_sysctl_sched_min_granularity = 750000ULL;

+/*
+ * Minimal preemption granularity for CPU-bound SCHED_IDLE tasks.
+ * Applies only when SCHED_IDLE tasks compete with normal tasks.
+ *
+ * (default: 0.75 msec)
+ */
+unsigned int sysctl_sched_idle_min_granularity = 750000ULL;
+
/*
* This value is kept at sysctl_sched_latency/sysctl_sched_min_granularity
*/
@@ -665,6 +673,8 @@ static u64 __sched_period(unsigned long nr_running)
return sysctl_sched_latency;
}

+static bool sched_idle_cfs_rq(struct cfs_rq *cfs_rq);
+
/*
* We calculate the wall-time slice from the period by taking a part
* proportional to the weight.
@@ -674,6 +684,8 @@ static u64 __sched_period(unsigned long nr_running)
static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
unsigned int nr_running = cfs_rq->nr_running;
+ struct sched_entity *init_se = se;
+ unsigned int min_gran;
u64 slice;

if (sched_feat(ALT_PERIOD))
@@ -684,12 +696,13 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
for_each_sched_entity(se) {
struct load_weight *load;
struct load_weight lw;
+ struct cfs_rq *qcfs_rq;

- cfs_rq = cfs_rq_of(se);
- load = &cfs_rq->load;
+ qcfs_rq = cfs_rq_of(se);
+ load = &qcfs_rq->load;

if (unlikely(!se->on_rq)) {
- lw = cfs_rq->load;
+ lw = qcfs_rq->load;

update_load_add(&lw, se->load.weight);
load = &lw;
@@ -697,8 +710,14 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
slice = __calc_delta(slice, se->load.weight, load);
}

- if (sched_feat(BASE_SLICE))
- slice = max(slice, (u64)sysctl_sched_min_granularity);
+ if (sched_feat(BASE_SLICE)) {
+ if (se_is_idle(init_se) && !sched_idle_cfs_rq(cfs_rq))
+ min_gran = sysctl_sched_idle_min_granularity;
+ else
+ min_gran = sysctl_sched_min_granularity;
+
+ slice = max_t(u64, slice, min_gran);
+ }

return slice;
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 6af039e433fb..29846da35861 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2399,6 +2399,7 @@ extern const_debug unsigned int sysctl_sched_migration_cost;
#ifdef CONFIG_SCHED_DEBUG
extern unsigned int sysctl_sched_latency;
extern unsigned int sysctl_sched_min_granularity;
+extern unsigned int sysctl_sched_idle_min_granularity;
extern unsigned int sysctl_sched_wakeup_granularity;
extern int sysctl_resched_latency_warn_ms;
extern int sysctl_resched_latency_warn_once;
--
2.33.0.rc2.250.ged5fa647cd-goog
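
For illustration only, and not part of the patch: the final hunk of sched_slice() above picks a floor for the computed slice. A minimal userspace sketch of that selection follows, with the two tunables passed in as parameters so the effect of a scaled sysctl_sched_min_granularity can be tried out. The name clamp_slice() and its parameter names are hypothetical; rq_all_idle stands in for the result of sched_idle_cfs_rq().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the BASE_SLICE clamp in sched_slice().
 *
 * slice_ns:         weight-proportional slice computed earlier
 * se_idle:          the entity is SCHED_IDLE (se_is_idle())
 * rq_all_idle:      the runqueue holds only SCHED_IDLE entities
 * min_gran_ns:      sysctl_sched_min_granularity (possibly scaled)
 * idle_min_gran_ns: sysctl_sched_idle_min_granularity (not scaled)
 */
static uint64_t clamp_slice(uint64_t slice_ns, bool se_idle, bool rq_all_idle,
                            uint64_t min_gran_ns, uint64_t idle_min_gran_ns)
{
	uint64_t floor_ns;

	/* The smaller floor applies only to idle-vs-normal competition. */
	if (se_idle && !rq_all_idle)
		floor_ns = idle_min_gran_ns;
	else
		floor_ns = min_gran_ns;

	return slice_ns > floor_ns ? slice_ns : floor_ns;
}
```

For example, assuming a scaled minimum granularity of 3 ms, a SCHED_IDLE entity with a computed slice of 17 us would be raised to 0.75 ms when competing with normal entities, and to 3 ms on a runqueue that holds only SCHED_IDLE entities.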

2021-08-23 10:09:50

by Vincent Guittot

Subject: Re: [PATCH v3 3/4] sched: reduce sched slice for SCHED_IDLE entities

On Fri, 20 Aug 2021 at 03:04, Josh Don <[email protected]> wrote:
>
> Use a small, non-scaled min granularity for SCHED_IDLE entities, when
> competing with normal entities. This reduces the latency of getting
> a normal entity back on cpu, at the expense of increased context
> switch frequency of SCHED_IDLE entities.
>
> The benefit of this change is to reduce the round-robin latency for
> normal entities when competing with a SCHED_IDLE entity.
>
> Example: on a machine with HZ=1000, spawned two threads, one of which is
> SCHED_IDLE, and affined to one cpu. Without this patch, the SCHED_IDLE
> thread runs for 4ms then waits for 1.4s. With this patch, it runs for
> 1ms and waits 340ms (as it round-robins with the other thread).
>
> Signed-off-by: Josh Don <[email protected]>
> ---
> kernel/sched/debug.c | 2 ++
> kernel/sched/fair.c | 29 ++++++++++++++++++++++++-----
> kernel/sched/sched.h | 1 +
> 3 files changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index 33538579db9a..317ef560aa63 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -305,6 +305,7 @@ static __init int sched_init_debug(void)
>
> debugfs_create_u32("latency_ns", 0644, debugfs_sched, &sysctl_sched_latency);
> debugfs_create_u32("min_granularity_ns", 0644, debugfs_sched, &sysctl_sched_min_granularity);
> + debugfs_create_u32("idle_min_granularity_ns", 0644, debugfs_sched, &sysctl_sched_idle_min_granularity);
> debugfs_create_u32("wakeup_granularity_ns", 0644, debugfs_sched, &sysctl_sched_wakeup_granularity);
>
> debugfs_create_u32("latency_warn_ms", 0644, debugfs_sched, &sysctl_resched_latency_warn_ms);
> @@ -806,6 +807,7 @@ static void sched_debug_header(struct seq_file *m)
> SEQ_printf(m, " .%-40s: %Ld.%06ld\n", #x, SPLIT_NS(x))
> PN(sysctl_sched_latency);
> PN(sysctl_sched_min_granularity);
> + PN(sysctl_sched_idle_min_granularity);
> PN(sysctl_sched_wakeup_granularity);
> P(sysctl_sched_child_runs_first);
> P(sysctl_sched_features);
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 19a9244c140f..31f40aa005b9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -59,6 +59,14 @@ unsigned int sysctl_sched_tunable_scaling = SCHED_TUNABLESCALING_LOG;
> unsigned int sysctl_sched_min_granularity = 750000ULL;
> static unsigned int normalized_sysctl_sched_min_granularity = 750000ULL;
>
> +/*
> + * Minimal preemption granularity for CPU-bound SCHED_IDLE tasks.
> + * Applies only when SCHED_IDLE tasks compete with normal tasks.
> + *
> + * (default: 0.75 msec)
> + */
> +unsigned int sysctl_sched_idle_min_granularity = 750000ULL;
> +
> /*
> * This value is kept at sysctl_sched_latency/sysctl_sched_min_granularity
> */
> @@ -665,6 +673,8 @@ static u64 __sched_period(unsigned long nr_running)
> return sysctl_sched_latency;
> }
>
> +static bool sched_idle_cfs_rq(struct cfs_rq *cfs_rq);
> +
> /*
> * We calculate the wall-time slice from the period by taking a part
> * proportional to the weight.
> @@ -674,6 +684,8 @@ static u64 __sched_period(unsigned long nr_running)
> static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> unsigned int nr_running = cfs_rq->nr_running;
> + struct sched_entity *init_se = se;
> + unsigned int min_gran;
> u64 slice;
>
> if (sched_feat(ALT_PERIOD))
> @@ -684,12 +696,13 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> for_each_sched_entity(se) {
> struct load_weight *load;
> struct load_weight lw;
> + struct cfs_rq *qcfs_rq;
>
> - cfs_rq = cfs_rq_of(se);
> - load = &cfs_rq->load;
> + qcfs_rq = cfs_rq_of(se);
> + load = &qcfs_rq->load;
>
> if (unlikely(!se->on_rq)) {
> - lw = cfs_rq->load;
> + lw = qcfs_rq->load;
>
> update_load_add(&lw, se->load.weight);
> load = &lw;
> @@ -697,8 +710,14 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> slice = __calc_delta(slice, se->load.weight, load);
> }
>
> - if (sched_feat(BASE_SLICE))
> - slice = max(slice, (u64)sysctl_sched_min_granularity);
> + if (sched_feat(BASE_SLICE)) {
> + if (se_is_idle(init_se) && !sched_idle_cfs_rq(cfs_rq))

As with place_entity(), we should probably not switch dynamically
between the two values below depending on whether non-SCHED_IDLE tasks
are present, and should instead always use sysctl_sched_idle_min_granularity.


> + min_gran = sysctl_sched_idle_min_granularity;
> + else
> + min_gran = sysctl_sched_min_granularity;
> +
> + slice = max_t(u64, slice, min_gran);
> + }
>
> return slice;
> }
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 6af039e433fb..29846da35861 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2399,6 +2399,7 @@ extern const_debug unsigned int sysctl_sched_migration_cost;
> #ifdef CONFIG_SCHED_DEBUG
> extern unsigned int sysctl_sched_latency;
> extern unsigned int sysctl_sched_min_granularity;
> +extern unsigned int sysctl_sched_idle_min_granularity;
> extern unsigned int sysctl_sched_wakeup_granularity;
> extern int sysctl_resched_latency_warn_ms;
> extern int sysctl_resched_latency_warn_once;
> --
> 2.33.0.rc2.250.ged5fa647cd-goog
>

2021-08-23 10:13:52

by Vincent Guittot

Subject: Re: [PATCH v3 4/4] sched: adjust sleeper credit for SCHED_IDLE entities

On Fri, 20 Aug 2021 at 03:04, Josh Don <[email protected]> wrote:
>
> Give reduced sleeper credit to SCHED_IDLE entities. As a result, woken
> SCHED_IDLE entities will take longer to preempt normal entities.
>
> The benefit of this change is to make it less likely that a newly woken
> SCHED_IDLE entity will preempt a short-running normal entity before it
> blocks.
>
> We still give a small sleeper credit to SCHED_IDLE entities, so that
> idle<->idle competition retains some fairness.
>
> Example: With HZ=1000, spawned four threads affined to one cpu, one of
> which was set to SCHED_IDLE. Without this patch, wakeup latency for the
> SCHED_IDLE thread was ~1-2ms, with the patch the wakeup latency was
> ~5ms.
>
> Signed-off-by: Josh Don <[email protected]>

Reviewed-by: Vincent Guittot <[email protected]>

> ---
> kernel/sched/fair.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 31f40aa005b9..aa9c046d2aab 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4230,7 +4230,12 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>
> /* sleeps up to a single latency don't count. */
> if (!initial) {
> - unsigned long thresh = sysctl_sched_latency;
> + unsigned long thresh;
> +
> + if (se_is_idle(se))
> + thresh = sysctl_sched_min_granularity;
> + else
> + thresh = sysctl_sched_latency;
>
> /*
> * Halve their sleep time's effect, to allow
> --
> 2.33.0.rc2.250.ged5fa647cd-goog
>

2021-08-23 17:44:23

by Josh Don

Subject: Re: [PATCH v3 3/4] sched: reduce sched slice for SCHED_IDLE entities

On Mon, Aug 23, 2021 at 3:08 AM Vincent Guittot
<[email protected]> wrote:
>
> On Fri, 20 Aug 2021 at 03:04, Josh Don <[email protected]> wrote:
> >
> > @@ -684,12 +696,13 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > for_each_sched_entity(se) {
> > struct load_weight *load;
> > struct load_weight lw;
> > + struct cfs_rq *qcfs_rq;
> >
> > - cfs_rq = cfs_rq_of(se);
> > - load = &cfs_rq->load;
> > + qcfs_rq = cfs_rq_of(se);
> > + load = &qcfs_rq->load;
> >
> > if (unlikely(!se->on_rq)) {
> > - lw = cfs_rq->load;
> > + lw = qcfs_rq->load;
> >
> > update_load_add(&lw, se->load.weight);
> > load = &lw;
> > @@ -697,8 +710,14 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > slice = __calc_delta(slice, se->load.weight, load);
> > }
> >
> > - if (sched_feat(BASE_SLICE))
> > - slice = max(slice, (u64)sysctl_sched_min_granularity);
> > + if (sched_feat(BASE_SLICE)) {
> > + if (se_is_idle(init_se) && !sched_idle_cfs_rq(cfs_rq))
>
> Like for place_entity, we should probably not dynamically switch
> between the 2 values below depending on the presence or not of non
> sched idle tasks and always use sysctl_sched_idle_min_granularity

My reasoning here is that sched_slice is something we reasonably
expect to change as tasks enqueue/dequeue, and unlike place_entity()
it does not create fairness issues by messing with vruntime.
Additionally, it would be preferable to use the larger min granularity
on a cpu running only idle tasks.

2021-08-24 07:58:07

by Vincent Guittot

Subject: Re: [PATCH v3 3/4] sched: reduce sched slice for SCHED_IDLE entities

On Mon, 23 Aug 2021 at 19:40, Josh Don <[email protected]> wrote:
>
> On Mon, Aug 23, 2021 at 3:08 AM Vincent Guittot
> <[email protected]> wrote:
> >
> > On Fri, 20 Aug 2021 at 03:04, Josh Don <[email protected]> wrote:
> > >
> > > @@ -684,12 +696,13 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > > for_each_sched_entity(se) {
> > > struct load_weight *load;
> > > struct load_weight lw;
> > > + struct cfs_rq *qcfs_rq;
> > >
> > > - cfs_rq = cfs_rq_of(se);
> > > - load = &cfs_rq->load;
> > > + qcfs_rq = cfs_rq_of(se);
> > > + load = &qcfs_rq->load;
> > >
> > > if (unlikely(!se->on_rq)) {
> > > - lw = cfs_rq->load;
> > > + lw = qcfs_rq->load;
> > >
> > > update_load_add(&lw, se->load.weight);
> > > load = &lw;
> > > @@ -697,8 +710,14 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > > slice = __calc_delta(slice, se->load.weight, load);
> > > }
> > >
> > > - if (sched_feat(BASE_SLICE))
> > > - slice = max(slice, (u64)sysctl_sched_min_granularity);
> > > + if (sched_feat(BASE_SLICE)) {
> > > + if (se_is_idle(init_se) && !sched_idle_cfs_rq(cfs_rq))
> >
> > Like for place_entity, we should probably not dynamically switch
> > between the 2 values below depending on the presence or not of non
> > sched idle tasks and always use sysctl_sched_idle_min_granularity
>
> My reasoning here is that sched_slice is something we reasonably
> expect to change as tasks enqueue/dequeue, and unlike place_entity()
> it does not create fairness issues by messing with vruntime.
> Additionally, it would be preferable to use the larger min granularity
> on a cpu running only idle tasks.

Fair enough

Reviewed-by: Vincent Guittot <[email protected]>

2021-08-24 08:18:06

by Jiang Biao

Subject: Re: [PATCH v3 4/4] sched: adjust sleeper credit for SCHED_IDLE entities

Hi,

On Fri, 20 Aug 2021 at 09:06, Josh Don <[email protected]> wrote:
>
> Give reduced sleeper credit to SCHED_IDLE entities. As a result, woken
> SCHED_IDLE entities will take longer to preempt normal entities.
>
> The benefit of this change is to make it less likely that a newly woken
> SCHED_IDLE entity will preempt a short-running normal entity before it
> blocks.
>
> We still give a small sleeper credit to SCHED_IDLE entities, so that
> idle<->idle competition retains some fairness.
>
> Example: With HZ=1000, spawned four threads affined to one cpu, one of
> which was set to SCHED_IDLE. Without this patch, wakeup latency for the
> SCHED_IDLE thread was ~1-2ms, with the patch the wakeup latency was
> ~5ms.
>
> Signed-off-by: Josh Don <[email protected]>
Tried to push a similar patch before, but failed. :)
https://lkml.org/lkml/2020/8/20/1773
Please pick my Reviewed-by if you don't mind,
Reviewed-by: Jiang Biao <[email protected]>

> ---
> kernel/sched/fair.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 31f40aa005b9..aa9c046d2aab 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4230,7 +4230,12 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>
> /* sleeps up to a single latency don't count. */
> if (!initial) {
> - unsigned long thresh = sysctl_sched_latency;
> + unsigned long thresh;
> +
> + if (se_is_idle(se))
> + thresh = sysctl_sched_min_granularity;
> + else
> + thresh = sysctl_sched_latency;
>
> /*
> * Halve their sleep time's effect, to allow
> --
> 2.33.0.rc2.250.ged5fa647cd-goog
>

2021-08-24 10:29:47

by Jiang Biao

Subject: Re: [PATCH v3 3/4] sched: reduce sched slice for SCHED_IDLE entities

Hi,

On Fri, 20 Aug 2021 at 09:08, Josh Don <[email protected]> wrote:
>
> Use a small, non-scaled min granularity for SCHED_IDLE entities, when
> competing with normal entities. This reduces the latency of getting
> a normal entity back on cpu, at the expense of increased context
> switch frequency of SCHED_IDLE entities.
>
> The benefit of this change is to reduce the round-robin latency for
> normal entities when competing with a SCHED_IDLE entity.
Why not just ignore min granularity when normal entities compete with
a SCHED_IDLE entity? Something like this:

@@ -697,8 +710,14 @@ static u64 sched_slice(struct cfs_rq *cfs_rq,
struct sched_entity *se)
slice = __calc_delta(slice, se->load.weight, load);
}

- if (sched_feat(BASE_SLICE))
- slice = max(slice, (u64)sysctl_sched_min_granularity);
+ if (sched_feat(BASE_SLICE)
+ && (!se_is_idle(init_se) || sched_idle_cfs_rq(cfs_rq)))
+ slice = max(slice, (u64)sysctl_sched_min_granularity);

return slice;
}
If so, there seems to be no need to introduce sysctl_sched_idle_min_granularity? :)

>
> Example: on a machine with HZ=1000, spawned two threads, one of which is
> SCHED_IDLE, and affined to one cpu. Without this patch, the SCHED_IDLE
> thread runs for 4ms then waits for 1.4s. With this patch, it runs for
> 1ms and waits 340ms (as it round-robins with the other thread).
That way, the SCHED_IDLE task is more likely to be preempted by the
normal task, because ideal_runtime should be less than 750us (the
non-scaled sysctl_sched_idle_min_granularity) in this case. The scaled
sysctl_sched_min_granularity would still be guaranteed among SCHED_IDLE
tasks when only SCHED_IDLE tasks compete with each other.

>
> Signed-off-by: Josh Don <[email protected]>
> ---
> kernel/sched/debug.c | 2 ++
> kernel/sched/fair.c | 29 ++++++++++++++++++++++++-----
> kernel/sched/sched.h | 1 +
> 3 files changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index 33538579db9a..317ef560aa63 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -305,6 +305,7 @@ static __init int sched_init_debug(void)
>
> debugfs_create_u32("latency_ns", 0644, debugfs_sched, &sysctl_sched_latency);
> debugfs_create_u32("min_granularity_ns", 0644, debugfs_sched, &sysctl_sched_min_granularity);
> + debugfs_create_u32("idle_min_granularity_ns", 0644, debugfs_sched, &sysctl_sched_idle_min_granularity);
> debugfs_create_u32("wakeup_granularity_ns", 0644, debugfs_sched, &sysctl_sched_wakeup_granularity);
>
> debugfs_create_u32("latency_warn_ms", 0644, debugfs_sched, &sysctl_resched_latency_warn_ms);
> @@ -806,6 +807,7 @@ static void sched_debug_header(struct seq_file *m)
> SEQ_printf(m, " .%-40s: %Ld.%06ld\n", #x, SPLIT_NS(x))
> PN(sysctl_sched_latency);
> PN(sysctl_sched_min_granularity);
> + PN(sysctl_sched_idle_min_granularity);
> PN(sysctl_sched_wakeup_granularity);
> P(sysctl_sched_child_runs_first);
> P(sysctl_sched_features);
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 19a9244c140f..31f40aa005b9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -59,6 +59,14 @@ unsigned int sysctl_sched_tunable_scaling = SCHED_TUNABLESCALING_LOG;
> unsigned int sysctl_sched_min_granularity = 750000ULL;
> static unsigned int normalized_sysctl_sched_min_granularity = 750000ULL;
>
> +/*
> + * Minimal preemption granularity for CPU-bound SCHED_IDLE tasks.
> + * Applies only when SCHED_IDLE tasks compete with normal tasks.
> + *
> + * (default: 0.75 msec)
> + */
> +unsigned int sysctl_sched_idle_min_granularity = 750000ULL;
> +
> /*
> * This value is kept at sysctl_sched_latency/sysctl_sched_min_granularity
> */
> @@ -665,6 +673,8 @@ static u64 __sched_period(unsigned long nr_running)
> return sysctl_sched_latency;
> }
>
> +static bool sched_idle_cfs_rq(struct cfs_rq *cfs_rq);
> +
> /*
> * We calculate the wall-time slice from the period by taking a part
> * proportional to the weight.
> @@ -674,6 +684,8 @@ static u64 __sched_period(unsigned long nr_running)
> static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> unsigned int nr_running = cfs_rq->nr_running;
> + struct sched_entity *init_se = se;
> + unsigned int min_gran;
> u64 slice;
>
> if (sched_feat(ALT_PERIOD))
> @@ -684,12 +696,13 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> for_each_sched_entity(se) {
> struct load_weight *load;
> struct load_weight lw;
> + struct cfs_rq *qcfs_rq;
>
> - cfs_rq = cfs_rq_of(se);
> - load = &cfs_rq->load;
> + qcfs_rq = cfs_rq_of(se);
> + load = &qcfs_rq->load;
>
> if (unlikely(!se->on_rq)) {
> - lw = cfs_rq->load;
> + lw = qcfs_rq->load;
>
> update_load_add(&lw, se->load.weight);
> load = &lw;
> @@ -697,8 +710,14 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> slice = __calc_delta(slice, se->load.weight, load);
> }
>
> - if (sched_feat(BASE_SLICE))
> - slice = max(slice, (u64)sysctl_sched_min_granularity);
> + if (sched_feat(BASE_SLICE)) {
> + if (se_is_idle(init_se) && !sched_idle_cfs_rq(cfs_rq))
> + min_gran = sysctl_sched_idle_min_granularity;
> + else
> + min_gran = sysctl_sched_min_granularity;
> +
> + slice = max_t(u64, slice, min_gran);
> + }
>
> return slice;
> }
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 6af039e433fb..29846da35861 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2399,6 +2399,7 @@ extern const_debug unsigned int sysctl_sched_migration_cost;
> #ifdef CONFIG_SCHED_DEBUG
> extern unsigned int sysctl_sched_latency;
> extern unsigned int sysctl_sched_min_granularity;
> +extern unsigned int sysctl_sched_idle_min_granularity;
> extern unsigned int sysctl_sched_wakeup_granularity;
> extern int sysctl_resched_latency_warn_ms;
> extern int sysctl_resched_latency_warn_once;
> --
> 2.33.0.rc2.250.ged5fa647cd-goog
>

2021-08-24 17:50:14

by Josh Don

Subject: Re: [PATCH v3 3/4] sched: reduce sched slice for SCHED_IDLE entities

Hi Jiang,

On Tue, Aug 24, 2021 at 3:25 AM Jiang Biao <[email protected]> wrote:
>
> Why not just ignore min granularity when normal entities compete with
> a SCHED_IDLE entity? something like this,
>
> @@ -697,8 +710,14 @@ static u64 sched_slice(struct cfs_rq *cfs_rq,
> struct sched_entity *se)
> slice = __calc_delta(slice, se->load.weight, load);
> }
>
> - if (sched_feat(BASE_SLICE))
> - slice = max(slice, (u64)sysctl_sched_min_granularity);
> + if (sched_feat(BASE_SLICE)
> + && (!se_is_idle(init_se) || sched_idle_cfs_rq(cfs_rq)))
> + slice = max(slice, (u64)sysctl_sched_min_granularity);
>
> return slice;
> }
> If so, there seems no need to introduce sysctl_sched_idle_min_granularity? :)

Ignoring min_gran entirely could lead to some really tiny slices; see
discussion at https://lkml.org/lkml/2021/8/12/651.
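
For illustration only: a rough worked example of how small such a slice could get without any floor. It assumes the standard CFS load weights (3 for SCHED_IDLE, 1024 for nice 0) and an unscaled 6 ms scheduling period. The kernel's __calc_delta() uses fixed-point arithmetic, so real values can differ slightly; proportional_slice() is a hypothetical helper.

```c
#include <stdint.h>

/* Assumed CFS load weights. */
#define WEIGHT_IDLE   3ULL    /* SCHED_IDLE entity */
#define WEIGHT_NICE0  1024ULL /* nice 0 entity */

/*
 * Hypothetical helper: share of the period proportional to weight,
 * using plain integer division instead of the kernel's fixed-point math.
 */
static uint64_t proportional_slice(uint64_t period_ns, uint64_t weight,
                                   uint64_t total_weight)
{
	return period_ns * weight / total_weight;
}
```

Under these assumptions, proportional_slice(6000000, WEIGHT_IDLE, WEIGHT_IDLE + WEIGHT_NICE0) comes to roughly 17.5 us for a SCHED_IDLE entity sharing a runqueue with one nice-0 entity, which is the kind of slice a minimum granularity floor is meant to prevent.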

2021-08-24 17:52:00

by Josh Don

Subject: Re: [PATCH v3 4/4] sched: adjust sleeper credit for SCHED_IDLE entities

On Tue, Aug 24, 2021 at 1:16 AM Jiang Biao <[email protected]> wrote:
>
> Hi,
>
> On Fri, 20 Aug 2021 at 09:06, Josh Don <[email protected]> wrote:
> >
> > Give reduced sleeper credit to SCHED_IDLE entities. As a result, woken
> > SCHED_IDLE entities will take longer to preempt normal entities.
> >
> > The benefit of this change is to make it less likely that a newly woken
> > SCHED_IDLE entity will preempt a short-running normal entity before it
> > blocks.
> >
> > We still give a small sleeper credit to SCHED_IDLE entities, so that
> > idle<->idle competition retains some fairness.
> >
> > Example: With HZ=1000, spawned four threads affined to one cpu, one of
> > which was set to SCHED_IDLE. Without this patch, wakeup latency for the
> > SCHED_IDLE thread was ~1-2ms, with the patch the wakeup latency was
> > ~5ms.
> >
> > Signed-off-by: Josh Don <[email protected]>
> Tried to push a similar patch before, but failed. :)
> https://lkml.org/lkml/2020/8/20/1773
> Please pick my Reviewed-by if you don't mind,
> Reviewed-by: Jiang Biao <[email protected]>

Done, thanks :)

2021-08-25 02:49:09

by Jiang Biao

Subject: Re: [PATCH v3 3/4] sched: reduce sched slice for SCHED_IDLE entities

On Wed, 25 Aug 2021 at 01:04, Josh Don <[email protected]> wrote:
>
> Hi Jiang,
>
> On Tue, Aug 24, 2021 at 3:25 AM Jiang Biao <[email protected]> wrote:
> >
> > Why not just ignore min granularity when normal entities compete with
> > a SCHED_IDLE entity? something like this,
> >
> > @@ -697,8 +710,14 @@ static u64 sched_slice(struct cfs_rq *cfs_rq,
> > struct sched_entity *se)
> > slice = __calc_delta(slice, se->load.weight, load);
> > }
> >
> > - if (sched_feat(BASE_SLICE))
> > - slice = max(slice, (u64)sysctl_sched_min_granularity);
> > + if (sched_feat(BASE_SLICE)
> > + && (!se_is_idle(init_se) || sched_idle_cfs_rq(cfs_rq)))
> > + slice = max(slice, (u64)sysctl_sched_min_granularity);
> >
> > return slice;
> > }
> > If so, there seems no need to introduce sysctl_sched_idle_min_granularity? :)
>
> Ignoring min_gran entirely could lead to some really tiny slices; see
> discussion at https://lkml.org/lkml/2021/8/12/651.
Got it, tiny slices could be a problem in the SCHED_HRTICK case.
But sysctl_sched_idle_min_granularity as used in sched_slice() and
sysctl_sched_min_granularity as used in check_preempt_tick() would then
have different semantics for a SCHED_IDLE task, which could be
functionally OK but a little confusing.

Regards,
Jiang
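
For illustration only: the interaction described in the message above can be sketched with a simplified userspace model of the tick-time preemption check. This reflects one reading of check_preempt_tick() in kernels of this era and is an assumption, not the authoritative code; should_resched_at_tick() and its parameters are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the decision made at each scheduler tick.
 *
 * ideal_runtime: sched_slice() result for the running entity
 * delta_exec:    time the entity has run since it was last picked
 * vruntime_lead: running entity's vruntime minus the leftmost entity's
 * min_gran:      sysctl_sched_min_granularity (possibly scaled)
 */
static bool should_resched_at_tick(uint64_t ideal_runtime, uint64_t delta_exec,
                                   int64_t vruntime_lead, uint64_t min_gran)
{
	/* The entity has used up its slice. */
	if (delta_exec > ideal_runtime)
		return true;

	/* Do not preempt before the minimum granularity has elapsed. */
	if (delta_exec < min_gran)
		return false;

	/* The running entity is not ahead of the leftmost one. */
	if (vruntime_lead < 0)
		return false;

	/* Preempt if the running entity is ahead by more than its slice. */
	return (uint64_t)vruntime_lead > ideal_runtime;
}
```

In this model, a SCHED_IDLE entity whose slice is floored at 0.75 ms while min_gran is scaled to a larger value is only ever rescheduled by the first check, since any delta_exec that passes it is still below min_gran. That would be one way the two tunables end up with different meanings for the same task.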

Subject: [tip: sched/core] sched: adjust sleeper credit for SCHED_IDLE entities

The following commit has been merged into the sched/core branch of tip:

Commit-ID: bb1fc3bc521782d018902143c8301ab4a5e53557
Gitweb: https://git.kernel.org/tip/bb1fc3bc521782d018902143c8301ab4a5e53557
Author: Josh Don <[email protected]>
AuthorDate: Thu, 19 Aug 2021 18:04:03 -07:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Thu, 09 Sep 2021 11:27:31 +02:00

sched: adjust sleeper credit for SCHED_IDLE entities

Give reduced sleeper credit to SCHED_IDLE entities. As a result, woken
SCHED_IDLE entities will take longer to preempt normal entities.

The benefit of this change is to make it less likely that a newly woken
SCHED_IDLE entity will preempt a short-running normal entity before it
blocks.

We still give a small sleeper credit to SCHED_IDLE entities, so that
idle<->idle competition retains some fairness.

Example: With HZ=1000, spawned four threads affined to one cpu, one of
which was set to SCHED_IDLE. Without this patch, wakeup latency for the
SCHED_IDLE thread was ~1-2ms, with the patch the wakeup latency was
~5ms.

Signed-off-by: Josh Don <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
Reviewed-by: Jiang Biao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7330a77..b27ed8b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4201,7 +4201,12 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)

/* sleeps up to a single latency don't count. */
if (!initial) {
- unsigned long thresh = sysctl_sched_latency;
+ unsigned long thresh;
+
+ if (se_is_idle(se))
+ thresh = sysctl_sched_min_granularity;
+ else
+ thresh = sysctl_sched_latency;

/*
* Halve their sleep time's effect, to allow

Subject: [tip: sched/core] sched: reduce sched slice for SCHED_IDLE entities

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 7e2ce158699bb7b6a489c7c1d89c0dde2d4ceef5
Gitweb: https://git.kernel.org/tip/7e2ce158699bb7b6a489c7c1d89c0dde2d4ceef5
Author: Josh Don <[email protected]>
AuthorDate: Thu, 19 Aug 2021 18:04:02 -07:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Thu, 09 Sep 2021 11:27:31 +02:00

sched: reduce sched slice for SCHED_IDLE entities

Use a small, non-scaled min granularity for SCHED_IDLE entities, when
competing with normal entities. This reduces the latency of getting
a normal entity back on cpu, at the expense of increased context
switch frequency of SCHED_IDLE entities.

The benefit of this change is to reduce the round-robin latency for
normal entities when competing with a SCHED_IDLE entity.

Example: on a machine with HZ=1000, spawned two threads, one of which is
SCHED_IDLE, and affined to one cpu. Without this patch, the SCHED_IDLE
thread runs for 4ms then waits for 1.4s. With this patch, it runs for
1ms and waits 340ms (as it round-robins with the other thread).

Signed-off-by: Josh Don <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/sched/debug.c | 2 ++
kernel/sched/fair.c | 29 ++++++++++++++++++++++++-----
kernel/sched/sched.h | 1 +
3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 3353857..317ef56 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -305,6 +305,7 @@ static __init int sched_init_debug(void)

debugfs_create_u32("latency_ns", 0644, debugfs_sched, &sysctl_sched_latency);
debugfs_create_u32("min_granularity_ns", 0644, debugfs_sched, &sysctl_sched_min_granularity);
+ debugfs_create_u32("idle_min_granularity_ns", 0644, debugfs_sched, &sysctl_sched_idle_min_granularity);
debugfs_create_u32("wakeup_granularity_ns", 0644, debugfs_sched, &sysctl_sched_wakeup_granularity);

debugfs_create_u32("latency_warn_ms", 0644, debugfs_sched, &sysctl_resched_latency_warn_ms);
@@ -806,6 +807,7 @@ static void sched_debug_header(struct seq_file *m)
SEQ_printf(m, " .%-40s: %Ld.%06ld\n", #x, SPLIT_NS(x))
PN(sysctl_sched_latency);
PN(sysctl_sched_min_granularity);
+ PN(sysctl_sched_idle_min_granularity);
PN(sysctl_sched_wakeup_granularity);
P(sysctl_sched_child_runs_first);
P(sysctl_sched_features);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d7c0b9d..7330a77 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -60,6 +60,14 @@ unsigned int sysctl_sched_min_granularity = 750000ULL;
static unsigned int normalized_sysctl_sched_min_granularity = 750000ULL;

/*
+ * Minimal preemption granularity for CPU-bound SCHED_IDLE tasks.
+ * Applies only when SCHED_IDLE tasks compete with normal tasks.
+ *
+ * (default: 0.75 msec)
+ */
+unsigned int sysctl_sched_idle_min_granularity = 750000ULL;
+
+/*
* This value is kept at sysctl_sched_latency/sysctl_sched_min_granularity
*/
static unsigned int sched_nr_latency = 8;
@@ -665,6 +673,8 @@ static u64 __sched_period(unsigned long nr_running)
return sysctl_sched_latency;
}

+static bool sched_idle_cfs_rq(struct cfs_rq *cfs_rq);
+
/*
* We calculate the wall-time slice from the period by taking a part
* proportional to the weight.
@@ -674,6 +684,8 @@ static u64 __sched_period(unsigned long nr_running)
static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
unsigned int nr_running = cfs_rq->nr_running;
+ struct sched_entity *init_se = se;
+ unsigned int min_gran;
u64 slice;

if (sched_feat(ALT_PERIOD))
@@ -684,12 +696,13 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
for_each_sched_entity(se) {
struct load_weight *load;
struct load_weight lw;
+ struct cfs_rq *qcfs_rq;

- cfs_rq = cfs_rq_of(se);
- load = &cfs_rq->load;
+ qcfs_rq = cfs_rq_of(se);
+ load = &qcfs_rq->load;

if (unlikely(!se->on_rq)) {
- lw = cfs_rq->load;
+ lw = qcfs_rq->load;

update_load_add(&lw, se->load.weight);
load = &lw;
@@ -697,8 +710,14 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
slice = __calc_delta(slice, se->load.weight, load);
}

- if (sched_feat(BASE_SLICE))
- slice = max(slice, (u64)sysctl_sched_min_granularity);
+ if (sched_feat(BASE_SLICE)) {
+ if (se_is_idle(init_se) && !sched_idle_cfs_rq(cfs_rq))
+ min_gran = sysctl_sched_idle_min_granularity;
+ else
+ min_gran = sysctl_sched_min_granularity;
+
+ slice = max_t(u64, slice, min_gran);
+ }

return slice;
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 413298d..6b2d8b7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2400,6 +2400,7 @@ extern const_debug unsigned int sysctl_sched_migration_cost;
#ifdef CONFIG_SCHED_DEBUG
extern unsigned int sysctl_sched_latency;
extern unsigned int sysctl_sched_min_granularity;
+extern unsigned int sysctl_sched_idle_min_granularity;
extern unsigned int sysctl_sched_wakeup_granularity;
extern int sysctl_resched_latency_warn_ms;
extern int sysctl_resched_latency_warn_once;
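
The slice floor introduced by this patch can be sketched in user space
as follows. This is a simplified model, assuming a single-level
hierarchy (no cgroup nesting) and the BASE_SLICE feature enabled; the
function names, the rq_only_idle flag (standing in for
sched_idle_cfs_rq()) and the numeric values used in the note below are
illustrative assumptions, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Share of the scheduling period proportional to the entity's weight.
 * Models one iteration of the for_each_sched_entity() loop.
 */
static uint64_t proportional_slice(uint64_t period_ns, uint64_t weight,
				   uint64_t total_weight)
{
	assert(total_weight > 0 && weight <= total_weight);
	return period_ns * weight / total_weight;
}

/*
 * Apply the minimum-granularity floor. A SCHED_IDLE entity competing
 * with at least one normal entity gets the (non-scaled) idle floor;
 * everything else keeps the regular floor.
 */
static uint64_t apply_base_slice(uint64_t slice_ns, bool se_idle,
				 bool rq_only_idle,
				 uint64_t min_gran_ns,
				 uint64_t idle_min_gran_ns)
{
	uint64_t min_gran;

	if (se_idle && !rq_only_idle)
		min_gran = idle_min_gran_ns;
	else
		min_gran = min_gran_ns;

	return slice_ns > min_gran ? slice_ns : min_gran;
}
```

Because a SCHED_IDLE entity has a very small weight, its proportional
share is usually far below either floor, so the floor decides how long
it runs. For instance, assuming a regular floor of 3ms after CPU-count
scaling and an idle floor of 0.75ms, an idle entity sharing a runqueue
with a normal one is cut off at 0.75ms, while on a runqueue holding
only idle entities it keeps the 3ms floor.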

Subject: [tip: sched/core] sched: reduce sched slice for SCHED_IDLE entities

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 51ce83ed523b00d58f2937ec014b12daaad55185
Gitweb: https://git.kernel.org/tip/51ce83ed523b00d58f2937ec014b12daaad55185
Author: Josh Don <[email protected]>
AuthorDate: Thu, 19 Aug 2021 18:04:02 -07:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 05 Oct 2021 15:51:37 +02:00

sched: reduce sched slice for SCHED_IDLE entities

Use a small, non-scaled min granularity for SCHED_IDLE entities, when
competing with normal entities. This reduces the latency of getting
a normal entity back on cpu, at the expense of increased context
switch frequency of SCHED_IDLE entities.

The benefit of this change is to reduce the round-robin latency for
normal entities when competing with a SCHED_IDLE entity.

Example: on a machine with HZ=1000, spawned two threads, one of which is
SCHED_IDLE, and affined to one cpu. Without this patch, the SCHED_IDLE
thread runs for 4ms then waits for 1.4s. With this patch, it runs for
1ms and waits 340ms (as it round-robins with the other thread).

Signed-off-by: Josh Don <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/sched/debug.c | 2 ++
kernel/sched/fair.c | 29 ++++++++++++++++++++++++-----
kernel/sched/sched.h | 1 +
3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 2e5fdd9..34913a7 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -311,6 +311,7 @@ static __init int sched_init_debug(void)

debugfs_create_u32("latency_ns", 0644, debugfs_sched, &sysctl_sched_latency);
debugfs_create_u32("min_granularity_ns", 0644, debugfs_sched, &sysctl_sched_min_granularity);
+ debugfs_create_u32("idle_min_granularity_ns", 0644, debugfs_sched, &sysctl_sched_idle_min_granularity);
debugfs_create_u32("wakeup_granularity_ns", 0644, debugfs_sched, &sysctl_sched_wakeup_granularity);

debugfs_create_u32("latency_warn_ms", 0644, debugfs_sched, &sysctl_resched_latency_warn_ms);
@@ -812,6 +813,7 @@ static void sched_debug_header(struct seq_file *m)
SEQ_printf(m, " .%-40s: %Ld.%06ld\n", #x, SPLIT_NS(x))
PN(sysctl_sched_latency);
PN(sysctl_sched_min_granularity);
+ PN(sysctl_sched_idle_min_granularity);
PN(sysctl_sched_wakeup_granularity);
P(sysctl_sched_child_runs_first);
P(sysctl_sched_features);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9c78c16..d835061 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -60,6 +60,14 @@ unsigned int sysctl_sched_min_granularity = 750000ULL;
static unsigned int normalized_sysctl_sched_min_granularity = 750000ULL;

/*
+ * Minimal preemption granularity for CPU-bound SCHED_IDLE tasks.
+ * Applies only when SCHED_IDLE tasks compete with normal tasks.
+ *
+ * (default: 0.75 msec)
+ */
+unsigned int sysctl_sched_idle_min_granularity = 750000ULL;
+
+/*
* This value is kept at sysctl_sched_latency/sysctl_sched_min_granularity
*/
static unsigned int sched_nr_latency = 8;
@@ -665,6 +673,8 @@ static u64 __sched_period(unsigned long nr_running)
return sysctl_sched_latency;
}

+static bool sched_idle_cfs_rq(struct cfs_rq *cfs_rq);
+
/*
* We calculate the wall-time slice from the period by taking a part
* proportional to the weight.
@@ -674,6 +684,8 @@ static u64 __sched_period(unsigned long nr_running)
static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
unsigned int nr_running = cfs_rq->nr_running;
+ struct sched_entity *init_se = se;
+ unsigned int min_gran;
u64 slice;

if (sched_feat(ALT_PERIOD))
@@ -684,12 +696,13 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
for_each_sched_entity(se) {
struct load_weight *load;
struct load_weight lw;
+ struct cfs_rq *qcfs_rq;

- cfs_rq = cfs_rq_of(se);
- load = &cfs_rq->load;
+ qcfs_rq = cfs_rq_of(se);
+ load = &qcfs_rq->load;

if (unlikely(!se->on_rq)) {
- lw = cfs_rq->load;
+ lw = qcfs_rq->load;

update_load_add(&lw, se->load.weight);
load = &lw;
@@ -697,8 +710,14 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
slice = __calc_delta(slice, se->load.weight, load);
}

- if (sched_feat(BASE_SLICE))
- slice = max(slice, (u64)sysctl_sched_min_granularity);
+ if (sched_feat(BASE_SLICE)) {
+ if (se_is_idle(init_se) && !sched_idle_cfs_rq(cfs_rq))
+ min_gran = sysctl_sched_idle_min_granularity;
+ else
+ min_gran = sysctl_sched_min_granularity;
+
+ slice = max_t(u64, slice, min_gran);
+ }

return slice;
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f2965b5..15a8895 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2403,6 +2403,7 @@ extern const_debug unsigned int sysctl_sched_migration_cost;
#ifdef CONFIG_SCHED_DEBUG
extern unsigned int sysctl_sched_latency;
extern unsigned int sysctl_sched_min_granularity;
+extern unsigned int sysctl_sched_idle_min_granularity;
extern unsigned int sysctl_sched_wakeup_granularity;
extern int sysctl_resched_latency_warn_ms;
extern int sysctl_resched_latency_warn_once;
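
The run/wait figures quoted in the changelog example can be roughly
reproduced with a short calculation. The sketch below assumes the usual
CFS weights of 1024 for a nice-0 task and 3 for a SCHED_IDLE task
(these come from the kernel's weight tables, not from this patch) and
ignores wakeup granularity and tick effects; expected_wait_ns() is a
hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * vruntime advances inversely to weight, so after an entity with weight
 * own_weight runs for run_ns, a competitor with weight other_weight has
 * to run for roughly run_ns * other_weight / own_weight before the two
 * vruntimes are level again.
 */
static uint64_t expected_wait_ns(uint64_t run_ns, uint64_t own_weight,
				 uint64_t other_weight)
{
	assert(own_weight > 0);
	return run_ns * other_weight / own_weight;
}
```

With these weights, a 4ms run by the SCHED_IDLE thread works out to a
wait of about 1.37s, and a 1ms run to about 341ms, which is in line
with the "1.4s" and "340ms" reported in the example.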

Subject: [tip: sched/core] sched: adjust sleeper credit for SCHED_IDLE entities

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 2cae3948edd488ebdef4deaf1d1043f92f47e665
Gitweb: https://git.kernel.org/tip/2cae3948edd488ebdef4deaf1d1043f92f47e665
Author: Josh Don <[email protected]>
AuthorDate: Thu, 19 Aug 2021 18:04:03 -07:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 05 Oct 2021 15:51:39 +02:00

sched: adjust sleeper credit for SCHED_IDLE entities

Give reduced sleeper credit to SCHED_IDLE entities. As a result, woken
SCHED_IDLE entities will take longer to preempt normal entities.

The benefit of this change is to make it less likely that a newly woken
SCHED_IDLE entity will preempt a short-running normal entity before it
blocks.

We still give a small sleeper credit to SCHED_IDLE entities, so that
idle<->idle competition retains some fairness.

Example: With HZ=1000, spawned four threads affined to one cpu, one of
which was set to SCHED_IDLE. Without this patch, wakeup latency for the
SCHED_IDLE thread was ~1-2ms; with the patch, the wakeup latency was
~5ms.

Signed-off-by: Josh Don <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
Reviewed-by: Jiang Biao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d835061..5457c80 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4230,7 +4230,12 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)

/* sleeps up to a single latency don't count. */
if (!initial) {
- unsigned long thresh = sysctl_sched_latency;
+ unsigned long thresh;
+
+ if (se_is_idle(se))
+ thresh = sysctl_sched_min_granularity;
+ else
+ thresh = sysctl_sched_latency;

/*
* Halve their sleep time's effect, to allow