2023-07-07 20:13:03

by Phil Auld

Subject: [PATCH v5 2/2] Sched/fair: Block nohz tick_stop when cfs bandwidth in use

CFS bandwidth limits and NOHZ full don't play well together. Tasks
can easily run well past their quotas before a remote tick does
accounting. This leads to long, multi-period stalls before such
tasks can run again. Currently, when presented with these conflicting
requirements, the scheduler favors nohz_full and lets the tick be
stopped. However, nohz tick stopping is already best-effort; there are
a number of conditions that can prevent it, whereas cfs runtime
bandwidth is expected to be enforced.

Make the scheduler favor bandwidth over stopping the tick by setting
TICK_DEP_BIT_SCHED when the only running task is a cfs task with
runtime limit enabled. We use cfs_b->hierarchical_quota to
determine if the task requires the tick.

Add a check in pick_next_task_fair() as well, since that is where
we have a handle on the task that is actually going to be running.

Add a check in sched_can_stop_tick() to cover edge cases such as
nr_running going from 2->1 while the remaining task is the one
already running.

Add sched_feat HZ_BW (off by default) to control the tick_stop
behavior.

Signed-off-by: Phil Auld <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
---
kernel/sched/core.c | 12 ++++++++++
kernel/sched/fair.c | 49 +++++++++++++++++++++++++++++++++++++++++
kernel/sched/features.h | 2 ++
kernel/sched/sched.h | 1 +
4 files changed, 64 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1b214e10c25d..4b8534abdf4f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1229,6 +1229,18 @@ bool sched_can_stop_tick(struct rq *rq)
 	if (rq->nr_running > 1)
 		return false;
 
+	/*
+	 * If there is one task and it has CFS runtime bandwidth constraints
+	 * and it's on the cpu now we don't want to stop the tick.
+	 * This check prevents clearing the bit if a newly enqueued task here is
+	 * dequeued by migrating while the constrained task continues to run.
+	 * E.g. going from 2->1 without going through pick_next_task().
+	 */
+	if (sched_feat(HZ_BW) && rq->nr_running == 1 && task_on_rq_queued(rq->curr)) {
+		if (cfs_task_bw_constrained(rq->curr))
+			return false;
+	}
+
 	return true;
 }
 #endif /* CONFIG_NO_HZ_FULL */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 92381f9ecf37..70429abd8df5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6140,6 +6140,46 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
rcu_read_unlock();
}

+bool cfs_task_bw_constrained(struct task_struct *p)
+{
+	struct cfs_rq *cfs_rq = task_cfs_rq(p);
+
+	if (!cfs_bandwidth_used())
+		return false;
+
+	if (cfs_rq->runtime_enabled ||
+	    tg_cfs_bandwidth(cfs_rq->tg)->hierarchical_quota != RUNTIME_INF)
+		return true;
+
+	return false;
+}
+
+#ifdef CONFIG_NO_HZ_FULL
+/* called from pick_next_task_fair() */
+static void sched_fair_update_stop_tick(struct rq *rq, struct task_struct *p)
+{
+	int cpu = cpu_of(rq);
+
+	if (!sched_feat(HZ_BW) || !cfs_bandwidth_used())
+		return;
+
+	if (!tick_nohz_full_cpu(cpu))
+		return;
+
+	if (rq->nr_running != 1)
+		return;
+
+	/*
+	 * We know there is only one task runnable and we've just picked it. The
+	 * normal enqueue path will have cleared TICK_DEP_BIT_SCHED if we will
+	 * be otherwise able to stop the tick. Just need to check if we are using
+	 * bandwidth control.
+	 */
+	if (cfs_task_bw_constrained(p))
+		tick_nohz_dep_set_cpu(cpu, TICK_DEP_BIT_SCHED);
+}
+#endif
+
#else /* CONFIG_CFS_BANDWIDTH */

static inline bool cfs_bandwidth_used(void)
@@ -6182,9 +6222,17 @@ static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
static inline void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
static inline void update_runtime_enabled(struct rq *rq) {}
static inline void unthrottle_offline_cfs_rqs(struct rq *rq) {}
+bool cfs_task_bw_constrained(struct task_struct *p)
+{
+	return false;
+}

#endif /* CONFIG_CFS_BANDWIDTH */

+#if !defined(CONFIG_CFS_BANDWIDTH) || !defined(CONFIG_NO_HZ_FULL)
+static inline void sched_fair_update_stop_tick(struct rq *rq, struct task_struct *p) {}
+#endif
+
/**************************************************
* CFS operations on tasks:
*/
@@ -8098,6 +8146,7 @@ done: __maybe_unused;
 	hrtick_start_fair(rq, p);
 
 	update_misfit_status(p, rq);
+	sched_fair_update_stop_tick(rq, p);
 
 	return p;

diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index ee7f23c76bd3..6fdf1fdf6b17 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -101,3 +101,5 @@ SCHED_FEAT(LATENCY_WARN, false)

SCHED_FEAT(ALT_PERIOD, true)
SCHED_FEAT(BASE_SLICE, true)
+
+SCHED_FEAT(HZ_BW, false)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 63822c9238cc..d6d346bc78aa 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -465,6 +465,7 @@ extern void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b, struct cfs_bandwidth
extern void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b);
extern void start_cfs_bandwidth(struct cfs_bandwidth *cfs_b);
extern void unthrottle_cfs_rq(struct cfs_rq *cfs_rq);
+extern bool cfs_task_bw_constrained(struct task_struct *p);

extern void init_tg_rt_entry(struct task_group *tg, struct rt_rq *rt_rq,
struct sched_rt_entity *rt_se, int cpu,
--
2.31.1



2023-07-11 00:44:51

by Benjamin Segall

Subject: Re: [PATCH v5 2/2] Sched/fair: Block nohz tick_stop when cfs bandwidth in use

Phil Auld <[email protected]> writes:

> [...]
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 1b214e10c25d..4b8534abdf4f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1229,6 +1229,18 @@ bool sched_can_stop_tick(struct rq *rq)
>  	if (rq->nr_running > 1)
>  		return false;
>  
> +	/*
> +	 * If there is one task and it has CFS runtime bandwidth constraints
> +	 * and it's on the cpu now we don't want to stop the tick.
> +	 * This check prevents clearing the bit if a newly enqueued task here is
> +	 * dequeued by migrating while the constrained task continues to run.
> +	 * E.g. going from 2->1 without going through pick_next_task().
> +	 */
> +	if (sched_feat(HZ_BW) && rq->nr_running == 1 && task_on_rq_queued(rq->curr)) {
> +		if (cfs_task_bw_constrained(rq->curr))
> +			return false;
> +	}
> +

I think we still need the fair_sched_class check with the bit being on
cfs_rq/tg rather than task.
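
Roughly something along these lines, I'd think (untested, just to show
the shape of the check):

	if (sched_feat(HZ_BW) && rq->nr_running == 1 &&
	    rq->curr->sched_class == &fair_sched_class &&
	    task_on_rq_queued(rq->curr)) {
		if (cfs_task_bw_constrained(rq->curr))
			return false;
	}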


2023-07-11 14:16:25

by Phil Auld

Subject: Re: [PATCH v5 2/2] Sched/fair: Block nohz tick_stop when cfs bandwidth in use

On Mon, Jul 10, 2023 at 04:54:58PM -0700 Benjamin Segall wrote:
> Phil Auld <[email protected]> writes:
>
> > [...]
>
> I think we still need the fair_sched_class check with the bit being on
> cfs_rq/tg rather than task.
>

Is there a way a non-fair_sched_class task can end up on a cfs_rq with
cfs_rq->runtime_enabled and/or cfs_b->hierarchical_quota set to
something other than RUNTIME_INF? I suppose the fields could be stale
if the task has had its class changed?

That makes the condition pretty ugly, but I can add that back if needed.


Thanks,
Phil



--


2023-07-11 14:41:12

by Phil Auld

Subject: Re: [PATCH v5 2/2] Sched/fair: Block nohz tick_stop when cfs bandwidth in use

On Tue, Jul 11, 2023 at 09:10:24AM -0400 Phil Auld wrote:
> On Mon, Jul 10, 2023 at 04:54:58PM -0700 Benjamin Segall wrote:
> > Phil Auld <[email protected]> writes:
> >
> > > [...]
> >
> > I think we still need the fair_sched_class check with the bit being on
> > cfs_rq/tg rather than task.
> >
>
> Is there a way a non-fair_sched_class task can end up on a cfs_rq with
> cfs_rq->runtime_enabled and/or cfs_b->hierarchical_quota set to
> something other than RUNTIME_INF? I suppose the fields could be stale
> if the task has had its class changed?
>
> That makes the condition pretty ugly, but I can add that back if needed.
>

Sigh, yeah. I took that out when I had the bit in the task. I'll put it
back in...



Cheers,
Phil


--


2023-07-11 22:25:34

by Benjamin Segall

Subject: Re: [PATCH v5 2/2] Sched/fair: Block nohz tick_stop when cfs bandwidth in use

Phil Auld <[email protected]> writes:

> On Tue, Jul 11, 2023 at 09:10:24AM -0400 Phil Auld wrote:
>> On Mon, Jul 10, 2023 at 04:54:58PM -0700 Benjamin Segall wrote:
>> > Phil Auld <[email protected]> writes:
>> >
>> > > [...]
>> >
>> > I think we still need the fair_sched_class check with the bit being on
>> > cfs_rq/tg rather than task.
>> >
>>
>> Is there a way a non-fair_sched_class task can end up on a cfs_rq with
>> cfs_rq->runtime_enabled and/or cfs_b->hierarchical_quota set to
>> something other than RUNTIME_INF? I suppose the fields could be stale
>> if the task has had its class changed?
>>
>> That makes the condition pretty ugly, but I can add that back if needed.
>>
>
> Sigh, yeah. I took that out when I had the bit in the task. I'll put it
> back in...
>

Yeah, cfs_rq (and rt_rq) are set unconditionally, and a cgroup can have
a mix of fair and RT tasks (whether or not that's a good idea from a
sysadmin perspective).
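
To spell it out (illustrative only; the helper below is hypothetical
and just restates the first condition of cfs_task_bw_constrained()):

/* not in the patch, just why the class check matters */
static bool bw_fields_look_constrained(struct task_struct *p)
{
	/*
	 * set_task_rq() points p->se.cfs_rq at the group's cfs_rq no
	 * matter which class p is in, so for an RT task in a cgroup
	 * with a CFS quota this still returns true, even though the
	 * CFS bandwidth timer will never throttle that task. Hence
	 * the explicit fair_sched_class check.
	 */
	return task_cfs_rq(p)->runtime_enabled;
}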

2023-07-11 23:14:05

by Phil Auld

Subject: Re: [PATCH v5 2/2] Sched/fair: Block nohz tick_stop when cfs bandwidth in use

On Tue, Jul 11, 2023 at 03:07:12PM -0700 Benjamin Segall wrote:
> Phil Auld <[email protected]> writes:
>
> > On Tue, Jul 11, 2023 at 09:10:24AM -0400 Phil Auld wrote:
> >> On Mon, Jul 10, 2023 at 04:54:58PM -0700 Benjamin Segall wrote:
> >> > Phil Auld <[email protected]> writes:
> >> >
> >> > > [...]
> >> >
> >> > I think we still need the fair_sched_class check with the bit being on
> >> > cfs_rq/tg rather than task.
> >> >
> >>
> >> Is there a way a non-fair_sched_class task can end up on a cfs_rq with
> >> cfs_rq->runtime_enabled and/or cfs_b->hierarchical_quota set to
> >> something other than RUNTIME_INF? I suppose the fields could be stale
> >> if the task has had its class changed?
> >>
> >> That makes the condition pretty ugly, but I can add that back if needed.
> >>
> >
> > Sigh, yeah. I took that out when I had the bit in the task. I'll put it
> > back in...
> >
>
> Yeah, cfs_rq (and rt_rq) are set unconditionally, and a cgroup can have
> a mix of fair and RT tasks (whether or not that's a good idea from a
> sysadmin perspective).
>

Thanks. I think I'll try the condition as a single-use static inline function.
The generated code seems the same but it is a bit nicer to read.
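
Something along these lines (untested sketch; the helper name is made
up here):

#ifdef CONFIG_CFS_BANDWIDTH
/* One readable place for the "single runnable cfs task" conditions. */
static inline bool __need_bw_check(struct rq *rq, struct task_struct *p)
{
	if (rq->nr_running != 1)
		return false;

	if (p->sched_class != &fair_sched_class)
		return false;

	if (!task_on_rq_queued(p))
		return false;

	return true;
}
#endif

with the caller in sched_can_stop_tick() reduced to:

	if (sched_feat(HZ_BW) && __need_bw_check(rq, rq->curr)) {
		if (cfs_task_bw_constrained(rq->curr))
			return false;
	}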


Cheers,
Phil


--