2023-06-27 19:54:08

by Phil Auld

Subject: [PATCH v2] Sched/fair: Block nohz tick_stop when cfs bandwidth in use

CFS bandwidth limits and NOHZ full don't play well together. Tasks
can easily run well past their quotas before a remote tick does
accounting. This leads to long, multi-period stalls before such
tasks can run again. Currently, when presented with these conflicting
requirements the scheduler is favoring nohz_full and letting the tick
be stopped. However, nohz tick stopping is already best-effort; there
are a number of conditions that can prevent it, whereas cfs runtime
bandwidth is expected to be enforced.

Make the scheduler favor bandwidth over stopping the tick by setting
TICK_DEP_BIT_SCHED when the only running task is a cfs task with
runtime limit enabled.

Add sched_feat HZ_BW (off by default) to control this behavior.

Signed-off-by: Phil Auld <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Ben Segall <[email protected]>
---

v2: Ben pointed out that the bit could get cleared in the dequeue path
if we migrate a newly enqueued task without preempting curr. Added a
check for that edge case to sched_can_stop_tick. Removed the call to
sched_can_stop_tick from sched_fair_update_stop_tick since it was
redundant.
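
For testing: since this is a sched_feat, HZ_BW should be toggleable at
run time on a CONFIG_SCHED_DEBUG kernel through the usual sched_features
debugfs knob (echo HZ_BW > /sys/kernel/debug/sched/features); the patch
adds nothing beyond the feature bit itself.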

kernel/sched/core.c | 12 +++++++++++
kernel/sched/fair.c | 45 +++++++++++++++++++++++++++++++++++++++++
kernel/sched/features.h | 2 ++
3 files changed, 59 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a68d1276bab0..646f60bfc7e7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1194,6 +1194,8 @@ static void nohz_csd_func(void *info)
#endif /* CONFIG_NO_HZ_COMMON */

#ifdef CONFIG_NO_HZ_FULL
+extern bool sched_cfs_bandwidth_active(struct cfs_rq *cfs_rq);
+
bool sched_can_stop_tick(struct rq *rq)
{
int fifo_nr_running;
@@ -1229,6 +1231,16 @@ bool sched_can_stop_tick(struct rq *rq)
if (rq->nr_running > 1)
return false;

+ /*
+ * If there is one task and it has CFS runtime bandwidth constraints
+ * and it's on the cpu now we don't want to stop the tick.
+ */
+ if (sched_feat(HZ_BW) && rq->nr_running == 1 && rq->curr
+ && rq->curr->sched_class == &fair_sched_class && task_on_rq_queued(rq->curr)) {
+ if (sched_cfs_bandwidth_active(task_cfs_rq(rq->curr)))
+ return false;
+ }
+
return true;
}
#endif /* CONFIG_NO_HZ_FULL */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 373ff5f55884..a05af33b8da9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6139,6 +6139,42 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
rcu_read_unlock();
}

+#ifdef CONFIG_NO_HZ_FULL
+
+bool sched_cfs_bandwidth_active(struct cfs_rq *cfs_rq)
+{
+ if (cfs_bandwidth_used() && cfs_rq->runtime_enabled)
+ return true;
+
+ return false;
+}
+
+/* called from pick_next_task_fair() */
+static void sched_fair_update_stop_tick(struct rq *rq, struct task_struct *p)
+{
+ struct cfs_rq *cfs_rq = task_cfs_rq(p);
+ int cpu = cpu_of(rq);
+
+ if (!sched_feat(HZ_BW) || !cfs_bandwidth_used())
+ return;
+
+ if (!tick_nohz_full_cpu(cpu))
+ return;
+
+ if (rq->nr_running != 1)
+ return;
+
+ /*
+ * We know there is only one task runnable and we've just picked it. The
+ * normal enqueue path will have cleared TICK_DEP_BIT_SCHED if we will
+ * be otherwise able to stop the tick. Just need to check if we are using
+ * bandwidth control.
+ */
+ if (cfs_rq->runtime_enabled)
+ tick_nohz_dep_set_cpu(cpu, TICK_DEP_BIT_SCHED);
+}
+#endif
+
#else /* CONFIG_CFS_BANDWIDTH */

static inline bool cfs_bandwidth_used(void)
@@ -6181,9 +6217,17 @@ static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
static inline void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
static inline void update_runtime_enabled(struct rq *rq) {}
static inline void unthrottle_offline_cfs_rqs(struct rq *rq) {}
+bool sched_cfs_bandwidth_active(struct cfs_rq *cfs_rq)
+{
+ return false;
+}

#endif /* CONFIG_CFS_BANDWIDTH */

+#if !defined(CONFIG_CFS_BANDWIDTH) || !defined(CONFIG_NO_HZ_FULL)
+static inline void sched_fair_update_stop_tick(struct rq *rq, struct task_struct *p) {}
+#endif
+
/**************************************************
* CFS operations on tasks:
*/
@@ -8097,6 +8141,7 @@ done: __maybe_unused;
hrtick_start_fair(rq, p);

update_misfit_status(p, rq);
+ sched_fair_update_stop_tick(rq, p);

return p;

diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index ee7f23c76bd3..6fdf1fdf6b17 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -101,3 +101,5 @@ SCHED_FEAT(LATENCY_WARN, false)

SCHED_FEAT(ALT_PERIOD, true)
SCHED_FEAT(BASE_SLICE, true)
+
+SCHED_FEAT(HZ_BW, false)
--
2.31.1



2023-06-27 21:33:32

by kernel test robot

Subject: Re: [PATCH v2] Sched/fair: Block nohz tick_stop when cfs bandwidth in use

Hi Phil,

kernel test robot noticed the following build warnings:

[auto build test WARNING on tip/auto-latest]
[also build test WARNING on linus/master v6.4 next-20230627]
[cannot apply to tip/sched/core tip/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Phil-Auld/Sched-fair-Block-nohz-tick_stop-when-cfs-bandwidth-in-use/20230628-031312
base: tip/auto-latest
patch link: https://lore.kernel.org/r/20230627191201.344110-1-pauld%40redhat.com
patch subject: [PATCH v2] Sched/fair: Block nohz tick_stop when cfs bandwidth in use
config: nios2-randconfig-r035-20230627 (https://download.01.org/0day-ci/archive/20230628/[email protected]/config)
compiler: nios2-linux-gcc (GCC) 12.3.0
reproduce: (https://download.01.org/0day-ci/archive/20230628/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

kernel/sched/fair.c:688:5: warning: no previous prototype for 'sched_update_scaling' [-Wmissing-prototypes]
688 | int sched_update_scaling(void)
| ^~~~~~~~~~~~~~~~~~~~
>> kernel/sched/fair.c:6220:6: warning: no previous prototype for 'sched_cfs_bandwidth_active' [-Wmissing-prototypes]
6220 | bool sched_cfs_bandwidth_active(struct cfs_rq *cfs_rq)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~


vim +/sched_cfs_bandwidth_active +6220 kernel/sched/fair.c

6212
6213 static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
6214 {
6215 return NULL;
6216 }
6217 static inline void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
6218 static inline void update_runtime_enabled(struct rq *rq) {}
6219 static inline void unthrottle_offline_cfs_rqs(struct rq *rq) {}
> 6220 bool sched_cfs_bandwidth_active(struct cfs_rq *cfs_rq)
6221 {
6222 return false;
6223 }
6224

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2023-06-28 22:20:50

by Benjamin Segall

Subject: Re: [PATCH v2] Sched/fair: Block nohz tick_stop when cfs bandwidth in use

Phil Auld <[email protected]> writes:

> CFS bandwidth limits and NOHZ full don't play well together. Tasks
> can easily run well past their quotas before a remote tick does
> accounting. This leads to long, multi-period stalls before such
> tasks can run again. Currently, when presented with these conflicting
> requirements the scheduler is favoring nohz_full and letting the tick
> be stopped. However, nohz tick stopping is already best-effort; there
> are a number of conditions that can prevent it, whereas cfs runtime
> bandwidth is expected to be enforced.
>
> Make the scheduler favor bandwidth over stopping the tick by setting
> TICK_DEP_BIT_SCHED when the only running task is a cfs task with
> runtime limit enabled.
>
> Add sched_feat HZ_BW (off by default) to control this behavior.
>
> Signed-off-by: Phil Auld <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Vincent Guittot <[email protected]>
> Cc: Juri Lelli <[email protected]>
> Cc: Dietmar Eggemann <[email protected]>
> Cc: Valentin Schneider <[email protected]>
> Cc: Ben Segall <[email protected]>
> ---
>
> v2: Ben pointed out that the bit could get cleared in the dequeue path
> if we migrate a newly enqueued task without preempting curr. Added a
> check for that edge case to sched_can_stop_tick. Removed the call to
> sched_can_stop_tick from sched_fair_update_stop_tick since it was
> redundant.
>
> kernel/sched/core.c | 12 +++++++++++
> kernel/sched/fair.c | 45 +++++++++++++++++++++++++++++++++++++++++
> kernel/sched/features.h | 2 ++
> 3 files changed, 59 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a68d1276bab0..646f60bfc7e7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1194,6 +1194,8 @@ static void nohz_csd_func(void *info)
> #endif /* CONFIG_NO_HZ_COMMON */
>
> #ifdef CONFIG_NO_HZ_FULL
> +extern bool sched_cfs_bandwidth_active(struct cfs_rq *cfs_rq);
> +
> bool sched_can_stop_tick(struct rq *rq)
> {
> int fifo_nr_running;
> @@ -1229,6 +1231,16 @@ bool sched_can_stop_tick(struct rq *rq)
> if (rq->nr_running > 1)
> return false;
>
> + /*
> + * If there is one task and it has CFS runtime bandwidth constraints
> + * and it's on the cpu now we don't want to stop the tick.
> + */
> + if (sched_feat(HZ_BW) && rq->nr_running == 1 && rq->curr
> + && rq->curr->sched_class == &fair_sched_class && task_on_rq_queued(rq->curr)) {
> + if (sched_cfs_bandwidth_active(task_cfs_rq(rq->curr)))

Actually, something I should have noticed earlier is that this should
probably be hierarchical, right? You need to check every ancestor
cfs_rq, not just the immediate parent. And at that point it probably
makes sense to have sched_cfs_bandwidth_active take a task_struct.
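
Untested sketch, just to illustrate what I mean by walking the
ancestors (only interesting with CONFIG_FAIR_GROUP_SCHED, which
bandwidth control depends on anyway):

bool sched_cfs_bandwidth_active(struct task_struct *p)
{
	struct sched_entity *se = &p->se;

	if (!cfs_bandwidth_used())
		return false;

	/* every cfs_rq this task is (transitively) queued on */
	for_each_sched_entity(se) {
		if (cfs_rq_of(se)->runtime_enabled)
			return true;
	}

	return false;
}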

2023-06-29 01:06:13

by Phil Auld

Subject: Re: [PATCH v2] Sched/fair: Block nohz tick_stop when cfs bandwidth in use

On Wed, Jun 28, 2023 at 02:42:16PM -0700 Benjamin Segall wrote:
> Phil Auld <[email protected]> writes:
>
> > CFS bandwidth limits and NOHZ full don't play well together. Tasks
> > can easily run well past their quotas before a remote tick does
> > accounting. This leads to long, multi-period stalls before such
> > tasks can run again. Currently, when presented with these conflicting
> > requirements the scheduler is favoring nohz_full and letting the tick
> > be stopped. However, nohz tick stopping is already best-effort; there
> > are a number of conditions that can prevent it, whereas cfs runtime
> > bandwidth is expected to be enforced.
> >
> > Make the scheduler favor bandwidth over stopping the tick by setting
> > TICK_DEP_BIT_SCHED when the only running task is a cfs task with
> > runtime limit enabled.
> >
> > Add sched_feat HZ_BW (off by default) to control this behavior.
> >
> > Signed-off-by: Phil Auld <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Vincent Guittot <[email protected]>
> > Cc: Juri Lelli <[email protected]>
> > Cc: Dietmar Eggemann <[email protected]>
> > Cc: Valentin Schneider <[email protected]>
> > Cc: Ben Segall <[email protected]>
> > ---
> >
> > v2: Ben pointed out that the bit could get cleared in the dequeue path
> > if we migrate a newly enqueued task without preempting curr. Added a
> > check for that edge case to sched_can_stop_tick. Removed the call to
> > sched_can_stop_tick from sched_fair_update_stop_tick since it was
> > redundant.
> >
> > kernel/sched/core.c | 12 +++++++++++
> > kernel/sched/fair.c | 45 +++++++++++++++++++++++++++++++++++++++++
> > kernel/sched/features.h | 2 ++
> > 3 files changed, 59 insertions(+)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index a68d1276bab0..646f60bfc7e7 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1194,6 +1194,8 @@ static void nohz_csd_func(void *info)
> > #endif /* CONFIG_NO_HZ_COMMON */
> >
> > #ifdef CONFIG_NO_HZ_FULL
> > +extern bool sched_cfs_bandwidth_active(struct cfs_rq *cfs_rq);
> > +
> > bool sched_can_stop_tick(struct rq *rq)
> > {
> > int fifo_nr_running;
> > @@ -1229,6 +1231,16 @@ bool sched_can_stop_tick(struct rq *rq)
> > if (rq->nr_running > 1)
> > return false;
> >
> > + /*
> > + * If there is one task and it has CFS runtime bandwidth constraints
> > + * and it's on the cpu now we don't want to stop the tick.
> > + */
> > + if (sched_feat(HZ_BW) && rq->nr_running == 1 && rq->curr
> > + && rq->curr->sched_class == &fair_sched_class && task_on_rq_queued(rq->curr)) {
> > + if (sched_cfs_bandwidth_active(task_cfs_rq(rq->curr)))
>
> Actually, something I should have noticed earlier is that this should
> probably be hierarchical, right? You need to check every ancestor
> cfs_rq, not just the immediate parent. And at that point it probably
> makes sense to have sched_cfs_bandwidth_active take a task_struct.
>

Are you saying a child cfs_rq with a parent that has runtime_enabled could
itself not have runtime_enabled? I may be missing something but I don't
see how that works.

account_cfs_rq_runtime() for example just looks at the immediate cfs_rq of
curr and bails if it does not have runtime_enabled. How could that task get
throttled if it exceeds some parent's limit?

Confused :)

Cheers,
Phil


--


2023-06-29 18:07:13

by Benjamin Segall

Subject: Re: [PATCH v2] Sched/fair: Block nohz tick_stop when cfs bandwidth in use

Phil Auld <[email protected]> writes:

> On Wed, Jun 28, 2023 at 02:42:16PM -0700 Benjamin Segall wrote:
>> Phil Auld <[email protected]> writes:
>>
>> > CFS bandwidth limits and NOHZ full don't play well together. Tasks
>> > can easily run well past their quotas before a remote tick does
>> > accounting. This leads to long, multi-period stalls before such
>> > tasks can run again. Currently, when presented with these conflicting
>> > requirements the scheduler is favoring nohz_full and letting the tick
>> > be stopped. However, nohz tick stopping is already best-effort; there
>> > are a number of conditions that can prevent it, whereas cfs runtime
>> > bandwidth is expected to be enforced.
>> >
>> > Make the scheduler favor bandwidth over stopping the tick by setting
>> > TICK_DEP_BIT_SCHED when the only running task is a cfs task with
>> > runtime limit enabled.
>> >
>> > Add sched_feat HZ_BW (off by default) to control this behavior.
>> >
>> > Signed-off-by: Phil Auld <[email protected]>
>> > Cc: Ingo Molnar <[email protected]>
>> > Cc: Peter Zijlstra <[email protected]>
>> > Cc: Vincent Guittot <[email protected]>
>> > Cc: Juri Lelli <[email protected]>
>> > Cc: Dietmar Eggemann <[email protected]>
>> > Cc: Valentin Schneider <[email protected]>
>> > Cc: Ben Segall <[email protected]>
>> > ---
>> >
>> > v2: Ben pointed out that the bit could get cleared in the dequeue path
>> > if we migrate a newly enqueued task without preempting curr. Added a
>> > check for that edge case to sched_can_stop_tick. Removed the call to
>> > sched_can_stop_tick from sched_fair_update_stop_tick since it was
>> > redundant.
>> >
>> > kernel/sched/core.c | 12 +++++++++++
>> > kernel/sched/fair.c | 45 +++++++++++++++++++++++++++++++++++++++++
>> > kernel/sched/features.h | 2 ++
>> > 3 files changed, 59 insertions(+)
>> >
>> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> > index a68d1276bab0..646f60bfc7e7 100644
>> > --- a/kernel/sched/core.c
>> > +++ b/kernel/sched/core.c
>> > @@ -1194,6 +1194,8 @@ static void nohz_csd_func(void *info)
>> > #endif /* CONFIG_NO_HZ_COMMON */
>> >
>> > #ifdef CONFIG_NO_HZ_FULL
>> > +extern bool sched_cfs_bandwidth_active(struct cfs_rq *cfs_rq);
>> > +
>> > bool sched_can_stop_tick(struct rq *rq)
>> > {
>> > int fifo_nr_running;
>> > @@ -1229,6 +1231,16 @@ bool sched_can_stop_tick(struct rq *rq)
>> > if (rq->nr_running > 1)
>> > return false;
>> >
>> > + /*
>> > + * If there is one task and it has CFS runtime bandwidth constraints
>> > + * and it's on the cpu now we don't want to stop the tick.
>> > + */
>> > + if (sched_feat(HZ_BW) && rq->nr_running == 1 && rq->curr
>> > + && rq->curr->sched_class == &fair_sched_class && task_on_rq_queued(rq->curr)) {
>> > + if (sched_cfs_bandwidth_active(task_cfs_rq(rq->curr)))
>>
>> Actually, something I should have noticed earlier is that this should
>> probably be hierarchical, right? You need to check every ancestor
>> cfs_rq, not just the immediate parent. And at that point it probably
>> makes sense to have sched_cfs_bandwidth_active take a task_struct.
>>
>
> Are you saying a child cfs_rq with a parent that has runtime_enabled could
> itself not have runtime_enabled? I may be missing something but I don't
> see how that works.

Correct.

>
> account_cfs_rq_runtime() for example just looks at the immediate cfs_rq of
> curr and bails if it does not have runtime_enabled. How could that task get
> throttled if it exceeds some parent's limit?

account_cfs_rq_runtime() is called (primarily) from update_curr(), which
is called by enqueue_entity/dequeue_entity/entity_tick/etc, which are
called at each level of the hierarchy.
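
(Trimmed-down paraphrase of the mainline tick path, just to show the
per-level walk; not new code:)

static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
{
	struct sched_entity *se = &curr->se;

	/* one sched_entity per level of the group hierarchy */
	for_each_sched_entity(se) {
		struct cfs_rq *cfs_rq = cfs_rq_of(se);

		/* entity_tick() -> update_curr() -> account_cfs_rq_runtime() */
		entity_tick(cfs_rq, se, queued);
	}
	/* ... */
}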

The worse cache behavior of doing a separate walk in sched_can_stop_tick
aka add/sub_nr_running could I guess be avoided by having some
runtime_enabled flag on the task struct or rq that is up to date for
rq->curr only. That would only be a little annoying to keep accurate,
and there's the dual arguments of "task_struct/rq is already too
cluttered"/"well they're already so cluttered a little more won't hurt".

2023-06-29 19:56:43

by Phil Auld

Subject: Re: [PATCH v2] Sched/fair: Block nohz tick_stop when cfs bandwidth in use

On Thu, Jun 29, 2023 at 10:55:44AM -0700 Benjamin Segall wrote:
> Phil Auld <[email protected]> writes:
>
> > On Wed, Jun 28, 2023 at 02:42:16PM -0700 Benjamin Segall wrote:
> >> Phil Auld <[email protected]> writes:
> >>
> >> > CFS bandwidth limits and NOHZ full don't play well together. Tasks
> >> > can easily run well past their quotas before a remote tick does
> >> > accounting. This leads to long, multi-period stalls before such
> >> > tasks can run again. Currently, when presented with these conflicting
> >> > requirements the scheduler is favoring nohz_full and letting the tick
> >> > be stopped. However, nohz tick stopping is already best-effort; there
> >> > are a number of conditions that can prevent it, whereas cfs runtime
> >> > bandwidth is expected to be enforced.
> >> >
> >> > Make the scheduler favor bandwidth over stopping the tick by setting
> >> > TICK_DEP_BIT_SCHED when the only running task is a cfs task with
> >> > runtime limit enabled.
> >> >
> >> > Add sched_feat HZ_BW (off by default) to control this behavior.
> >> >
> >> > Signed-off-by: Phil Auld <[email protected]>
> >> > Cc: Ingo Molnar <[email protected]>
> >> > Cc: Peter Zijlstra <[email protected]>
> >> > Cc: Vincent Guittot <[email protected]>
> >> > Cc: Juri Lelli <[email protected]>
> >> > Cc: Dietmar Eggemann <[email protected]>
> >> > Cc: Valentin Schneider <[email protected]>
> >> > Cc: Ben Segall <[email protected]>
> >> > ---
> >> >
> >> > v2: Ben pointed out that the bit could get cleared in the dequeue path
> >> > if we migrate a newly enqueued task without preempting curr. Added a
> >> > check for that edge case to sched_can_stop_tick. Removed the call to
> >> > sched_can_stop_tick from sched_fair_update_stop_tick since it was
> >> > redundant.
> >> >
> >> > kernel/sched/core.c | 12 +++++++++++
> >> > kernel/sched/fair.c | 45 +++++++++++++++++++++++++++++++++++++++++
> >> > kernel/sched/features.h | 2 ++
> >> > 3 files changed, 59 insertions(+)
> >> >
> >> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> > index a68d1276bab0..646f60bfc7e7 100644
> >> > --- a/kernel/sched/core.c
> >> > +++ b/kernel/sched/core.c
> >> > @@ -1194,6 +1194,8 @@ static void nohz_csd_func(void *info)
> >> > #endif /* CONFIG_NO_HZ_COMMON */
> >> >
> >> > #ifdef CONFIG_NO_HZ_FULL
> >> > +extern bool sched_cfs_bandwidth_active(struct cfs_rq *cfs_rq);
> >> > +
> >> > bool sched_can_stop_tick(struct rq *rq)
> >> > {
> >> > int fifo_nr_running;
> >> > @@ -1229,6 +1231,16 @@ bool sched_can_stop_tick(struct rq *rq)
> >> > if (rq->nr_running > 1)
> >> > return false;
> >> >
> >> > + /*
> >> > + * If there is one task and it has CFS runtime bandwidth constraints
> >> > + * and it's on the cpu now we don't want to stop the tick.
> >> > + */
> >> > + if (sched_feat(HZ_BW) && rq->nr_running == 1 && rq->curr
> >> > + && rq->curr->sched_class == &fair_sched_class && task_on_rq_queued(rq->curr)) {
> >> > + if (sched_cfs_bandwidth_active(task_cfs_rq(rq->curr)))
> >>
> >> Actually, something I should have noticed earlier is that this should
> >> probably be hierarchical, right? You need to check every ancestor
> >> cfs_rq, not just the immediate parent. And at that point it probably
> >> makes sense to have sched_cfs_bandwidth_active take a task_struct.
> >>
> >
> > Are you saying a child cfs_rq with a parent that has runtime_enabled could
> > itself not have runtime_enabled? I may be missing something but I don't
> > see how that works.
>
> Correct.
>

Go figure. I'd have thought that was inherited downwards.

> >
> > account_cfs_rq_runtime() for example just looks at the immediate cfs_rq of
> > curr and bails if it does not have runtime_enabled. How could that task get
> > throttled if it exceeds some parent's limit?
>
> account_cfs_rq_runtime() is called (primarily) from update_curr(), which
> is called by enqueue_entity/dequeue_entity/entity_tick/etc, which are
> called at each level of the hierarchy.
>

Yeah, I'm seeing that now, thanks!


> The worse cache behavior of doing a separate walk in sched_can_stop_tick
> aka add/sub_nr_running could I guess be avoided by having some
> runtime_enabled flag on the task struct or rq that is up to date for
> rq->curr only. That would only be a little annoying to keep accurate,
> and there's the dual arguments of "task_struct/rq is already too
> cluttered"/"well they're already so cluttered a little more won't hurt".
>

I think since this is under a scheduler feat atm it will be okay to
just do the loops in line and not add the machinery to track it. That's
what it does every tick etc anyway. I'll try that and see what it looks
like. I guess it needs this in the check from PNT as well...
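
Something like this on the PNT side, completely untested, just to make
sure I'm reading you right (with sched_cfs_bandwidth_active() changed
to take the task and walk the hierarchy as you suggest):

static void sched_fair_update_stop_tick(struct rq *rq, struct task_struct *p)
{
	int cpu = cpu_of(rq);

	if (!sched_feat(HZ_BW) || !cfs_bandwidth_used())
		return;

	if (!tick_nohz_full_cpu(cpu))
		return;

	if (rq->nr_running != 1)
		return;

	/* now checks every ancestor cfs_rq, not just task_cfs_rq(p) */
	if (sched_cfs_bandwidth_active(p))
		tick_nohz_dep_set_cpu(cpu, TICK_DEP_BIT_SCHED);
}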


Cheers,
Phil



--