Architectures that implement arch_asym_cpu_priority() may need to know the
idle state of the SMT siblings of a CPU. The scheduler has this information
and functionality. Expose it.
Move the existing functionality outside of the NUMA code.
Cc: Ben Segall <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Tim C. Chen <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
Changes since v1:
* Introduced this patch.
---
include/linux/sched.h | 2 ++
kernel/sched/fair.c | 39 ++++++++++++++++++++++-----------------
2 files changed, 24 insertions(+), 17 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ffb6eb55cd13..0d01c64ac737 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2426,4 +2426,6 @@ static inline void sched_core_fork(struct task_struct *p) { }
extern void sched_set_stop_task(int cpu, struct task_struct *stop);
+extern bool sched_smt_siblings_idle(int cpu);
+
#endif
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0e4251f83807..9517c48df50e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1052,6 +1052,28 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
* Scheduling class queueing methods:
*/
+static inline bool is_core_idle(int cpu)
+{
+#ifdef CONFIG_SCHED_SMT
+ int sibling;
+
+ for_each_cpu(sibling, cpu_smt_mask(cpu)) {
+ if (cpu == sibling)
+ continue;
+
+ if (!idle_cpu(sibling))
+ return false;
+ }
+#endif
+
+ return true;
+}
+
+bool sched_smt_siblings_idle(int cpu)
+{
+ return is_core_idle(cpu);
+}
+
#ifdef CONFIG_NUMA
#define NUMA_IMBALANCE_MIN 2
@@ -1691,23 +1713,6 @@ struct numa_stats {
int idle_cpu;
};
-static inline bool is_core_idle(int cpu)
-{
-#ifdef CONFIG_SCHED_SMT
- int sibling;
-
- for_each_cpu(sibling, cpu_smt_mask(cpu)) {
- if (cpu == sibling)
- continue;
-
- if (!idle_cpu(sibling))
- return false;
- }
-#endif
-
- return true;
-}
-
struct task_numa_env {
struct task_struct *p;
--
2.25.1
On 22/11/2022 21:35, Ricardo Neri wrote:
> Architectures that implement arch_asym_cpu_priority() may need to know the
> idle state of the SMT siblings of a CPU. The scheduler has this information
> and functionality. Expose it.
>
> Move the existing functionality outside of the NUMA code.
[...]
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0e4251f83807..9517c48df50e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1052,6 +1052,28 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
> * Scheduling class queueing methods:
> */
>
> +static inline bool is_core_idle(int cpu)
> +{
> +#ifdef CONFIG_SCHED_SMT
> + int sibling;
> +
> + for_each_cpu(sibling, cpu_smt_mask(cpu)) {
> + if (cpu == sibling)
> + continue;
> +
> + if (!idle_cpu(sibling))
> + return false;
> + }
> +#endif
> +
> + return true;
> +}
> +
> +bool sched_smt_siblings_idle(int cpu)
> +{
> + return is_core_idle(cpu);
> +}
Nitpick: Can we not just have one exported function for both use-cases:
NUMA and x86 ITMT?
[...]
On Tue, Dec 06, 2022 at 07:03:37PM +0100, Dietmar Eggemann wrote:
> On 22/11/2022 21:35, Ricardo Neri wrote:
> > Architectures that implement arch_asym_cpu_priority() may need to know the
> > idle state of the SMT siblings of a CPU. The scheduler has this information
> > and functionality. Expose it.
> >
> > Move the existing functionality outside of the NUMA code.
>
> [...]
>
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 0e4251f83807..9517c48df50e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -1052,6 +1052,28 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > * Scheduling class queueing methods:
> > */
> >
> > +static inline bool is_core_idle(int cpu)
> > +{
> > +#ifdef CONFIG_SCHED_SMT
> > + int sibling;
> > +
> > + for_each_cpu(sibling, cpu_smt_mask(cpu)) {
> > + if (cpu == sibling)
> > + continue;
> > +
> > + if (!idle_cpu(sibling))
> > + return false;
> > + }
> > +#endif
> > +
> > + return true;
> > +}
> > +
> > +bool sched_smt_siblings_idle(int cpu)
> > +{
> > + return is_core_idle(cpu);
> > +}
>
> Nitpick: Can we not just have one exported function for both use-cases:
> NUMA and x86 ITMT?
By adding a new function I intend to preserve the inlinig of is_core_idle()
in update_numa_stats() (via numa_idle_core(), which is also inline). Do you
think there is no value?
A downside of having the new function is that now the code is duplicated
in update_numa_stats() and sched_smt_siblings_idle().
I can take your suggestion if losing the inline is OK.
Thanks and BR,
Ricardo
On 12/12/2022 18:54, Ricardo Neri wrote:
> On Tue, Dec 06, 2022 at 07:03:37PM +0100, Dietmar Eggemann wrote:
>> On 22/11/2022 21:35, Ricardo Neri wrote:
[...]
>>> +bool sched_smt_siblings_idle(int cpu)
>>> +{
>>> + return is_core_idle(cpu);
>>> +}
>>
>> Nitpick: Can we not just have one exported function for both use-cases:
>> NUMA and x86 ITMT?
>
> By adding a new function I intend to preserve the inlinig of is_core_idle()
> in update_numa_stats() (via numa_idle_core(), which is also inline). Do you
> think there is no value?
OK. It's only used in NUMA balancing (task_numa_fault() -> ... ->
update_numa_stats()). I can't see that this will have a noticeable perf
impact but only benchmark can really tell.
A `static inline bool sched_is_core_idle(int cpu)` via
include/linux/sched/topology.h might work? We have similar functions
(like sched_core_cookie_match()` but in the private scheduler header
file kernel/sched/sched.h though.
> A downside of having the new function is that now the code is duplicated
> in update_numa_stats() and sched_smt_siblings_idle().
>
> I can take your suggestion if losing the inline is OK.
I doubt that it will have an impact but can't be sure.
[...]
On 22/11/22 12:35, Ricardo Neri wrote:
> Architectures that implement arch_asym_cpu_priority() may need to know the
> idle state of the SMT siblings of a CPU. The scheduler has this information
> and functionality. Expose it.
>
> Move the existing functionality outside of the NUMA code.
>
test_idle_cores() does something similar without an iteration, did you
consider using that instead?
> Cc: Ben Segall <[email protected]>
> Cc: Daniel Bristot de Oliveira <[email protected]>
> Cc: Dietmar Eggemann <[email protected]>
> Cc: Len Brown <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Cc: Srinivas Pandruvada <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Tim C. Chen <[email protected]>
> Cc: Valentin Schneider <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Ricardo Neri <[email protected]>
> ---
> Changes since v1:
> * Introduced this patch.
> ---
> include/linux/sched.h | 2 ++
> kernel/sched/fair.c | 39 ++++++++++++++++++++++-----------------
> 2 files changed, 24 insertions(+), 17 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index ffb6eb55cd13..0d01c64ac737 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2426,4 +2426,6 @@ static inline void sched_core_fork(struct task_struct *p) { }
>
> extern void sched_set_stop_task(int cpu, struct task_struct *stop);
>
> +extern bool sched_smt_siblings_idle(int cpu);
> +
> #endif
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0e4251f83807..9517c48df50e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1052,6 +1052,28 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
> * Scheduling class queueing methods:
> */
>
> +static inline bool is_core_idle(int cpu)
> +{
> +#ifdef CONFIG_SCHED_SMT
> + int sibling;
> +
> + for_each_cpu(sibling, cpu_smt_mask(cpu)) {
> + if (cpu == sibling)
> + continue;
> +
> + if (!idle_cpu(sibling))
> + return false;
> + }
> +#endif
> +
> + return true;
> +}
> +
> +bool sched_smt_siblings_idle(int cpu)
> +{
> + return is_core_idle(cpu);
> +}
> +
> #ifdef CONFIG_NUMA
> #define NUMA_IMBALANCE_MIN 2
>
> @@ -1691,23 +1713,6 @@ struct numa_stats {
> int idle_cpu;
> };
>
> -static inline bool is_core_idle(int cpu)
> -{
> -#ifdef CONFIG_SCHED_SMT
> - int sibling;
> -
> - for_each_cpu(sibling, cpu_smt_mask(cpu)) {
> - if (cpu == sibling)
> - continue;
> -
> - if (!idle_cpu(sibling))
> - return false;
> - }
> -#endif
> -
> - return true;
> -}
> -
> struct task_numa_env {
> struct task_struct *p;
>
> --
> 2.25.1
On Thu, Dec 22, 2022 at 04:56:22PM +0000, Valentin Schneider wrote:
> On 22/11/22 12:35, Ricardo Neri wrote:
> > Architectures that implement arch_asym_cpu_priority() may need to know the
> > idle state of the SMT siblings of a CPU. The scheduler has this information
> > and functionality. Expose it.
> >
> > Move the existing functionality outside of the NUMA code.
> >
>
> test_idle_cores() does something similar without an iteration, did you
> consider using that instead?
IIUC, test_idle_cores() returns true if there is at least one idle core in
the package. In my case, I need to know the idle state of only the SMT
siblings of a specific CPU. Am I missing something?
Thanks and BR,
Ricardo
On 2022-12-23 at 21:28:50 -0800, Ricardo Neri wrote:
> On Thu, Dec 22, 2022 at 04:56:22PM +0000, Valentin Schneider wrote:
> > On 22/11/22 12:35, Ricardo Neri wrote:
> > > Architectures that implement arch_asym_cpu_priority() may need to know the
> > > idle state of the SMT siblings of a CPU. The scheduler has this information
> > > and functionality. Expose it.
> > >
> > > Move the existing functionality outside of the NUMA code.
> > >
> >
> > test_idle_cores() does something similar without an iteration, did you
> > consider using that instead?
>
> IIUC, test_idle_cores() returns true if there is at least one idle core in
> the package. In my case, I need to know the idle state of only the SMT
> siblings of a specific CPU. Am I missing something?
>
I guess a similar one is select_idle_core(), but it also consider the CPU with
SCHED_IDLE task as idle. Is CPU with SCHED_IDLE task a candidate in your
scenario?
thanks,
Chenyu
> Thanks and BR,
> Ricardo
On Wed, Dec 28, 2022 at 11:29:52PM +0800, Chen Yu wrote:
> On 2022-12-23 at 21:28:50 -0800, Ricardo Neri wrote:
> > On Thu, Dec 22, 2022 at 04:56:22PM +0000, Valentin Schneider wrote:
> > > On 22/11/22 12:35, Ricardo Neri wrote:
> > > > Architectures that implement arch_asym_cpu_priority() may need to know the
> > > > idle state of the SMT siblings of a CPU. The scheduler has this information
> > > > and functionality. Expose it.
> > > >
> > > > Move the existing functionality outside of the NUMA code.
> > > >
> > >
> > > test_idle_cores() does something similar without an iteration, did you
> > > consider using that instead?
> >
> > IIUC, test_idle_cores() returns true if there is at least one idle core in
> > the package. In my case, I need to know the idle state of only the SMT
> > siblings of a specific CPU. Am I missing something?
> >
> I guess a similar one is select_idle_core(), but it also consider the CPU with
> SCHED_IDLE task as idle. Is CPU with SCHED_IDLE task a candidate in your
> scenario?
However, we are not looking for an idle CPU. We want to know the idle state of
the siblings of a CPU. I see that select_idle_core() uses available_idle_cpu(),
which in turn uses idle_cpu(). is_core_idle() also uses it.
As per 943d355d7fee ("sched/core: Distinguish between idle_cpu() calls based on
desired effect, introduce available_idle_cpu()") the load balancer can just
call idle_cpu().
Thanks and BR,
Ricardo
On 23/12/22 21:28, Ricardo Neri wrote:
> On Thu, Dec 22, 2022 at 04:56:22PM +0000, Valentin Schneider wrote:
>> On 22/11/22 12:35, Ricardo Neri wrote:
>> > Architectures that implement arch_asym_cpu_priority() may need to know the
>> > idle state of the SMT siblings of a CPU. The scheduler has this information
>> > and functionality. Expose it.
>> >
>> > Move the existing functionality outside of the NUMA code.
>> >
>>
>> test_idle_cores() does something similar without an iteration, did you
>> consider using that instead?
>
> IIUC, test_idle_cores() returns true if there is at least one idle core in
> the package. In my case, I need to know the idle state of only the SMT
> siblings of a specific CPU. Am I missing something?
>
No, you're right, clearly I needed that end of the year break - sorry for
the noise.
> Thanks and BR,
> Ricardo