2024-04-09 08:38:57

by Valentin Schneider

Subject: Re: [PATCH 0/2] sched/topology: Optimize topology_span_sane()

On 19/03/24 13:51, Kyle Meyer wrote:
> A soft lockup is being detected in build_sched_domains() on 32 socket
> Sapphire Rapids systems with 3840 processors.
>
> topology_span_sane(), called by build_sched_domains(), checks that each
> processor's non-NUMA scheduling domains are completely equal or
> completely disjoint. If a non-NUMA scheduling domain partially overlaps
> another, scheduling groups can break.
>
> This series adds for_each_cpu_from() as a generic cpumask macro to
> optimize topology_span_sane() by removing duplicate comparisons. The
> total number of comparisons is reduced from N * (N - 1) to
> N * (N - 1) / 2 (per non-NUMA scheduling domain level), decreasing the
> boot time by approximately 20 seconds and preventing the soft lockup on
> the mentioned systems.
>
> Kyle Meyer (2):
> cpumask: Add for_each_cpu_from()
> sched/topology: Optimize topology_span_sane()

I somehow never got 2/2, and it doesn't show up on lore.kernel.org
either. I can see it from Yury's reply and it looks OK to me, but you'll
have to resend it for maintainers to be able to pick it up.

>
> include/linux/cpumask.h | 10 ++++++++++
> kernel/sched/topology.c | 6 ++----
> 2 files changed, 12 insertions(+), 4 deletions(-)
>
> --
> 2.44.0
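
For readers without access to 2/2, the comparison counts quoted in the cover
letter can be illustrated with a small userspace model. This is only a sketch
under stated assumptions, not the patch itself: spans are modelled as 64-bit
masks instead of struct cpumask, and span_t, spans_compatible(),
span_sane_all_pairs() and span_sane_from() are hypothetical names made up for
this example. The second loop starts at i + 1, which is the role
for_each_cpu_from() is described as playing in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one 64-bit mask per CPU stands in for tl->mask(cpu). */
typedef uint64_t span_t;

/* Two spans are acceptable if they are completely equal or completely disjoint. */
static bool spans_compatible(span_t a, span_t b)
{
	return a == b || (a & b) == 0;
}

/* Every ordered pair (i, j) with i != j: N * (N - 1) comparisons when sane. */
static bool span_sane_all_pairs(const span_t *span, size_t n, size_t *cmps)
{
	assert(span && cmps);
	*cmps = 0;

	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; j < n; j++) {
			if (i == j)
				continue;
			(*cmps)++;
			if (!spans_compatible(span[i], span[j]))
				return false;
		}
	}
	return true;
}

/*
 * Only pairs with j > i. The check is symmetric, so (j, i) adds nothing
 * once (i, j) has been tested: N * (N - 1) / 2 comparisons when sane.
 */
static bool span_sane_from(const span_t *span, size_t n, size_t *cmps)
{
	assert(span && cmps);
	*cmps = 0;

	for (size_t i = 0; i < n; i++) {
		for (size_t j = i + 1; j < n; j++) {
			(*cmps)++;
			if (!spans_compatible(span[i], span[j]))
				return false;
		}
	}
	return true;
}
```

With four CPUs whose spans are { 0x3, 0x3, 0xC, 0xC }, both functions report a
sane topology, the first after 12 comparisons and the second after 6. A
partial overlap such as 0x3 against 0x6 is rejected by both, since each
unordered pair is still visited once.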