Nothing protects the access to the per_cpu variable sd_llc_id. When testing
the same CPU (i.e. this_cpu == that_cpu), a race condition exists with
update_top_cache_domain(). One scenario being:
CPU1                            CPU2
==================================================================

per_cpu(sd_llc_id, CPUX) => 0
                                partition_sched_domains_locked()
                                  detach_destroy_domains()
cpus_share_cache(CPUX, CPUX)      update_top_cache_domain(CPUX)
  per_cpu(sd_llc_id, CPUX) => 0
                                    per_cpu(sd_llc_id, CPUX) = CPUX
  per_cpu(sd_llc_id, CPUX) => CPUX
  return false
ttwu_queue_cond() wouldn't catch smp_processor_id() == cpu and the result
is a warning triggered from ttwu_queue_wakelist().
Avoid such a race in cpus_share_cache() by always returning true when
this_cpu == that_cpu.
Fixes: 518cd6234178 ("sched: Only queue remote wakeups when crossing cache boundaries")
Reported-by: Jing-Ting Wu <[email protected]>
Signed-off-by: Vincent Donnefort <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f2611b9cf503..f5ca15cdcff4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3726,6 +3726,9 @@ void wake_up_if_idle(int cpu)
bool cpus_share_cache(int this_cpu, int that_cpu)
{
+ if (this_cpu == that_cpu)
+ return true;
+
return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);
}
--
2.25.1
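For context, the check that trips here lives in the wakeup path. Below is a
simplified, paraphrased sketch of the relevant logic (based on
kernel/sched/core.c around v5.15 -- not verbatim; the real code routes the
cache test through ttwu_queue_cond(), which checks additional conditions
elided here):

	/*
	 * Paraphrased sketch: a wakeup is queued on the remote wakelist
	 * only when the waking and target CPUs do not share a cache, and
	 * the function warns if that path is ever taken for the local CPU.
	 */
	static bool ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
	{
		if (sched_feat(TTWU_QUEUE) &&
		    !cpus_share_cache(smp_processor_id(), cpu)) {
			/*
			 * With the race above, cpus_share_cache(X, X) can
			 * transiently return false, so this warning fires
			 * even though cpu is the local CPU.
			 */
			if (WARN_ON_ONCE(cpu == smp_processor_id()))
				return false;

			__ttwu_queue_wakelist(p, cpu, wake_flags);
			return true;
		}

		return false;
	}

With the this_cpu == that_cpu short-circuit added by the patch, the local
CPU can never take the wakelist branch, so the warning can no longer fire
regardless of what the racing update does to sd_llc_id.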
On Thu, 4 Nov 2021 at 18:52, Vincent Donnefort
<[email protected]> wrote:
>
> [...]
> Fixes: 518cd6234178 ("sched: Only queue remote wakeups when crossing cache boundaries")
> Reported-by: Jing-Ting Wu <[email protected]>
> Signed-off-by: Vincent Donnefort <[email protected]>
> Reviewed-by: Valentin Schneider <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
On Fri, Nov 05, 2021 at 04:08:33PM +0100, Vincent Guittot wrote:
> On Thu, 4 Nov 2021 at 18:52, Vincent Donnefort
> <[email protected]> wrote:
> > [...]
>
> Reviewed-by: Vincent Guittot <[email protected]>
Blergh, that's annoying. Thanks guys, picked it up for /urgent
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 42dc938a590c96eeb429e1830123fef2366d9c80
Gitweb: https://git.kernel.org/tip/42dc938a590c96eeb429e1830123fef2366d9c80
Author: Vincent Donnefort <[email protected]>
AuthorDate: Thu, 04 Nov 2021 17:51:20
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Thu, 11 Nov 2021 13:09:32 +01:00
sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()
Nothing protects the access to the per_cpu variable sd_llc_id. When testing
the same CPU (i.e. this_cpu == that_cpu), a race condition exists with
update_top_cache_domain(). One scenario being:
CPU1                            CPU2
==================================================================

per_cpu(sd_llc_id, CPUX) => 0
                                partition_sched_domains_locked()
                                  detach_destroy_domains()
cpus_share_cache(CPUX, CPUX)      update_top_cache_domain(CPUX)
  per_cpu(sd_llc_id, CPUX) => 0
                                    per_cpu(sd_llc_id, CPUX) = CPUX
  per_cpu(sd_llc_id, CPUX) => CPUX
  return false
ttwu_queue_cond() wouldn't catch smp_processor_id() == cpu and the result
is a warning triggered from ttwu_queue_wakelist().
Avoid such a race in cpus_share_cache() by always returning true when
this_cpu == that_cpu.
Fixes: 518cd6234178 ("sched: Only queue remote wakeups when crossing cache boundaries")
Reported-by: Jing-Ting Wu <[email protected]>
Signed-off-by: Vincent Donnefort <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/sched/core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 523fd60..cec173a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3726,6 +3726,9 @@ out:
bool cpus_share_cache(int this_cpu, int that_cpu)
{
+ if (this_cpu == that_cpu)
+ return true;
+
return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);
}
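For reference, the racing writer is update_top_cache_domain(). Here is a
paraphrased sketch of the relevant assignment (based on
kernel/sched/topology.c; the sd_llc, sd_llc_size and other per-CPU updates
around it are elided):

	static void update_top_cache_domain(int cpu)
	{
		struct sched_domain *sd;
		int id = cpu;

		/*
		 * Highest domain that shares the LLC; NULL while the
		 * domains are detached, in which case the LLC id falls
		 * back to the CPU's own id.
		 */
		sd = highest_flag_domain(cpu, SD_SHARE_PKG_RESOURCES);
		if (sd)
			id = cpumask_first(sched_domain_span(sd));

		/*
		 * Plain, unserialized store: a concurrent
		 * cpus_share_cache() can observe different values across
		 * its two per_cpu() reads.
		 */
		per_cpu(sd_llc_id, cpu) = id;
	}

During detach_destroy_domains() no LLC domain is left, so sd_llc_id
transiently falls back to the CPU's own id -- the 0 => CPUX transition shown
in the race scenario above.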
On Nov 04, 2021 / 17:51, Vincent Donnefort wrote:
> [...]
Oh, this is exactly the same fix as I posted before [1]. It is a bit sad
that my post did not get reviewed. Anyway, good to see the issue fixed. Thanks.
[1] https://lore.kernel.org/all/[email protected]/
--
Best Regards,
Shin'ichiro Kawasaki
On 15/11/21 10:46, Shinichiro Kawasaki wrote:
> Oh, this is exactly the same fix as I posted before [1]. It is a bit sad
> that my post did not get reviewed. Anyway, good to see the issue fixed. Thanks.
>
> [1] https://lore.kernel.org/all/[email protected]/
>
Oh, that sucks, sorry about that. I do try to keep an eye out for sched
stuff sent to LKML, but clearly there are some that fall through :(
> --
> Best Regards,
> Shin'ichiro Kawasaki
Hi Shinichiro
On Mon, 15 Nov 2021 at 19:34, Valentin Schneider
<[email protected]> wrote:
> [...]
> Oh, that sucks, sorry about that. I do try to keep an eye out for sched
> stuff sent to LKML, but clearly there are some that fall through :(
I would advise you to run get_maintainer.pl to make sure you don't miss
Cc'ing anyone.
Like Valentin, I hadn't noticed this patch; it was lost in the LKML flow.
Regards,
Vincent
On Nov 16, 2021 / 09:02, Vincent Guittot wrote:
> [...]
>
> I would advise you to run get_maintainer.pl to make sure you don't miss
> Cc'ing anyone.
> Like Valentin, I hadn't noticed this patch; it was lost in the LKML flow.
Hi Valentin, Vincent, thank you for the comments.
I ran get_maintainer.pl on my patch, and it listed nine maintainers and
reviewers, but my patch had only two of the nine on its To/Cc list. Next
time I will run get_maintainer.pl first and post with the full To/Cc list.
Thanks!
--
Best Regards,
Shin'ichiro Kawasaki
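[ For reference, get_maintainer.pl ships in the kernel tree under scripts/
and is run from the tree root against a patch file. A typical invocation
looks like this; the patch filename is hypothetical and the output is
abridged and illustrative:

	$ ./scripts/get_maintainer.pl 0001-sched-core-mitigate-race.patch
	Ingo Molnar <[email protected]> (maintainer:SCHEDULER)
	Peter Zijlstra <[email protected]> (maintainer:SCHEDULER)
	...
	[email protected] (open list:SCHEDULER)

Everyone it prints belongs on the To/Cc list of the posting. ]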