2022-09-08 08:11:31

by Abel Wu

[permalink] [raw]
Subject: [PATCH] sched/fair: Fix misuse of available_idle_cpu()

The function available_idle_cpu() was introduced to distinguish
between the code paths that cares if the vCPU is preempted and
the ones don't care. In general, available_idle_cpu() is used in
selecting cpus for immediate use, e.g. ttwu. While idle_cpu() is
used in the paths that only cares about the cpu is idle or not,
and __update_idle_core() is one of them.

Use idle_cpu() instead in the idle path to make has_idle_core
a better hint.

Fixes: 943d355d7fee (sched/core: Distinguish between idle_cpu() calls based on desired effect, introduce available_idle_cpu())
Signed-off-by: Abel Wu <[email protected]>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index efceb670e755..5a76d814f8bc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6302,7 +6302,7 @@ void __update_idle_core(struct rq *rq)
if (cpu == core)
continue;

- if (!available_idle_cpu(cpu))
+ if (!idle_cpu(cpu))
goto unlock;
}

--
2.37.3


2022-09-08 10:53:59

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix misuse of available_idle_cpu()

On Thu, Sep 08, 2022 at 04:07:02PM +0800, Abel Wu wrote:
> The function available_idle_cpu() was introduced to distinguish
> between the code paths that cares if the vCPU is preempted and
> the ones don't care. In general, available_idle_cpu() is used in
> selecting cpus for immediate use, e.g. ttwu. While idle_cpu() is
> used in the paths that only cares about the cpu is idle or not,
> and __update_idle_core() is one of them.
>
> Use idle_cpu() instead in the idle path to make has_idle_core
> a better hint.
>
> Fixes: 943d355d7fee (sched/core: Distinguish between idle_cpu() calls based on desired effect, introduce available_idle_cpu())
> Signed-off-by: Abel Wu <[email protected]>

Seems fair. As vCPU preemption is specific to virtualisation, it is very
unlikely that SMT is exposed to the guest so the impact of the patch is
minimal but I still think it's right so;

Acked-by: Mel Gorman <[email protected]>

--
Mel Gorman
SUSE Labs

2022-09-08 13:48:25

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix misuse of available_idle_cpu()

On Thu, Sep 08, 2022 at 11:36:32AM +0100, Mel Gorman wrote:
> On Thu, Sep 08, 2022 at 04:07:02PM +0800, Abel Wu wrote:
> > The function available_idle_cpu() was introduced to distinguish
> > between the code paths that cares if the vCPU is preempted and
> > the ones don't care. In general, available_idle_cpu() is used in
> > selecting cpus for immediate use, e.g. ttwu. While idle_cpu() is
> > used in the paths that only cares about the cpu is idle or not,
> > and __update_idle_core() is one of them.
> >
> > Use idle_cpu() instead in the idle path to make has_idle_core
> > a better hint.
> >
> > Fixes: 943d355d7fee (sched/core: Distinguish between idle_cpu() calls based on desired effect, introduce available_idle_cpu())
> > Signed-off-by: Abel Wu <[email protected]>
>
> Seems fair. As vCPU preemption is specific to virtualisation, it is very
> unlikely that SMT is exposed to the guest so the impact of the patch is

Right; only pinned guests typically expose such topology information
(anything else would be quite broken).

> minimal but I still think it's right so;

I'm not convinced; all of select_idle_sibling() seems to use
available_idle_cpu(), and that's the only consumer of
__update_idle_core(), so in that respect the current state makes sense.

2022-09-08 14:15:40

by Abel Wu

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix misuse of available_idle_cpu()

On 9/8/22 9:17 PM, Peter Zijlstra wrote:
> On Thu, Sep 08, 2022 at 11:36:32AM +0100, Mel Gorman wrote:
>> On Thu, Sep 08, 2022 at 04:07:02PM +0800, Abel Wu wrote:
>>> The function available_idle_cpu() was introduced to distinguish
>>> between the code paths that cares if the vCPU is preempted and
>>> the ones don't care. In general, available_idle_cpu() is used in
>>> selecting cpus for immediate use, e.g. ttwu. While idle_cpu() is
>>> used in the paths that only cares about the cpu is idle or not,
>>> and __update_idle_core() is one of them.
>>>
>>> Use idle_cpu() instead in the idle path to make has_idle_core
>>> a better hint.
>>>
>>> Fixes: 943d355d7fee (sched/core: Distinguish between idle_cpu() calls based on desired effect, introduce available_idle_cpu())
>>> Signed-off-by: Abel Wu <[email protected]>
>>
>> Seems fair. As vCPU preemption is specific to virtualisation, it is very
>> unlikely that SMT is exposed to the guest so the impact of the patch is
>
> Right; only pinned guests typically expose such topology information
> (anything else would be quite broken).
>
>> minimal but I still think it's right so;
>
> I'm not convinced; all of select_idle_sibling() seems to use
> available_idle_cpu(), and that's the only consumer of
> __update_idle_core(), so in that respect the current state makes sense.

Hi Peter, Mel, thanks for your reviewing!

My thought was that the preempted core can become active again before
select_idle_sibling() is called, so using available_idle_cpu() in
__update_idle_core() can potentially lose the opportunity to kick an
idle core running. While the downside of using idle_cpu() is that a
full scan can be triggered irrespective of non-preempted cores exist,
but even available_idle_cpu() can not make sure of that either.

BTW, I am also confused with select_idle_core() in which all the cpus
of a core need to be non-preempted before the core can be taken as an
idle core. IMHO, it might be enough that at least one cpu of an idle
core is non-preempted and allowed by task's taskset.

Thanks & BR,
Abel

2022-09-20 07:59:07

by Abel Wu

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix misuse of available_idle_cpu()

Ping :)

On 9/8/22 9:49 PM, Abel Wu wrote:
> On 9/8/22 9:17 PM, Peter Zijlstra wrote:
>> On Thu, Sep 08, 2022 at 11:36:32AM +0100, Mel Gorman wrote:
>>> On Thu, Sep 08, 2022 at 04:07:02PM +0800, Abel Wu wrote:
>>>> The function available_idle_cpu() was introduced to distinguish
>>>> between the code paths that cares if the vCPU is preempted and
>>>> the ones don't care. In general, available_idle_cpu() is used in
>>>> selecting cpus for immediate use, e.g. ttwu. While idle_cpu() is
>>>> used in the paths that only cares about the cpu is idle or not,
>>>> and __update_idle_core() is one of them.
>>>>
>>>> Use idle_cpu() instead in the idle path to make has_idle_core
>>>> a better hint.
>>>>
>>>> Fixes: 943d355d7fee (sched/core: Distinguish between idle_cpu()
>>>> calls based on desired effect, introduce available_idle_cpu())
>>>> Signed-off-by: Abel Wu <[email protected]>
>>>
>>> Seems fair. As vCPU preemption is specific to virtualisation, it is very
>>> unlikely that SMT is exposed to the guest so the impact of the patch is
>>
>> Right; only pinned guests typically expose such topology information
>> (anything else would be quite broken).

Yes, and it is common in our ECS servers to use pinned guests.

>>
>>> minimal but I still think it's right so;
>>
>> I'm not convinced; all of select_idle_sibling() seems to use
>> available_idle_cpu(), and that's the only consumer of
>> __update_idle_core(), so in that respect the current state makes sense.
>
> Hi Peter, Mel, thanks for your reviewing!
>
> My thought was that the preempted core can become active again before
> select_idle_sibling() is called, so using available_idle_cpu() in
> __update_idle_core() can potentially lose the opportunity to kick an
> idle core running. While the downside of using idle_cpu() is that a
> full scan can be triggered irrespective of non-preempted cores exist,
> but even available_idle_cpu() can not make sure of that either.
>
> BTW, I am also confused with select_idle_core() in which all the cpus
> of a core need to be non-preempted before the core can be taken as an
> idle core. IMHO, it might be enough that at least one cpu of an idle
> core is non-preempted and allowed by task's taskset.