2013-08-19 16:14:15

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH 09/10] sched, fair: Fix the sd_parent_degenerate() code

I found that on my wsm box I had a redundant domain:

[ 0.949769] CPU0 attaching sched-domain:
[ 0.953765] domain 0: span 0,12 level SIBLING
[ 0.958335] groups: 0 (cpu_power = 587) 12 (cpu_power = 588)
[ 0.964548] domain 1: span 0-5,12-17 level MC
[ 0.969206] groups: 0,12 (cpu_power = 1175) 1,13 (cpu_power = 1176) 2,14 (cpu_power = 1176) 3,15 (cpu_power = 1176) 4,16 (cpu_power = 1176) 5,17 (cpu_power = 1176)
[ 0.984993] domain 2: span 0-5,12-17 level CPU
[ 0.989822] groups: 0-5,12-17 (cpu_power = 7055)
[ 0.995049] domain 3: span 0-23 level NUMA
[ 0.999620] groups: 0-5,12-17 (cpu_power = 7055) 6-11,18-23 (cpu_power = 7056)

Note how domain 2 has only a single group and spans the same CPUs as
domain 1. We should not keep such domains and do in fact have code to
prune these.

It turns out that the 'new' SD_PREFER_SIBLING flag causes this, it
makes sd_parent_degenerate() fail on the CPU domain. We can easily fix
this by 'ignoring' the SD_PREFER_SIBLING bit and transfering it to
whatever domain ends up covering the span.

With this patch the domains now look like this:

[ 0.950419] CPU0 attaching sched-domain:
[ 0.954454] domain 0: span 0,12 level SIBLING
[ 0.959039] groups: 0 (cpu_power = 587) 12 (cpu_power = 588)
[ 0.965271] domain 1: span 0-5,12-17 level MC
[ 0.969936] groups: 0,12 (cpu_power = 1175) 1,13 (cpu_power = 1176) 2,14 (cpu_power = 1176) 3,15 (cpu_power = 1176) 4,16 (cpu_power = 1176) 5,17 (cpu_power = 1176)
[ 0.985737] domain 2: span 0-23 level NUMA
[ 0.990231] groups: 0-5,12-17 (cpu_power = 7055) 6-11,18-23 (cpu_power = 7056)

Signed-off-by: Peter Zijlstra <[email protected]>
---
kernel/sched/core.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4948,7 +4948,8 @@ sd_parent_degenerate(struct sched_domain
SD_BALANCE_FORK |
SD_BALANCE_EXEC |
SD_SHARE_CPUPOWER |
- SD_SHARE_PKG_RESOURCES);
+ SD_SHARE_PKG_RESOURCES |
+ SD_PREFER_SIBLING);
if (nr_node_ids == 1)
pflags &= ~SD_SERIALIZE;
}
@@ -5157,6 +5158,13 @@ cpu_attach_domain(struct sched_domain *s
tmp->parent = parent->parent;
if (parent->parent)
parent->parent->child = tmp;
+ /*
+ * Transfer SD_PREFER_SIBLING down in case of a
+ * degenerate parent; the spans match for this
+ * so the property transfers.
+ */
+ if (parent->flags & SD_PREFER_SIBLING)
+ tmp->flags |= SD_PREFER_SIBLING;
destroy_sched_domain(parent, cpu);
} else
tmp = tmp->parent;


2013-08-24 10:46:29

by Paul Turner

[permalink] [raw]
Subject: Re: [PATCH 09/10] sched, fair: Fix the sd_parent_degenerate() code

On Mon, Aug 19, 2013 at 9:01 AM, Peter Zijlstra <[email protected]> wrote:
> I found that on my wsm box I had a redundant domain:
>
> [ 0.949769] CPU0 attaching sched-domain:
> [ 0.953765] domain 0: span 0,12 level SIBLING
> [ 0.958335] groups: 0 (cpu_power = 587) 12 (cpu_power = 588)
> [ 0.964548] domain 1: span 0-5,12-17 level MC
> [ 0.969206] groups: 0,12 (cpu_power = 1175) 1,13 (cpu_power = 1176) 2,14 (cpu_power = 1176) 3,15 (cpu_power = 1176) 4,16 (cpu_power = 1176) 5,17 (cpu_power = 1176)
> [ 0.984993] domain 2: span 0-5,12-17 level CPU
> [ 0.989822] groups: 0-5,12-17 (cpu_power = 7055)
> [ 0.995049] domain 3: span 0-23 level NUMA
> [ 0.999620] groups: 0-5,12-17 (cpu_power = 7055) 6-11,18-23 (cpu_power = 7056)
>
> Note how domain 2 has only a single group and spans the same CPUs as
> domain 1. We should not keep such domains and do in fact have code to
> prune these.
>
> It turns out that the 'new' SD_PREFER_SIBLING flag causes this, it
> makes sd_parent_degenerate() fail on the CPU domain. We can easily fix
> this by 'ignoring' the SD_PREFER_SIBLING bit and transfering it to
> whatever domain ends up covering the span.
>
> With this patch the domains now look like this:
>
> [ 0.950419] CPU0 attaching sched-domain:
> [ 0.954454] domain 0: span 0,12 level SIBLING
> [ 0.959039] groups: 0 (cpu_power = 587) 12 (cpu_power = 588)
> [ 0.965271] domain 1: span 0-5,12-17 level MC
> [ 0.969936] groups: 0,12 (cpu_power = 1175) 1,13 (cpu_power = 1176) 2,14 (cpu_power = 1176) 3,15 (cpu_power = 1176) 4,16 (cpu_power = 1176) 5,17 (cpu_power = 1176)
> [ 0.985737] domain 2: span 0-23 level NUMA
> [ 0.990231] groups: 0-5,12-17 (cpu_power = 7055) 6-11,18-23 (cpu_power = 7056)
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> ---
> kernel/sched/core.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4948,7 +4948,8 @@ sd_parent_degenerate(struct sched_domain
> SD_BALANCE_FORK |
> SD_BALANCE_EXEC |
> SD_SHARE_CPUPOWER |
> - SD_SHARE_PKG_RESOURCES);
> + SD_SHARE_PKG_RESOURCES |
> + SD_PREFER_SIBLING);
> if (nr_node_ids == 1)
> pflags &= ~SD_SERIALIZE;
> }
> @@ -5157,6 +5158,13 @@ cpu_attach_domain(struct sched_domain *s
> tmp->parent = parent->parent;
> if (parent->parent)
> parent->parent->child = tmp;
> + /*
> + * Transfer SD_PREFER_SIBLING down in case of a
> + * degenerate parent; the spans match for this
> + * so the property transfers.
> + */
> + if (parent->flags & SD_PREFER_SIBLING)
> + tmp->flags |= SD_PREFER_SIBLING;
> destroy_sched_domain(parent, cpu);
> } else
> tmp = tmp->parent;
>

Reviewed-by: Paul Turner <[email protected]>

2013-08-26 12:09:35

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 09/10] sched, fair: Fix the sd_parent_degenerate() code

On Sat, Aug 24, 2013 at 03:45:57AM -0700, Paul Turner wrote:
> > @@ -5157,6 +5158,13 @@ cpu_attach_domain(struct sched_domain *s
> > tmp->parent = parent->parent;
> > if (parent->parent)
> > parent->parent->child = tmp;
> > + /*
> > + * Transfer SD_PREFER_SIBLING down in case of a
> > + * degenerate parent; the spans match for this
> > + * so the property transfers.
> > + */
> > + if (parent->flags & SD_PREFER_SIBLING)
> > + tmp->flags |= SD_PREFER_SIBLING;
> > destroy_sched_domain(parent, cpu);
> > } else
> > tmp = tmp->parent;
> >
>
> Reviewed-by: Paul Turner <[email protected]>

BTW, did that comment make sense to you or would you suggest something
different? I had/am having a hard time with that comment. Somehow it
leaves me wanting. I know I understand the issue now, but I'll doubt the
comment will suffice in a years time :/

2013-08-26 21:50:11

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH 09/10] sched, fair: Fix the sd_parent_degenerate() code

On 08/26/2013 08:09 AM, Peter Zijlstra wrote:
> On Sat, Aug 24, 2013 at 03:45:57AM -0700, Paul Turner wrote:
>>> @@ -5157,6 +5158,13 @@ cpu_attach_domain(struct sched_domain *s
>>> tmp->parent = parent->parent;
>>> if (parent->parent)
>>> parent->parent->child = tmp;
>>> + /*
>>> + * Transfer SD_PREFER_SIBLING down in case of a
>>> + * degenerate parent; the spans match for this
>>> + * so the property transfers.
>>> + */
>>> + if (parent->flags & SD_PREFER_SIBLING)
>>> + tmp->flags |= SD_PREFER_SIBLING;
>>> destroy_sched_domain(parent, cpu);
>>> } else
>>> tmp = tmp->parent;
>>>
>>
>> Reviewed-by: Paul Turner <[email protected]>
>
> BTW, did that comment make sense to you or would you suggest something
> different? I had/am having a hard time with that comment. Somehow it
> leaves me wanting. I know I understand the issue now, but I'll doubt the
> comment will suffice in a years time :/

The comment made sense to me :)

--
All rights reversed.

2013-08-27 09:05:40

by Paul Turner

[permalink] [raw]
Subject: Re: [PATCH 09/10] sched, fair: Fix the sd_parent_degenerate() code

On Mon, Aug 26, 2013 at 2:49 PM, Rik van Riel <[email protected]> wrote:
> On 08/26/2013 08:09 AM, Peter Zijlstra wrote:
>> On Sat, Aug 24, 2013 at 03:45:57AM -0700, Paul Turner wrote:
>>>> @@ -5157,6 +5158,13 @@ cpu_attach_domain(struct sched_domain *s
>>>> tmp->parent = parent->parent;
>>>> if (parent->parent)
>>>> parent->parent->child = tmp;
>>>> + /*
>>>> + * Transfer SD_PREFER_SIBLING down in case of a
>>>> + * degenerate parent; the spans match for this
>>>> + * so the property transfers.
>>>> + */
>>>> + if (parent->flags & SD_PREFER_SIBLING)
>>>> + tmp->flags |= SD_PREFER_SIBLING;
>>>> destroy_sched_domain(parent, cpu);
>>>> } else
>>>> tmp = tmp->parent;
>>>>
>>>
>>> Reviewed-by: Paul Turner <[email protected]>
>>
>> BTW, did that comment make sense to you or would you suggest something
>> different? I had/am having a hard time with that comment. Somehow it
>> leaves me wanting. I know I understand the issue now, but I'll doubt the
>> comment will suffice in a years time :/
>
> The comment made sense to me :)

It makes sense once you read the code. That we push down is somehow
counter intuitive in the reading.

>
> --
> All rights reversed.