2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")
exposed yet another miscalculation in calc_cfs_shares: MIN_SHARES is unscaled,
and must thus be scaled before being manipulated against "shares" amounts.
Signed-off-by: Samuel Thibault <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
Fixes: 2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")
---
This should be backported to 4.7 and 4.8 to fix scheduling priorities
miscalculations
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6559d19..be84f72 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2657,8 +2657,8 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
if (tg_weight)
shares /= tg_weight;
- if (shares < MIN_SHARES)
- shares = MIN_SHARES;
+ if (shares < scale_load(MIN_SHARES))
+ shares = scale_load(MIN_SHARES);
if (shares > tg->shares)
shares = tg->shares;
On Mon, Dec 19, 2016 at 2:40 PM, Samuel Thibault
<[email protected]> wrote:
> 2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")
>
> exposed yet another miscalculation in calc_cfs_shares: MIN_SHARES is unscaled,
> and must thus be scaled before being manipulated against "shares" amounts.
It's actually intentional that MIN_SHARES is un-scaled here, this is
necessary to support the goal of sub-partitioning groups with small
shares.
E.g. A group with shares=2 and 5 threads will internally provide 2048
units of weight for the load-balancer to account for their
distribution.
>
> Signed-off-by: Samuel Thibault <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: [email protected]
> Fixes: 2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")
>
> ---
> This should be backported to 4.7 and 4.8 to fix scheduling priorities
> miscalculations
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6559d19..be84f72 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2657,8 +2657,8 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
> if (tg_weight)
> shares /= tg_weight;
>
> - if (shares < MIN_SHARES)
> - shares = MIN_SHARES;
> + if (shares < scale_load(MIN_SHARES))
> + shares = scale_load(MIN_SHARES);
> if (shares > tg->shares)
> shares = tg->shares;
>
Paul Turner, on Mon 19 Dec 2016 14:44:38 -0800, wrote:
> On Mon, Dec 19, 2016 at 2:40 PM, Samuel Thibault
> <[email protected]> wrote:
> > 2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")
> >
> > exposed yet another miscalculation in calc_cfs_shares: MIN_SHARES is unscaled,
> > and must thus be scaled before being manipulated against "shares" amounts.
>
> It's actually intentional that MIN_SHARES is un-scaled here, this is
> necessary to support the goal of sub-partitioning groups with small
> shares.
Uh? you mean it's normal that MIN_SHARES is here compared as such
against "shares" while e.g. in sched_group_set_shares or effective_load
it is scaled before comparing with "shares"?
> E.g. A group with shares=2 and 5 threads will internally provide 2048
> units of weight for the load-balancer to account for their
> distribution.
But here "shares" is already scaled, so
> > - if (shares < MIN_SHARES)
> > - shares = MIN_SHARES;
...
> > return shares;
This will only make sure that the returned shares is 2, not 2048.
Samuel
On Mon, Dec 19, 2016 at 3:07 PM, Samuel Thibault
<[email protected]> wrote:
> Paul Turner, on Mon 19 Dec 2016 14:44:38 -0800, wrote:
>> On Mon, Dec 19, 2016 at 2:40 PM, Samuel Thibault
>> <[email protected]> wrote:
>> > 2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")
>> >
>> > exposed yet another miscalculation in calc_cfs_shares: MIN_SHARES is unscaled,
>> > and must thus be scaled before being manipulated against "shares" amounts.
>>
>> It's actually intentional that MIN_SHARES is un-scaled here, this is
>> necessary to support the goal of sub-partitioning groups with small
>> shares.
>
> Uh? you mean it's normal that MIN_SHARES is here compared as such
> against "shares" while e.g. in sched_group_set_shares or effective_load
> it is scaled before comparing with "shares"?
Yes.
sched_group_set_shares() controls the amount allocated to the group.
Both calc_cfs_shares() and effective_load() are subdividing this
total. Which is why it is scaled up from the external value of 2.
>
>> E.g. A group with shares=2 and 5 threads will internally provide 2048
>> units of weight for the load-balancer to account for their
>> distribution.
>
> But here "shares" is already scaled, so
>
>> > - if (shares < MIN_SHARES)
>> > - shares = MIN_SHARES;
> ...
>> > return shares;
>
> This will only make sure that the returned shares is 2, not 2048.
This is intentional. The MIN_SHARES you are seeing here is overloaded.
Every "1" unit of share is "SCHED_LOAD_RESOLUTION" bits internally.
We express a minimum of "2" in terms of the unit weight due to larger
numerical errors in the "1" case.
In the unscaled case this needs to be MIN_SHARES, and in the scaled
case, the subdivision of the scaled values must still be >=2.
To make this concrete:
In this case we can then internally say that there are (internally)
~410 units of weight for each of these 5 threads.
Thus, if one cpu has 4 threads and another 1, we see that as a
1640/410 imbalance, not a 2048/2048 balance.
>
> Samuel
Paul Turner, on Mon 19 Dec 2016 15:26:19 -0800, wrote:
> >> > - if (shares < MIN_SHARES)
> >> > - shares = MIN_SHARES;
> > ...
> >> > return shares;
> >
> > This will only make sure that the returned shares is 2, not 2048.
>
> This is intentional. The MIN_SHARES you are seeing here is overloaded.
> Every "1" unit of share is "SCHED_LOAD_RESOLUTION" bits internally.
I'm not talking about the SCHED_LOAD_RESOLUTION scaling, but about the
SCHED_FIXEDPOINT_SHIFT scaling, which is what
2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")
modified on 64bit platforms.
Samuel
On Mon, Dec 19, 2016 at 3:29 PM, Samuel Thibault
<[email protected]> wrote:
> Paul Turner, on Mon 19 Dec 2016 15:26:19 -0800, wrote:
>> >> > - if (shares < MIN_SHARES)
>> >> > - shares = MIN_SHARES;
>> > ...
>> >> > return shares;
>> >
>> > This will only make sure that the returned shares is 2, not 2048.
>>
>> This is intentional. The MIN_SHARES you are seeing here is overloaded.
>> Every "1" unit of share is "SCHED_LOAD_RESOLUTION" bits internally.
>
> I'm not talking about the SCHED_LOAD_RESOLUTION scaling, but about the
> SCHED_FIXEDPOINT_SHIFT scaling, which is what
> 2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")
> modified on 64bit platforms.
.... From that commit:
"""
-#if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power
usage under light load */
+#ifdef CONFIG_64BIT
# define SCHED_LOAD_RESOLUTION 10
# define scale_load(w) ((w) << SCHED_LOAD_RESOLUTION)
# define scale_load_down(w) ((w) >> SCHED_LOAD_RESOLUTION)
"""
Please take a deeper look at the scale_load() interactions.
Paul Turner, on Mon 19 Dec 2016 15:32:15 -0800, wrote:
> On Mon, Dec 19, 2016 at 3:29 PM, Samuel Thibault
> <[email protected]> wrote:
> > Paul Turner, on Mon 19 Dec 2016 15:26:19 -0800, wrote:
> >> >> > - if (shares < MIN_SHARES)
> >> >> > - shares = MIN_SHARES;
> >> > ...
> >> >> > return shares;
> >> >
> >> > This will only make sure that the returned shares is 2, not 2048.
> >>
> >> This is intentional. The MIN_SHARES you are seeing here is overloaded.
> >> Every "1" unit of share is "SCHED_LOAD_RESOLUTION" bits internally.
> >
> > I'm not talking about the SCHED_LOAD_RESOLUTION scaling, but about the
> > SCHED_FIXEDPOINT_SHIFT scaling, which is what
> > 2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")
> > modified on 64bit platforms.
>
> .... From that commit:
>
> """
> -#if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power
> usage under light load */
> +#ifdef CONFIG_64BIT
> # define SCHED_LOAD_RESOLUTION 10
> # define scale_load(w) ((w) << SCHED_LOAD_RESOLUTION)
> # define scale_load_down(w) ((w) >> SCHED_LOAD_RESOLUTION)
Errgl, sorry, I was referring to the old naming. This stuff has seen
so much patching over and over in the past revisions... It though you
were referring to SCHED_CAPACITY_SCALE. The code I was reading now uses
SCHED_LOAD_RESOLUTION, so that's why I read your "SCHED_LOAD_RESOLUTION"
as "the other scaling".
> The MIN_SHARES you are seeing here is overloaded.
> In the unscaled case this needs to be MIN_SHARES, and in the scaled
> case, the subdivision of the scaled values must still be >=2.
Ok, now I understand. I have to say this overloading is confusing.
Samuel
Hi Samuel,
On 12/20/2016 12:45 AM, Samuel Thibault wrote:
> Paul Turner, on Mon 19 Dec 2016 15:32:15 -0800, wrote:
>> On Mon, Dec 19, 2016 at 3:29 PM, Samuel Thibault
>> <[email protected]> wrote:
>>> Paul Turner, on Mon 19 Dec 2016 15:26:19 -0800, wrote:
[...]
>> The MIN_SHARES you are seeing here is overloaded.
>> In the unscaled case this needs to be MIN_SHARES, and in the scaled
>> case, the subdivision of the scaled values must still be >=2.
>
> Ok, now I understand. I have to say this overloading is confusing.
>
> Samuel
this had been already discussed back in August when I posted the
original patch.
https://lkml.org/lkml/2016/8/22/351
https://lkml.org/lkml/2016/8/23/641
On Tue, Dec 20, 2016 at 02:04:34PM +0100, Dietmar Eggemann wrote:
> Hi Samuel,
>
> On 12/20/2016 12:45 AM, Samuel Thibault wrote:
> >Paul Turner, on Mon 19 Dec 2016 15:32:15 -0800, wrote:
> >>On Mon, Dec 19, 2016 at 3:29 PM, Samuel Thibault
> >><[email protected]> wrote:
> >>>Paul Turner, on Mon 19 Dec 2016 15:26:19 -0800, wrote:
>
> [...]
>
> >>The MIN_SHARES you are seeing here is overloaded.
> >>In the unscaled case this needs to be MIN_SHARES, and in the scaled
> >>case, the subdivision of the scaled values must still be >=2.
> >
> >Ok, now I understand. I have to say this overloading is confusing.
> >
> >Samuel
>
> this had been already discussed back in August when I posted the original
> patch.
Maybe we should put a comment in to avoid getting more of these ;-)
Dietmar Eggemann, on Tue 20 Dec 2016 14:04:34 +0100, wrote:
> On 12/20/2016 12:45 AM, Samuel Thibault wrote:
> >Paul Turner, on Mon 19 Dec 2016 15:32:15 -0800, wrote:
> >>On Mon, Dec 19, 2016 at 3:29 PM, Samuel Thibault
> >><[email protected]> wrote:
> >>>Paul Turner, on Mon 19 Dec 2016 15:26:19 -0800, wrote:
>
> [...]
>
> >>The MIN_SHARES you are seeing here is overloaded.
> >>In the unscaled case this needs to be MIN_SHARES, and in the scaled
> >>case, the subdivision of the scaled values must still be >=2.
> >
> >Ok, now I understand. I have to say this overloading is confusing.
> >
> >Samuel
>
> this had been already discussed back in August when I posted the original
> patch.
But that doesn't show up in the source code or git history. One
shouldn't have to dig mailing lists to get code comments :)
Samuel