2014-01-06 11:39:19

by Mel Gorman

Subject: [PATCH] sched: Calculate effective load even if local weight is 0

(Rik, you authored this patch so it should be sent from you and needs a
signed-off assuming people are ok with the changelog.)

Thomas Hellstrom bisected a regression where erratic 3D performance is
experienced on virtual machines as measured by glxgears. It identified
commit 58d081b5 (sched/numa: Avoid overloading CPUs on a preferred NUMA
node) as the problem which had modified the behaviour of effective_load.

Effective load calculates the difference to the system-wide load if a
scheduling entity was moved to another CPU. The task group is not heavier
as a result of the move but overall system load can increase/decrease as a
result of the change. Commit 58d081b5 (sched/numa: Avoid overloading CPUs
on a preferred NUMA node) changed effective_load to make it suitable for
calculating if a particular NUMA node was compute overloaded. To reduce
the cost of the function, it assumed that a current sched entity weight
of 0 was uninteresting but that is not the case.

wake_affine uses a weight of 0 for sync wakeups on the grounds that it
is assuming the waking task will sleep and not contribute to load in the
near future. In this case, we still want to calculate the effective load
of the sched entity hierarchy. As effective_load is no longer used by
task_numa_compare since commit fb13c7ee (sched/numa: Use a system-wide
search to find swap/migration candidates), this patch simply restores the
historical behaviour.

[[email protected]: Wrote changelog]
Reported-and-tested-by: Thomas Hellstrom <[email protected]>
Should-be-signed-off-and-authored-by-Rik
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..e64b079 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3923,7 +3923,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
 {
 	struct sched_entity *se = tg->se[cpu];
 
-	if (!tg->parent || !wl)	/* the trivial, non-cgroup case */
+	if (!tg->parent)	/* the trivial, non-cgroup case */
 		return wl;
 
 	for_each_sched_entity(se) {


2014-01-06 12:34:51

by Peter Zijlstra

Subject: Re: [PATCH] sched: Calculate effective load even if local weight is 0

On Mon, Jan 06, 2014 at 11:39:12AM +0000, Mel Gorman wrote:
> (Rik, you authored this patch so it should be sent from you and needs a
> signed-off assuming people are ok with the changelog.)

Where did all this take place? I saw the initial email I think but then
nothing..

Not that I was reading much email the past two weeks anyway :-)

2014-01-06 12:35:35

by Peter Zijlstra

Subject: Re: [PATCH] sched: Calculate effective load even if local weight is 0

On Mon, Jan 06, 2014 at 11:39:12AM +0000, Mel Gorman wrote:
> wake_affine uses a weight of 0 for sync wakeups on the grounds that it
> is assuming the waking task will sleep and not contribute to load in the
> near future. In this case, we still want to calculate the effective load
> of the sched entity hierarchy. As effective_load is no longer used by
> task_numa_compare since commit fb13c7ee (sched/numa: Use a system-wide
> search to find swap/migration candidates), this patch simply restores the
> historical behaviour.

Urgh.. yeah. We should merge this. Rik you ok with me adding a from and
sob from you?

2014-01-06 13:09:17

by Mel Gorman

Subject: Re: [PATCH] sched: Calculate effective load even if local weight is 0

On Mon, Jan 06, 2014 at 01:34:20PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 06, 2014 at 11:39:12AM +0000, Mel Gorman wrote:
> > (Rik, you authored this patch so it should be sent from you and needs a
> > signed-off assuming people are ok with the changelog.)
>
> Where did all this take place? I saw the initial email I think but then
> nothing..
>
> Not that I was reading much email the past two weeks anyway :-)

Much of the discussion was on the bugzilla entry
https://bugzilla.kernel.org/show_bug.cgi?id=67601 which is rare but it
happens. In retrospect, I should also have included a blurb saying that
the patch resolved the bug but it's not that big a deal.

--
Mel Gorman
SUSE Labs

2014-01-06 14:01:22

by Rik van Riel

Subject: Re: [PATCH] sched: Calculate effective load even if local weight is 0

On 01/06/2014 07:35 AM, Peter Zijlstra wrote:
> On Mon, Jan 06, 2014 at 11:39:12AM +0000, Mel Gorman wrote:
>> wake_affine uses a weight of 0 for sync wakeups on the grounds that it
>> is assuming the waking task will sleep and not contribute to load in the
>> near future. In this case, we still want to calculate the effective load
>> of the sched entity hierarchy. As effective_load is no longer used by
>> task_numa_compare since commit fb13c7ee (sched/numa: Use a system-wide
>> search to find swap/migration candidates), this patch simply restores the
>> historical behaviour.
>
> Urgh.. yeah. We should merge this. Rik you ok with me adding a from and
> sob from you?

Yes, please do :)

Signed-off-by: Rik van Riel <[email protected]>


--
All rights reversed

Subject: [tip:sched/urgent] sched: Calculate effective load even if local weight is 0

Commit-ID: 9722c2dac708e9468cc0dc30218ef76946ffbc9d
Gitweb: http://git.kernel.org/tip/9722c2dac708e9468cc0dc30218ef76946ffbc9d
Author: Rik van Riel <[email protected]>
AuthorDate: Mon, 6 Jan 2014 11:39:12 +0000
Committer: Ingo Molnar <[email protected]>
CommitDate: Sun, 12 Jan 2014 09:22:15 +0100

sched: Calculate effective load even if local weight is 0

Thomas Hellstrom bisected a regression where erratic 3D performance is
experienced on virtual machines as measured by glxgears. It identified
commit 58d081b5 ("sched/numa: Avoid overloading CPUs on a preferred NUMA
node") as the problem which had modified the behaviour of effective_load.

Effective load calculates the difference to the system-wide load if a
scheduling entity was moved to another CPU. The task group is not heavier
as a result of the move but overall system load can increase/decrease as a
result of the change. Commit 58d081b5 ("sched/numa: Avoid overloading CPUs
on a preferred NUMA node") changed effective_load to make it suitable for
calculating if a particular NUMA node was compute overloaded. To reduce
the cost of the function, it assumed that a current sched entity weight
of 0 was uninteresting but that is not the case.

wake_affine() uses a weight of 0 for sync wakeups on the grounds that it
is assuming the waking task will sleep and not contribute to load in the
near future. In this case, we still want to calculate the effective load
of the sched entity hierarchy. As effective_load is no longer used by
task_numa_compare since commit fb13c7ee (sched/numa: Use a system-wide
search to find swap/migration candidates), this patch simply restores the
historical behaviour.

Reported-and-tested-by: Thomas Hellstrom <[email protected]>
Signed-off-by: Rik van Riel <[email protected]>
[ Wrote changelog]
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..e64b079 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3923,7 +3923,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
 {
 	struct sched_entity *se = tg->se[cpu];
 
-	if (!tg->parent || !wl)	/* the trivial, non-cgroup case */
+	if (!tg->parent)	/* the trivial, non-cgroup case */
 		return wl;
 
 	for_each_sched_entity(se) {
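
To make the changelog's point concrete, here is a rough user-space sketch of a single level of the share recalculation that effective_load performs. It is not kernel code: struct toy_group, its fields, toy_effective_load() and the skip_zero_wl flag are hypothetical names invented for this illustration, MIN_SHARES is assumed to be 2, and the real function walks the whole sched entity hierarchy rather than one level. The sketch only shows why returning early on wl == 0 can hide a load change that is caused by wg alone.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_SHARES 2	/* assumed lower clamp on a group entity's weight */

/* Hypothetical, single-level stand-in for a task group as seen from one CPU. */
struct toy_group {
	long shares;	/* total shares configured for the group */
	long tg_weight;	/* sum of the group's runqueue weights over all CPUs */
	long rq_weight;	/* the group's runqueue weight on this CPU */
	long se_weight;	/* current weight of the group's entity on this CPU */
	int has_parent;	/* 0 models the trivial, non-cgroup case */
};

/*
 * Change in this CPU's load if wl is added to the local runqueue weight
 * and wg is added to the group's total weight. skip_zero_wl != 0 models
 * the "|| !wl" shortcut that the patch removes.
 */
static long toy_effective_load(const struct toy_group *g, long wl, long wg,
			       int skip_zero_wl)
{
	long W, w, new_weight;

	assert(g != NULL);

	if (!g->has_parent || (skip_zero_wl && !wl))
		return wl;

	W = wg + g->tg_weight;	/* new group-wide weight */
	w = g->rq_weight + wl;	/* new local runqueue weight */

	/* The local entity receives its proportional slice of the shares. */
	if (W > 0 && w < W)
		new_weight = (w * g->shares) / W;
	else
		new_weight = g->shares;

	if (new_weight < MIN_SHARES)
		new_weight = MIN_SHARES;

	/* Report the delta against the entity's current weight. */
	return new_weight - g->se_weight;
}
```

For a group with shares = 1024, tg_weight = 2048, rq_weight = 1024 and se_weight = 512, calling toy_effective_load(&g, 0, 1024, 1) returns 0, while toy_effective_load(&g, 0, 1024, 0) returns -171: weight added to the group on another CPU shrinks the local slice even though nothing was added locally. That nonzero result is the kind of contribution the shortcut discarded.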

2014-01-13 07:52:47

by Preeti Murthy

Subject: Re: [PATCH] sched: Calculate effective load even if local weight is 0

Hi,

On Mon, Jan 6, 2014 at 5:09 PM, Mel Gorman <[email protected]> wrote:
> (Rik, you authored this patch so it should be sent from you and needs a
> signed-off assuming people are ok with the changelog.)
>
> Thomas Hellstrom bisected a regression where erratic 3D performance is
> experienced on virtual machines as measured by glxgears. It identified
> commit 58d081b5 (sched/numa: Avoid overloading CPUs on a preferred NUMA
> node) as the problem which had modified the behaviour of effective_load.
>
> Effective load calculates the difference to the system-wide load if a
> scheduling entity was moved to another CPU. The task group is not heavier
> as a result of the move but overall system load can increase/decrease as a
> result of the change. Commit 58d081b5 (sched/numa: Avoid overloading CPUs
> on a preferred NUMA node) changed effective_load to make it suitable for
> calculating if a particular NUMA node was compute overloaded. To reduce
> the cost of the function, it assumed that a current sched entity weight
> of 0 was uninteresting but that is not the case.
>
> wake_affine uses a weight of 0 for sync wakeups on the grounds that it
> is assuming the waking task will sleep and not contribute to load in the
> near future. In this case, we still want to calculate the effective load
> of the sched entity hierarchy. As effective_load is no longer used by

Would it be worth mentioning that besides sync wakeups, wake_affine() uses a
weight of 0 for the sched entity, for effective load calculation on the
prev_cpu as well? This is so as to find the effect of moving this task away
from the prev_cpu. Here too we are interested in calculating the effective
load of the root task group of this sched entity on the prev_cpu and the
below restored check will be relevant.

Without the below check the difference in the loads of the wake affine CPU
and the prev_cpu can get messed up.

Thanks

Regards
Preeti U Murthy


> task_numa_compare since commit fb13c7ee (sched/numa: Use a system-wide
> search to find swap/migration candidates), this patch simply restores the
> historical behaviour.
>
> [[email protected]: Wrote changelog]
> Reported-and-tested-by: Thomas Hellstrom <[email protected]>
> Should-be-signed-off-and-authored-by-Rik
> ---
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c7395d9..e64b079 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3923,7 +3923,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
>  {
>  	struct sched_entity *se = tg->se[cpu];
>
> -	if (!tg->parent || !wl)	/* the trivial, non-cgroup case */
> +	if (!tg->parent)	/* the trivial, non-cgroup case */
>  		return wl;
>
>  	for_each_sched_entity(se) {

2014-01-17 14:56:33

by Mel Gorman

Subject: Re: [PATCH] sched: Calculate effective load even if local weight is 0

On Mon, Jan 13, 2014 at 01:22:40PM +0530, Preeti Murthy wrote:
> Hi,
>
> On Mon, Jan 6, 2014 at 5:09 PM, Mel Gorman <[email protected]> wrote:
> > (Rik, you authored this patch so it should be sent from you and needs a
> > signed-off assuming people are ok with the changelog.)
> >
> > Thomas Hellstrom bisected a regression where erratic 3D performance is
> > experienced on virtual machines as measured by glxgears. It identified
> > commit 58d081b5 (sched/numa: Avoid overloading CPUs on a preferred NUMA
> > node) as the problem which had modified the behaviour of effective_load.
> >
> > Effective load calculates the difference to the system-wide load if a
> > scheduling entity was moved to another CPU. The task group is not heavier
> > as a result of the move but overall system load can increase/decrease as a
> > result of the change. Commit 58d081b5 (sched/numa: Avoid overloading CPUs
> > on a preferred NUMA node) changed effective_load to make it suitable for
> > calculating if a particular NUMA node was compute overloaded. To reduce
> > the cost of the function, it assumed that a current sched entity weight
> > of 0 was uninteresting but that is not the case.
> >
> > wake_affine uses a weight of 0 for sync wakeups on the grounds that it
> > is assuming the waking task will sleep and not contribute to load in the
> > near future. In this case, we still want to calculate the effective load
> > of the sched entity hierarchy. As effective_load is no longer used by
>
> Would it be worth mentioning that besides sync wakeups, wake_affine() uses a
> weight of 0 for the sched entity, for effective load calculation on the
> prev_cpu as well? This is so as to find the effect of moving this task away
> from the prev_cpu. Here too we are interested in calculating the effective
> load of the root task group of this sched entity on the prev_cpu and the
> below restored check will be relevant.
>
> Without the below check the difference in the loads of the wake affine CPU
> and the prev_cpu can get messed up.
>

I was too slow getting to this mail unfortunately. The patch is already
merged upstream with the changelog as-is.

--
Mel Gorman
SUSE Labs