2009-01-13 08:57:29

by Yanmin Zhang

Subject: Performance regression of specjbb2005/aim7 with 2.6.29-rc1

Compared with 2.6.28's results, specjbb2005 shows about a 7% regression with 2.6.29-rc1
on a couple of my x86_64 machines. aim7 shows about a 1.7% regression.

Ming did a quick bisect with aim7 and located the patch below.

commit 0a582440ff546e2c6610d1acec325e91b4efd313
Author: Mike Galbraith <[email protected]>
Date: Fri Jan 2 12:16:42 2009 +0100

sched: fix sched_slice()

Impact: fix bad-interactivity buglet

Fix sched_slice() to emit a sane result whether a task is currently
enqueued or not.

Signed-off-by: Mike Galbraith <[email protected]>
Tested-by: Jayson King <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


After we reverted the patch, the aim7 regression disappeared, and the specjbb2005 regression
dropped to less than 1.5% on the 8-core stokley and disappeared on the 16-core tigerton.
I don't know what causes the remaining 1.5% regression.

As tbench shows about a 5% improvement and oltp (mysql+sysbench) about a 5% improvement, we
also ran tests to make sure those improvements aren't related to the above patch. volanoMark's
improvement is likewise unrelated to the patch. So it seems safe to revert it.

yanmin



2009-01-13 09:37:59

by Mike Galbraith

Subject: Re: Performance regression of specjbb2005/aim7 with 2.6.29-rc1

On Tue, 2009-01-13 at 16:57 +0800, Zhang, Yanmin wrote:
> Compared with 2.6.28's results, specjbb2005 shows about a 7% regression with 2.6.29-rc1
> on a couple of my x86_64 machines. aim7 shows about a 1.7% regression.
>
> Ming did a quick bisect with aim7 and located the patch below.
>
> commit 0a582440ff546e2c6610d1acec325e91b4efd313
> Author: Mike Galbraith <[email protected]>
> Date: Fri Jan 2 12:16:42 2009 +0100
>
> sched: fix sched_slice()
>
> Impact: fix bad-interactivity buglet
>
> Fix sched_slice() to emit a sane result whether a task is currently
> enqueued or not.
>
> Signed-off-by: Mike Galbraith <[email protected]>
> Tested-by: Jayson King <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
>
>
> After we reverted the patch, the aim7 regression disappeared, and the specjbb2005 regression
> dropped to less than 1.5% on the 8-core stokley and disappeared on the 16-core tigerton.
> I don't know what causes the remaining 1.5% regression.
>
> As tbench shows about a 5% improvement and oltp (mysql+sysbench) about a 5% improvement, we
> also ran tests to make sure those improvements aren't related to the above patch. volanoMark's
> improvement is likewise unrelated to the patch. So it seems safe to revert it.

No, it's not safe to just revert. You can replace it with something
else, but as long as sched_slice() is called for unqueued tasks, it must
emit sane slices; otherwise you can experience a latency-hit-from-hell.

See thread: problem with "sched: revert back to per-rq vruntime"?

-Mike
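
For context, the "sane slices" requirement comes down to the divisor in the
slice calculation: sched_slice() effectively scales the period by
se->load.weight / load->weight, and for a task that is not enqueued,
cfs_rq->load does not yet contain that task's weight. Below is a toy,
standalone model of that arithmetic; the 1024 weight and the 20ms period
are illustrative assumptions, and slice() only approximates what the
kernel's calc_delta_mine() does with 64-bit fixed point:

#include <stdio.h>
#include <stdint.h>

#define NICE_0_LOAD     1024ULL         /* weight of one nice-0 task (assumed) */
#define PERIOD_NS       20000000ULL     /* assumed 20ms scheduling period */

/* Rough stand-in for calc_delta_mine(): period * weight / total weight. */
static uint64_t slice(uint64_t se_weight, uint64_t rq_load)
{
        return PERIOD_NS * se_weight / rq_load;
}

int main(void)
{
        uint64_t queued = NICE_0_LOAD;  /* one nice-0 task already on the rq */

        /* Dividing by the raw queue load, which excludes the unqueued
         * task, credits the waking task a full 20ms slice; on an empty
         * queue the divisor would even be zero. */
        printf("raw queue load:   %llu ns\n",
               (unsigned long long)slice(NICE_0_LOAD, queued));

        /* Folding the unqueued task's weight in first, as the
         * update_load_add() branch of sched_slice() does, yields a
         * sane 10ms slice. */
        printf("folded-in weight: %llu ns\n",
               (unsigned long long)slice(NICE_0_LOAD, queued + NICE_0_LOAD));

        return 0;
}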

2009-01-15 02:33:31

by Lin Ming

Subject: Re: Performance regression of specjbb2005/aim7 with 2.6.29-rc1

On Tue, 2009-01-13 at 17:37 +0800, Mike Galbraith wrote:
> On Tue, 2009-01-13 at 16:57 +0800, Zhang, Yanmin wrote:
> > Compared with 2.6.28's results, specjbb2005 shows about a 7% regression with 2.6.29-rc1
> > on a couple of my x86_64 machines. aim7 shows about a 1.7% regression.
> >
> > Ming did a quick bisect with aim7 and located the patch below.
> >
> > commit 0a582440ff546e2c6610d1acec325e91b4efd313
> > Author: Mike Galbraith <[email protected]>
> > Date: Fri Jan 2 12:16:42 2009 +0100
> >
> > sched: fix sched_slice()
> >
> > Impact: fix bad-interactivity buglet
> >
> > Fix sched_slice() to emit a sane result whether a task is currently
> > enqueued or not.
> >
> > Signed-off-by: Mike Galbraith <[email protected]>
> > Tested-by: Jayson King <[email protected]>
> > Signed-off-by: Ingo Molnar <[email protected]>
> >
> >
> > After we reverted the patch, the aim7 regression disappeared, and the specjbb2005 regression
> > dropped to less than 1.5% on the 8-core stokley and disappeared on the 16-core tigerton.
> > I don't know what causes the remaining 1.5% regression.
> >
> > As tbench shows about a 5% improvement and oltp (mysql+sysbench) about a 5% improvement, we
> > also ran tests to make sure those improvements aren't related to the above patch. volanoMark's
> > improvement is likewise unrelated to the patch. So it seems safe to revert it.
>
> No, it's not safe to just revert. You can replace it with something
> else, but as long as sched_slice() is called for unqueued tasks, it must
> emit sane slices; otherwise you can experience a latency-hit-from-hell.
>
> See thread: problem with "sched: revert back to per-rq vruntime"?

The patch below fixes the aim7 regression, and the specjbb2005 regression becomes
less than 1.5% on the 8-core stokley.

Jayson,
Mike's patch fixed the latency problem you reported. Would you please help
test this patch to see if it still works fine now?

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 8e1352c..617e54c 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -429,10 +429,10 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
         u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
 
         for_each_sched_entity(se) {
-                struct load_weight *load = &cfs_rq->load;
+                struct load_weight *load = &cfs_rq_of(se)->load;
 
                 if (unlikely(!se->on_rq)) {
-                        struct load_weight lw = cfs_rq->load;
+                        struct load_weight lw = cfs_rq_of(se)->load;
 
                         update_load_add(&lw, se->load.weight);
                         load = &lw;


Thanks,
Lin Ming


>
> -Mike
>
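
The cfs_rq_of(se) lookup matters because, with CONFIG_FAIR_GROUP_SCHED,
for_each_sched_entity() walks up the group hierarchy, and each level's
entity is queued on a different cfs_rq; reusing the cfs_rq passed into
sched_slice() divides every level by the same bottom-level load. The
2.6.29-era definition, reproduced here from memory as a reference sketch:

#ifdef CONFIG_FAIR_GROUP_SCHED
/* Walk up the scheduling-entity hierarchy: the task's se, then its
 * group's se, and so on; each level sits on its own cfs_rq. */
#define for_each_sched_entity(se) \
                for (; se; se = se->parent)
#else
/* Without group scheduling, the "hierarchy" is just the task itself. */
#define for_each_sched_entity(se) \
                for (; se; se = NULL)
#endif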

2009-01-15 07:36:29

by Peter Zijlstra

Subject: Re: Performance regression of specjbb2005/aim7 with 2.6.29-rc1

On Thu, 2009-01-15 at 10:30 +0800, Lin Ming wrote:
> On Tue, 2009-01-13 at 17:37 +0800, Mike Galbraith wrote:
> > On Tue, 2009-01-13 at 16:57 +0800, Zhang, Yanmin wrote:
> > > Compared with 2.6.28's results, specjbb2005 shows about a 7% regression with 2.6.29-rc1
> > > on a couple of my x86_64 machines. aim7 shows about a 1.7% regression.
> > >
> > > Ming did a quick bisect with aim7 and located the patch below.
> > >
> > > commit 0a582440ff546e2c6610d1acec325e91b4efd313
> > > Author: Mike Galbraith <[email protected]>
> > > Date: Fri Jan 2 12:16:42 2009 +0100
> > >
> > > sched: fix sched_slice()
> > >
> > > Impact: fix bad-interactivity buglet
> > >
> > > Fix sched_slice() to emit a sane result whether a task is currently
> > > enqueued or not.
> > >
> > > Signed-off-by: Mike Galbraith <[email protected]>
> > > Tested-by: Jayson King <[email protected]>
> > > Signed-off-by: Ingo Molnar <[email protected]>
> > >
> > >
> > > After we reverted the patch, the aim7 regression disappeared, and the specjbb2005 regression
> > > dropped to less than 1.5% on the 8-core stokley and disappeared on the 16-core tigerton.
> > > I don't know what causes the remaining 1.5% regression.
> > >
> > > As tbench shows about a 5% improvement and oltp (mysql+sysbench) about a 5% improvement, we
> > > also ran tests to make sure those improvements aren't related to the above patch. volanoMark's
> > > improvement is likewise unrelated to the patch. So it seems safe to revert it.
> >
> > No, it's not safe to just revert. You can replace it with something
> > else, but as long as sched_slice() is called for unqueued tasks, it must
> > emit sane slices; otherwise you can experience a latency-hit-from-hell.
> >
> > See thread: problem with "sched: revert back to per-rq vruntime"?
>
> The patch below fixes the aim7 regression, and the specjbb2005 regression becomes
> less than 1.5% on the 8-core stokley.
>
> Jayson,
> Mike's patch fixed the latency problem you reported. Would you please help
> test this patch to see if it still works fine now?
>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 8e1352c..617e54c 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -429,10 +429,10 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
>         u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
>
>         for_each_sched_entity(se) {
> -                struct load_weight *load = &cfs_rq->load;
> +                struct load_weight *load = &cfs_rq_of(se)->load;
>
>                 if (unlikely(!se->on_rq)) {
> -                        struct load_weight lw = cfs_rq->load;
> +                        struct load_weight lw = cfs_rq_of(se)->load;
>
>                         update_load_add(&lw, se->load.weight);
>                         load = &lw;
>

Ugh, that's not making sense. The thing is, if !se->on_rq, it doesn't yet
have a sensible cfs_rq_of().
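
(Peter retracts this two messages down. For reference, the
group-scheduling variant of the helper in the 2.6.29-era tree looks
roughly like this -- reproduced from memory, so treat it as a sketch:)

#ifdef CONFIG_FAIR_GROUP_SCHED
/* se->cfs_rq is the runqueue this entity is (or would be) queued on.
 * It is set when the entity is attached to its group, not at enqueue
 * time, so it remains valid while the entity is dequeued -- which is
 * why the objection above turns out not to hold. */
static inline struct cfs_rq *cfs_rq_of(struct sched_entity *se)
{
        return se->cfs_rq;
}
#else
/* Without group scheduling, every entity maps to its CPU's one cfs_rq. */
static inline struct cfs_rq *cfs_rq_of(struct sched_entity *se)
{
        return &task_rq(task_of(se))->cfs;
}
#endif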


2009-01-15 15:11:18

by Peter Zijlstra

Subject: Re: Performance regression of specjbb2005/aim7 with 2.6.29-rc1

On Thu, 2009-01-15 at 10:30 +0800, Lin Ming wrote:

> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 8e1352c..617e54c 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -429,10 +429,10 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
>         u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
>
>         for_each_sched_entity(se) {
> -                struct load_weight *load = &cfs_rq->load;
> +                struct load_weight *load = &cfs_rq_of(se)->load;
>
>                 if (unlikely(!se->on_rq)) {
> -                        struct load_weight lw = cfs_rq->load;
> +                        struct load_weight lw = cfs_rq_of(se)->load;
>
>                         update_load_add(&lw, se->load.weight);
>                         load = &lw;
>

I was wrong, this does look good.

Thanks

2009-01-15 16:17:31

by Peter Zijlstra

Subject: Re: Performance regression of specjbb2005/aim7 with 2.6.29-rc1

On Thu, 2009-01-15 at 10:30 +0800, Lin Ming wrote:

> The patch below fixes the aim7 regression, and the specjbb2005 regression becomes
> less than 1.5% on the 8-core stokley.

Ingo, please apply the below.

Ming, would you also provide an S-o-b line?

---
Subject: sched_slice fixlet
From: Lin Ming <[email protected]>
Date: Thu Jan 15 17:10:02 CET 2009

Mike's change 0a582440f ("sched: fix sched_slice()") broke group
scheduling by forgetting to reload cfs_rq on each loop.

Signed-off-by: Peter Zijlstra <[email protected]>
---
kernel/sched_fair.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -429,7 +429,10 @@ static u64 sched_slice(struct cfs_rq *cf
         u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
 
         for_each_sched_entity(se) {
-                struct load_weight *load = &cfs_rq->load;
+                struct load_weight *load;
+
+                cfs_rq = cfs_rq_of(se);
+                load = &cfs_rq->load;
 
                 if (unlikely(!se->on_rq)) {
                         struct load_weight lw = cfs_rq->load;
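
For reference, sched_slice() as it reads with this fixlet applied,
reconstructed from the hunks in this thread plus the surrounding
2.6.29-era code -- a sketch rather than a verbatim quote:

static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
        /* Count the task itself when it is not queued yet (Mike's fix). */
        u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);

        for_each_sched_entity(se) {
                struct load_weight *load;

                /* Reload per level: each entity in the group hierarchy
                 * sits on its own cfs_rq (this fixlet). */
                cfs_rq = cfs_rq_of(se);
                load = &cfs_rq->load;

                if (unlikely(!se->on_rq)) {
                        struct load_weight lw = cfs_rq->load;

                        /* Fold the unqueued entity's weight into the
                         * divisor so the slice stays sane. */
                        update_load_add(&lw, se->load.weight);
                        load = &lw;
                }
                slice = calc_delta_mine(slice, se->load.weight, load);
        }
        return slice;
}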

2009-01-15 17:55:19

by Ingo Molnar

Subject: Re: Performance regression of specjbb2005/aim7 with 2.6.29-rc1


* Peter Zijlstra <[email protected]> wrote:

> On Thu, 2009-01-15 at 10:30 +0800, Lin Ming wrote:
>
> > The patch below fixes the aim7 regression, and the specjbb2005 regression becomes
> > less than 1.5% on the 8-core stokley.
>
> Ingo, please apply the below.

Applied to tip/sched/urgent, thanks!

Ingo

2009-01-15 18:53:41

by Jayson R. King

Subject: Re: Performance regression of specjbb2005/aim7 with 2.6.29-rc1

Peter Zijlstra wrote:
> On Thu, 2009-01-15 at 10:30 +0800, Lin Ming wrote:
>
>> The patch below fixes the aim7 regression, and the specjbb2005 regression becomes
>> less than 1.5% on the 8-core stokley.
>
> Ingo, please apply the below.
>
> Ming, would you also provide an S-o-b line?
>
> ---
> Subject: sched_slice fixlet
> From: Lin Ming <[email protected]>
> Date: Thu Jan 15 17:10:02 CET 2009
>
> Mike's change 0a582440f ("sched: fix sched_slice()") broke group
> scheduling by forgetting to reload cfs_rq on each loop.
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> ---
> kernel/sched_fair.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/kernel/sched_fair.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched_fair.c
> +++ linux-2.6/kernel/sched_fair.c
> @@ -429,7 +429,10 @@ static u64 sched_slice(struct cfs_rq *cf
>         u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
>
>         for_each_sched_entity(se) {
> -                struct load_weight *load = &cfs_rq->load;
> +                struct load_weight *load;
> +
> +                cfs_rq = cfs_rq_of(se);
> +                load = &cfs_rq->load;
>
>                 if (unlikely(!se->on_rq)) {
>                         struct load_weight lw = cfs_rq->load;
>
>
>


That still works for me. You may add:

Tested-by: Jayson King <[email protected]>


2009-01-15 20:58:06

by Ingo Molnar

Subject: Re: Performance regression of specjbb2005/aim7 with 2.6.29-rc1


* Jayson King <[email protected]> wrote:

>> -                struct load_weight *load = &cfs_rq->load;
>> +                struct load_weight *load;
>> +
>> +                cfs_rq = cfs_rq_of(se);
>> +                load = &cfs_rq->load;
>>
>>                 if (unlikely(!se->on_rq)) {
>>                         struct load_weight lw = cfs_rq->load;
>>
>>
>>
>
>
> That still works for me. You may add:
>
> Tested-by: Jayson King <[email protected]>

I have added it to the commit - thanks Jayson for the testing,

Ingo

2009-01-16 00:52:39

by Lin Ming

[permalink] [raw]
Subject: Re: Performance regression of specjbb2005/aim7 with 2.6.29-rc1

On Fri, 2009-01-16 at 00:17 +0800, Peter Zijlstra wrote:
> On Thu, 2009-01-15 at 10:30 +0800, Lin Ming wrote:
>
> > The patch below fixes the aim7 regression, and the specjbb2005 regression becomes
> > less than 1.5% on the 8-core stokley.
>
> Ingo, please apply the below.
>
> Ming, would you also provide an S-o-b line?

Yes,

Signed-off-by: Lin Ming <[email protected]>

Lin Ming

>
> ---
> Subject: sched_slice fixlet
> From: Lin Ming <[email protected]>
> Date: Thu Jan 15 17:10:02 CET 2009
>
> Mike's change 0a582440f ("sched: fix sched_slice()") broke group
> scheduling by forgetting to reload cfs_rq on each loop.
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> ---
> kernel/sched_fair.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/kernel/sched_fair.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched_fair.c
> +++ linux-2.6/kernel/sched_fair.c
> @@ -429,7 +429,10 @@ static u64 sched_slice(struct cfs_rq *cf
>         u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
>
>         for_each_sched_entity(se) {
> -                struct load_weight *load = &cfs_rq->load;
> +                struct load_weight *load;
> +
> +                cfs_rq = cfs_rq_of(se);
> +                load = &cfs_rq->load;
>
>                 if (unlikely(!se->on_rq)) {
>                         struct load_weight lw = cfs_rq->load;
>
>