2009-10-09 09:51:42

by Yanmin Zhang

Subject: tbench regression with 2.6.32-rc1

Compared with 2.6.31's results, tbench shows some regression with
2.6.32-rc1.
Command line to start tbench:
#./tbench_srv &
#./tbench -t 600 CPU_NUM*2 127.0.0.1 #Replace CPU_NUM with the real number of CPUs
So this starts 2 client processes per CPU.

1) On 4*4 core tigerton: 30%;
2) On 2*4 core stoakley: 15%;
3) On 2*8 core Nehalem: 6%.

As there are a couple of patches which try to turn on/off some sched domain
flags such as SD_BALANCE_WAKE, I used a workaround to bisect it.
On tigerton, the bisect pointed to the patch below.
commit 59abf02644c45f1591e1374ee7bb45dc757fcb88
Author: Peter Zijlstra <[email protected]>
Date: Wed Sep 16 08:28:30 2009 +0200

sched: Add SD_PREFER_LOCAL


The patch does not revert cleanly, so I did some testing by turning some
domain flags and sched_features on/off manually.

1) On tigerton: if SD_PREFER_LOCAL=0 (disable it), the regression becomes about 2%.
2) On stoakley: if SD_PREFER_LOCAL=0 (disable it), the regression becomes about 4%.
3) On Nehalem: Above method couldn't improve result. I'm still checking it.

I also tried to turn on/off FAIR_SLEEPERS and GENTLE_FAIR_SLEEPERS. It seems they
have limited impact on tbench. I need to double-check these 2 flags.


2009-10-09 10:17:23

by Peter Zijlstra

Subject: Re: tbench regression with 2.6.32-rc1

So the c2q CPUs, and especially the one with the smaller cache, are hurt by
this. I guess we can turn this off without too many downsides. Maybe turn it
on for NUMA on the Nehalem?


---
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 25a9284..d823c24 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -143,6 +143,7 @@ extern unsigned long node_remap_size[];
| 1*SD_BALANCE_FORK \
| 0*SD_BALANCE_WAKE \
| 1*SD_WAKE_AFFINE \
+ | 1*SD_PREFER_LOCAL \
| 0*SD_SHARE_CPUPOWER \
| 0*SD_POWERSAVINGS_BALANCE \
| 0*SD_SHARE_PKG_RESOURCES \
diff --git a/include/linux/topology.h b/include/linux/topology.h
index fc0bf3e..57e6357 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -129,7 +129,7 @@ int arch_update_cpu_topology(void);
| 1*SD_BALANCE_FORK \
| 0*SD_BALANCE_WAKE \
| 1*SD_WAKE_AFFINE \
- | 1*SD_PREFER_LOCAL \
+ | 0*SD_PREFER_LOCAL \
| 0*SD_SHARE_CPUPOWER \
| 1*SD_SHARE_PKG_RESOURCES \
| 0*SD_SERIALIZE \
@@ -162,7 +162,7 @@ int arch_update_cpu_topology(void);
| 1*SD_BALANCE_FORK \
| 0*SD_BALANCE_WAKE \
| 1*SD_WAKE_AFFINE \
- | 1*SD_PREFER_LOCAL \
+ | 0*SD_PREFER_LOCAL \
| 0*SD_SHARE_CPUPOWER \
| 0*SD_SHARE_PKG_RESOURCES \
| 0*SD_SERIALIZE \

2009-10-12 05:20:36

by Yanmin Zhang

Subject: Re: tbench regression with 2.6.32-rc1

On Fri, 2009-10-09 at 12:16 +0200, Peter Zijlstra wrote:
> So the c2q CPUs, and especially the one with the smaller cache, are hurt by
> this. I guess we can turn this off without too many downsides. Maybe turn it
> on for NUMA on the Nehalem?
I tested the patch, and it works the same as turning the flag off manually in
the domain flags. So with the patch, stoakley still has a 4% regression and
tigerton has 2%.

2009-10-14 13:14:48

by Peter Zijlstra

Subject: [tip:sched/urgent] sched: Disable SD_PREFER_LOCAL for MC/CPU domains

Commit-ID: 799e2205ec65e174f752b558c62a92c4752df313
Gitweb: http://git.kernel.org/tip/799e2205ec65e174f752b558c62a92c4752df313
Author: Peter Zijlstra <[email protected]>
AuthorDate: Fri, 9 Oct 2009 12:16:40 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 14 Oct 2009 15:02:34 +0200

sched: Disable SD_PREFER_LOCAL for MC/CPU domains

Yanmin reported that both tbench and hackbench were significantly
hurt by trying to keep tasks local on these domains, esp on small
cache machines.

So disable it in order to promote spreading outside of the cache
domains.

Reported-by: "Zhang, Yanmin" <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
CC: Mike Galbraith <[email protected]>
LKML-Reference: <1255083400.8802.15.camel@laptop>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/topology.h | 1 +
include/linux/topology.h | 4 ++--
2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 25a9284..d823c24 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -143,6 +143,7 @@ extern unsigned long node_remap_size[];
| 1*SD_BALANCE_FORK \
| 0*SD_BALANCE_WAKE \
| 1*SD_WAKE_AFFINE \
+ | 1*SD_PREFER_LOCAL \
| 0*SD_SHARE_CPUPOWER \
| 0*SD_POWERSAVINGS_BALANCE \
| 0*SD_SHARE_PKG_RESOURCES \
diff --git a/include/linux/topology.h b/include/linux/topology.h
index fc0bf3e..57e6357 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -129,7 +129,7 @@ int arch_update_cpu_topology(void);
| 1*SD_BALANCE_FORK \
| 0*SD_BALANCE_WAKE \
| 1*SD_WAKE_AFFINE \
- | 1*SD_PREFER_LOCAL \
+ | 0*SD_PREFER_LOCAL \
| 0*SD_SHARE_CPUPOWER \
| 1*SD_SHARE_PKG_RESOURCES \
| 0*SD_SERIALIZE \
@@ -162,7 +162,7 @@ int arch_update_cpu_topology(void);
| 1*SD_BALANCE_FORK \
| 0*SD_BALANCE_WAKE \
| 1*SD_WAKE_AFFINE \
- | 1*SD_PREFER_LOCAL \
+ | 0*SD_PREFER_LOCAL \
| 0*SD_SHARE_CPUPOWER \
| 0*SD_SHARE_PKG_RESOURCES \
| 0*SD_SERIALIZE \