I tested 2.6.24-rc1 on my x86_64 machine, which has 2 quad-core processors.
Compared with 2.6.23, aim7 shows about a -30% regression. I did a bisect and found that
patch http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b5869ce7f68b233ceb81465a7644be0d9a5f3dbb
caused the issue.
kbuild/SPECjbb2000/SPECjbb2005 also show big regressions. On another
tigerton machine (4 quad-core processors), SPECjbb2005 has more than a -40%
regression. I didn't do a bisect for those benchmarks, but I suspect
the root cause is the same as aim7's.
-yanmin
On Fri, 2007-10-26 at 17:43 +0800, Zhang, Yanmin wrote:
> I tested 2.6.24-rc1 on my x86_64 machine which has 2 quad-core processors.
>
> Comparing with 2.6.23, aim7 has about -30% regression. I did a bisect and found
> patch http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b5869ce7f68b233ceb81465a7644be0d9a5f3dbb
> caused the issue.
Bit weird that you point to a merge commit, and not an actual patch. Are
you sure git bisect pointed at this one?
* Zhang, Yanmin <[email protected]> wrote:
> I tested 2.6.24-rc1 on my x86_64 machine which has 2 quad-core processors.
>
> Comparing with 2.6.23, aim7 has about -30% regression. I did a bisect
> and found patch
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b5869ce7f68b233ceb81465a7644be0d9a5f3dbb
> caused the issue.
weird, that's a commit diff - i.e. it changes no code.
> kbuild/SPECjbb2000/SPECjbb2005 also has big regressions. On my another
> tigerton machine (4 quad-core processors), SPECjbb2005 has more than
> -40% regression. I didn't do a bisect on such benchmark testing, but I
> suspect the root cause is like aim7's.
these two commits might be relevant:
7a6c6bcee029a978f866511d6e41dbc7301fde4c
95dbb421d12fdd9796ed153853daf3679809274f
but a bisection result would be the best info.
Ingo
On Fri, 2007-10-26 at 11:53 +0200, Peter Zijlstra wrote:
> On Fri, 2007-10-26 at 17:43 +0800, Zhang, Yanmin wrote:
> > I tested 2.6.24-rc1 on my x86_64 machine which has 2 quad-core processors.
> >
> > Comparing with 2.6.23, aim7 has about -30% regression. I did a bisect and found
> > patch http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b5869ce7f68b233ceb81465a7644be0d9a5f3dbb
> > caused the issue.
>
> Bit weird that you point to a merge commit, and not an actual patch. Are
> you sure git bisect pointed at this one?
When I did a bisect, the kernel couldn't boot, and my testing log showed
it was at b5869ce7f68b233ceb81465a7644be0d9a5f3dbb. So I did a manual
checkout:
#git clone ...
#git pull ...
#git checkout b5869ce7f68b233ceb81465a7644be0d9a5f3dbb
Then I compiled the kernel and tested it. Then I reverted the above patch and recompiled/retested it.
If I run git log, I can see this commit in the list.
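(In other words, roughly the steps below; this is only a sketch, with the clone URL reconstructed from the gitweb link above and git revert -m 1 as one way to back out a merge commit.)

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
cd linux-2.6
git checkout b5869ce7f68b233ceb81465a7644be0d9a5f3dbb      # build, boot, run aim7
git revert -m 1 b5869ce7f68b233ceb81465a7644be0d9a5f3dbb   # back out the merge, rebuild, rerun aim7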
-yanmin
On Fri, 2007-10-26 at 13:23 +0200, Ingo Molnar wrote:
> * Zhang, Yanmin <[email protected]> wrote:
>
> > I tested 2.6.24-rc1 on my x86_64 machine which has 2 quad-core processors.
> >
> > Comparing with 2.6.23, aim7 has about -30% regression. I did a bisect
> > and found patch
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b5869ce7f68b233ceb81465a7644be0d9a5f3dbb
> > caused the issue.
>
> weird, that's a commit diff - i.e. it changes no code.
I got that commit hash from git log. As for the above link, I just prepended the http address
so readers could check the patch by clicking.
>
> > kbuild/SPECjbb2000/SPECjbb2005 also has big regressions. On my another
> > tigerton machine (4 quad-core processors), SPECjbb2005 has more than
> > -40% regression. I didn't do a bisect on such benchmark testing, but I
> > suspect the root cause is like aim7's.
>
> these two commits might be relevant:
>
> 7a6c6bcee029a978f866511d6e41dbc7301fde4c
I did a quick test. This patch has no impact.
> 95dbb421d12fdd9796ed153853daf3679809274f
The above big patch doesn't include this one, which means that if I do
'git checkout b5869ce7f68b233ceb81465a7644be0d9a5f3dbb', the resulting kernel doesn't include
95dbb421d12fdd9796ed153853daf3679809274f.
>
> but a bisection result would be the best info.
I will do a bisect between 2.6.23 and commit 9c63d9c021f375a2708ad79043d6f4dd1291a085.
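(That bisect boils down to roughly the following sketch, assuming v2.6.23 is tagged in the local tree; each step needs a build, boot and aim7 run before it can be marked.)

git bisect start
git bisect bad 9c63d9c021f375a2708ad79043d6f4dd1291a085   # endpoint that shows the regression
git bisect good v2.6.23
# build, boot, run aim7, then mark the result so git picks the next commit:
git bisect good        # or: git bisect bad
# repeat until git prints the first bad commit, then:
git bisect reset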
-yanmin
On Mon, 2007-10-29 at 10:22 +0800, Zhang, Yanmin wrote:
> On Fri, 2007-10-26 at 13:23 +0200, Ingo Molnar wrote:
> > * Zhang, Yanmin <[email protected]> wrote:
> >
> > > I tested 2.6.24-rc1 on my x86_64 machine which has 2 quad-core processors.
> > >
> > > Comparing with 2.6.23, aim7 has about -30% regression. I did a bisect
> > > and found patch
> > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b5869ce7f68b233ceb81465a7644be0d9a5f3dbb
> > > caused the issue.
> >
> > weird, that's a commit diff - i.e. it changes no code.
> I got the tag from #git log. As for above link, I just added prior http address,
> so readers could check the patch by clicking.
>
> >
> > > kbuild/SPECjbb2000/SPECjbb2005 also has big regressions. On my another
> > > tigerton machine (4 quad-core processors), SPECjbb2005 has more than
> > > -40% regression. I didn't do a bisect on such benchmark testing, but I
> > > suspect the root cause is like aim7's.
> >
> > these two commits might be relevant:
> >
> > 7a6c6bcee029a978f866511d6e41dbc7301fde4c
> I did a quick testing. This patch has no impact.
>
> > 95dbb421d12fdd9796ed153853daf3679809274f
> Above big patch doesn't include this one, which means if I do
> 'git checkout b5869ce7f68b233ceb81465a7644be0d9a5f3dbb', the kernel doesn't include
> 95dbb421d12fdd9796ed153853daf3679809274f.
>
> >
> > but a bisection result would be the best info.
> I will do a bisect between 2.6.23 and tag 9c63d9c021f375a2708ad79043d6f4dd1291a085.
I ran git bisect using the kernel version as the tag. It looks like git gets
confused sometimes, so I checked the ChangeLog, used the commit hash instead of
the kernel version, and retested.
It looks like at least 2 patches were responsible for the regression. I'm
doing a sub-bisect now.
I can see the aim7 regression on all my test machines, although the regression
percentage differs:

Machine                                  regression
8-core stoakley                          30%
16-core tigerton                         6%
tulsa (dual-core + HT, 16 logical CPUs)  20%
-yanmin
On Mon, 2007-10-29 at 17:37 +0800, Zhang, Yanmin wrote:
> On Mon, 2007-10-29 at 10:22 +0800, Zhang, Yanmin wrote:
> > On Fri, 2007-10-26 at 13:23 +0200, Ingo Molnar wrote:
> > > * Zhang, Yanmin <[email protected]> wrote:
> > >
> > > > I tested 2.6.24-rc1 on my x86_64 machine which has 2 quad-core processors.
> > > >
> > > > Comparing with 2.6.23, aim7 has about -30% regression. I did a bisect
> > > > and found patch
> > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b5869ce7f68b233ceb81465a7644be0d9a5f3dbb
> > > > caused the issue.
> > >
> > > weird, that's a commit diff - i.e. it changes no code.
> > I got the tag from #git log. As for above link, I just added prior http address,
> > so readers could check the patch by clicking.
> >
> > >
> > > > kbuild/SPECjbb2000/SPECjbb2005 also has big regressions. On my another
> > > > tigerton machine (4 quad-core processors), SPECjbb2005 has more than
> > > > -40% regression. I didn't do a bisect on such benchmark testing, but I
> > > > suspect the root cause is like aim7's.
> > >
> > > these two commits might be relevant:
> > >
> > > 7a6c6bcee029a978f866511d6e41dbc7301fde4c
> > I did a quick testing. This patch has no impact.
> >
> > > 95dbb421d12fdd9796ed153853daf3679809274f
> > Above big patch doesn't include this one, which means if I do
> > 'git checkout b5869ce7f68b233ceb81465a7644be0d9a5f3dbb', the kernel doesn't include
> > 95dbb421d12fdd9796ed153853daf3679809274f.
> >
> > >
> > > but a bisection result would be the best info.
> > I will do a bisect between 2.6.23 and tag 9c63d9c021f375a2708ad79043d6f4dd1291a085.
> I ran git bisect with kernel version as the tag. It looks like git will
> be crazy sometimes. So I checked ChangeLog and used the number tag to replace
> the kernel version and retested it.
>
> It looks like at least 2 patches were responsible for the regression. I'm
> doing sub-bisect now.
>
> I could find aim7 regression on all my testing machines although the regression
> percentage is different.
>
> Machine regression
> 8-core stoakley 30%
> 16-core tigerton 6%
> tulsa(dual-core+HT, 16 logical cpu) 20%
Sub-bisecting showed that patch 38ad464d410dadceda1563f36bdb0be7fe4c8938 (sched: uniform tunings)
caused a 20% regression in aim7.
The last 10% should also be related to sched parameters, such as
sysctl_sched_min_granularity.
-yanmin
* Zhang, Yanmin <[email protected]> wrote:
> sub-bisecting captured patch
> 38ad464d410dadceda1563f36bdb0be7fe4c8938(sched: uniform tunings)
> caused 20% regression of aim7.
>
> The last 10% should also be related to sched parameters, such as
> sysctl_sched_min_granularity.
ah, interesting. Since you have CONFIG_SCHED_DEBUG enabled, could you
please try to figure out what the best value for
/proc/sys/kernel_sched_latency, /proc/sys/kernel_sched_nr_latency and
/proc/sys/kernel_sched_min_granularity is?
there's a tuning constraint for kernel_sched_nr_latency:
- kernel_sched_nr_latency should always be set to
kernel_sched_latency/kernel_sched_min_granularity. (it's not a free
tunable)
i suspect a good approach would be to double the value of
kernel_sched_latency and kernel_sched_nr_latency in each tuning
iteration, while keeping kernel_sched_min_granularity unchanged. That
will exercise the tuning values of the 2.6.23 kernel as well.
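(A sketch of that iteration as a shell loop, assuming the knobs above map to the /proc/sys/kernel/sched_latency_ns and /proc/sys/kernel/sched_nr_latency files that the later replies tune, and starting from the 2.6.24-rc1 defaults of 20ms / 20. Run aim7 where indicated and record the score for each pair of values.)

lat=20000000 nr=20                        # 2.6.24-rc1 defaults: 20ms sched_latency, sched_nr_latency=20
for step in 1 2 3 4 5 6; do
    echo $lat > /proc/sys/kernel/sched_latency_ns
    echo $nr  > /proc/sys/kernel/sched_nr_latency
    # ... run aim7 here and note the result for this (lat, nr) pair ...
    lat=$((lat * 2)) nr=$((nr * 2))       # doubling both keeps lat/nr (the implied min granularity) constant
done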
Ingo
On Tue, 2007-10-30 at 08:26 +0100, Ingo Molnar wrote:
> * Zhang, Yanmin <[email protected]> wrote:
>
> > sub-bisecting captured patch
> > 38ad464d410dadceda1563f36bdb0be7fe4c8938(sched: uniform tunings)
> > caused 20% regression of aim7.
> >
> > The last 10% should also be related to sched parameters, such as
> > sysctl_sched_min_granularity.
>
> ah, interesting. Since you have CONFIG_SCHED_DEBUG enabled, could you
> please try to figure out what the best value for
> /proc/sys/kernel_sched_latency, /proc/sys/kernel_sched_nr_latency and
> /proc/sys/kernel_sched_min_granularity is?
>
> there's a tuning constraint for kernel_sched_nr_latency:
>
> - kernel_sched_nr_latency should always be set to
> kernel_sched_latency/kernel_sched_min_granularity. (it's not a free
> tunable)
>
> i suspect a good approach would be to double the value of
> kernel_sched_latency and kernel_sched_nr_latency in each tuning
> iteration, while keeping kernel_sched_min_granularity unchanged. That
> will exercise the tuning values of the 2.6.23 kernel as well.
I followed your idea to test 2.6.24-rc1. The improvement comes slowly.
Even with sched_nr_latency=2560 and sched_latency_ns=640000000, the performance
is still about 15% less than 2.6.23.
-yanmin
On Tue, 2007-10-30 at 16:36 +0800, Zhang, Yanmin wrote:
> On Tue, 2007-10-30 at 08:26 +0100, Ingo Molnar wrote:
> > * Zhang, Yanmin <[email protected]> wrote:
> >
> > > sub-bisecting captured patch
> > > 38ad464d410dadceda1563f36bdb0be7fe4c8938(sched: uniform tunings)
> > > caused 20% regression of aim7.
> > >
> > > The last 10% should also be related to sched parameters, such as
> > > sysctl_sched_min_granularity.
> >
> > ah, interesting. Since you have CONFIG_SCHED_DEBUG enabled, could you
> > please try to figure out what the best value for
> > /proc/sys/kernel_sched_latency, /proc/sys/kernel_sched_nr_latency and
> > /proc/sys/kernel_sched_min_granularity is?
> >
> > there's a tuning constraint for kernel_sched_nr_latency:
> >
> > - kernel_sched_nr_latency should always be set to
> > kernel_sched_latency/kernel_sched_min_granularity. (it's not a free
> > tunable)
> >
> > i suspect a good approach would be to double the value of
> > kernel_sched_latency and kernel_sched_nr_latency in each tuning
> > iteration, while keeping kernel_sched_min_granularity unchanged. That
> > will exercise the tuning values of the 2.6.23 kernel as well.
> I followed your idea to test 2.6.24-rc1. The improvement is slow.
> When sched_nr_latency=2560 and sched_latency_ns=640000000, the performance
> is still about 15% less than 2.6.23.
I got the aim7 30% regression on my newly upgraded stoakley machine. I found
that this machine is slower than the old one. Maybe the BIOS has issues, or the memory (might not
be dual-channel?) is slow. So I retested on the old stoakley machine and found that there
the regression is about 6%, quite similar to the regression on the tigerton
machine.
With sched_nr_latency=640 and sched_latency_ns=640000000 on the old stoakley machine,
the regression drops to about 2%. Other latency values show more regression.
On my tulsa machine, with sched_nr_latency=640 and sched_latency_ns=640000000,
the regression drops to less than 1% (the original regression is about 20%).
When I ran a bad script to change the values of sched_nr_latency and sched_latency_ns,
I hit an oops on my tulsa machine. Below is the log. It looks like sched_nr_latency became
0.
*******************Log************************************
divide error: 0000 [1] SMP
CPU 1
Modules linked in: megaraid_mbox megaraid_mm
Pid: 7326, comm: sh Not tainted 2.6.24-rc1 #2
RIP: 0010:[<ffffffff8022c2bf>] [<ffffffff8022c2bf>] __sched_period+0x22/0x2e
RSP: 0018:ffff810105909e38 EFLAGS: 00010046
RAX: 000000005a000000 RBX: 0000000000000000 RCX: 000000002d000000
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000002
RBP: ffff810105909e40 R08: ffff810103bfed50 R09: 00000000ffffffff
R10: 0000000000000038 R11: 0000000000000296 R12: ffff810100d6db40
R13: ffff8101058c4148 R14: 0000000000000001 R15: ffff810104c34088
FS: 00002b851bc59f50(0000) GS:ffff810100cb1b40(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000006c64d8 CR3: 000000010752c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process sh (pid: 7326, threadinfo ffff810105908000, task ffff810104c34040)
Stack: 0000000000000800 ffff810105909e58 ffffffff8022c2db 00000000079d292b
ffff810105909e88 ffffffff8022c36e ffff810100d6db40 ffff8101058c4148
ffff8101058c4100 0000000000000001 ffff810105909ec8 ffffffff80232d0a
Call Trace:
[<ffffffff8022c2db>] __sched_vslice+0x10/0x1d
[<ffffffff8022c36e>] place_entity+0x86/0xc3
[<ffffffff80232d0a>] task_new_fair+0x48/0xa5
[<ffffffff8020b63e>] system_call+0x7e/0x83
[<ffffffff80233325>] wake_up_new_task+0x70/0xa4
[<ffffffff80235612>] do_fork+0x137/0x204
[<ffffffff802818bd>] vfs_write+0x121/0x136
[<ffffffff8023f017>] recalc_sigpending+0xe/0x25
[<ffffffff8023f0ef>] sigprocmask+0x9e/0xc0
[<ffffffff8020b957>] ptregscall_common+0x67/0xb0
Code: 48 f7 f3 48 89 c1 5b c9 48 89 c8 c3 55 48 89 e5 53 48 89 fb
RIP [<ffffffff8022c2bf>] __sched_period+0x22/0x2e
RSP <ffff810105909e38>
divide error: 0000 [2] SMP
CPU 0
Modules linked in: megaraid_mbox megaraid_mm
Pid: 3674, comm: automount Tainted: G D 2.6.24-rc1 #2
RIP: 0010:[<ffffffff8022c2bf>] [<ffffffff8022c2bf>] __sched_period+0x22/0x2e
RSP: 0018:ffff81010690de38 EFLAGS: 00010046
RAX: 000000005a000000 RBX: 0000000000000000 RCX: 000000002d000000
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000002
RBP: ffff81010690de40 R08: ffff81010690c000 R09: 00000000ffffffff
R10: 0000000000000038 R11: ffff810104007040 R12: ffff810001033880
R13: ffff810100f2a828 R14: 0000000000000001 R15: ffff810104007088
FS: 0000000040021950(0063) GS:ffffffff8074e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b6cc4245000 CR3: 0000000105972000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process automount (pid: 3674, threadinfo ffff81010690c000, task ffff810104007040)
Stack: 0000000000000800 ffff81010690de58 ffffffff8022c2db 000000057aef240d
ffff81010690de88 ffffffff8022c36e ffff810001033880 ffff810100f2a828
ffff810100f2a7e0 0000000000000000 ffff81010690dec8 ffffffff80232d0a
Call Trace:
[<ffffffff8022c2db>] __sched_vslice+0x10/0x1d
[<ffffffff8022c36e>] place_entity+0x86/0xc3
[<ffffffff80232d0a>] task_new_fair+0x48/0xa5
[<ffffffff8020b63e>] system_call+0x7e/0x83
[<ffffffff80233325>] wake_up_new_task+0x70/0xa4
[<ffffffff80235612>] do_fork+0x137/0x204
[<ffffffff8020b957>] ptregscall_common+0x67/0xb0
Code: 48 f7 f3 48 89 c1 5b c9 48 89 c8 c3 55 48 89 e5 53 48 89 fb
RIP [<ffffffff8022c2bf>] __sched_period+0x22/0x2e
RSP <ffff81010690de38>
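(For reference, a hypothetical reproducer rather than the actual script: sched_nr_latency is exposed through a plain proc_dointvec handler with no bounds, and __sched_period() divides by it once nr_running exceeds it, so writing 0 there makes the next fork hit the divide error shown in the traces above.)

# WARNING: expected to oops a 2.6.24-rc1 kernel built with CONFIG_SCHED_DEBUG
echo 0 > /proc/sys/kernel/sched_nr_latency
/bin/true   # the fork path (do_fork -> task_new_fair -> place_entity -> __sched_vslice -> __sched_period) now divides by zero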
On Wed, 2007-10-31 at 17:57 +0800, Zhang, Yanmin wrote:
> On Tue, 2007-10-30 at 16:36 +0800, Zhang, Yanmin wrote:
> > On Tue, 2007-10-30 at 08:26 +0100, Ingo Molnar wrote:
> > > * Zhang, Yanmin <[email protected]> wrote:
> > >
> > > > sub-bisecting captured patch
> > > > 38ad464d410dadceda1563f36bdb0be7fe4c8938(sched: uniform tunings)
> > > > caused 20% regression of aim7.
> > > >
> > > > The last 10% should also be related to sched parameters, such as
> > > > sysctl_sched_min_granularity.
> > >
> > > ah, interesting. Since you have CONFIG_SCHED_DEBUG enabled, could you
> > > please try to figure out what the best value for
> > > /proc/sys/kernel_sched_latency, /proc/sys/kernel_sched_nr_latency and
> > > /proc/sys/kernel_sched_min_granularity is?
> > >
> > > there's a tuning constraint for kernel_sched_nr_latency:
> > >
> > > - kernel_sched_nr_latency should always be set to
> > > kernel_sched_latency/kernel_sched_min_granularity. (it's not a free
> > > tunable)
> > >
> > > i suspect a good approach would be to double the value of
> > > kernel_sched_latency and kernel_sched_nr_latency in each tuning
> > > iteration, while keeping kernel_sched_min_granularity unchanged. That
> > > will exercise the tuning values of the 2.6.23 kernel as well.
> > I followed your idea to test 2.6.24-rc1. The improvement is slow.
> > When sched_nr_latency=2560 and sched_latency_ns=640000000, the performance
> > is still about 15% less than 2.6.23.
>
> I got the aim7 30% regression on my new upgraded stoakley machine. I found
> this machine is slower than the old one. Maybe BIOS has issues, or memory(Might not
> be dual-channel?) is slow. So I retested it on the old machine and found on the old
> stoakley machine, the regression is about 6%, quite similar to the regression on tigerton
> machine.
>
> By sched_nr_latency=640 and sched_latency_ns=640000000 on the old stoakley machine,
> the regression becomes about 2%. Other latency has more regression.
>
> On my tulsa machine, by sched_nr_latency=640 and sched_latency_ns=640000000,
> the regression becomes less than 1% (The original regression is about 20%).
>
> When I ran a bad script to change the values of sched_nr_latency and sched_latency_ns,
> I hit OOPS on my tulsa machine. Below is the log. It looks like sched_nr_latency becomes
> 0.
Oops, yeah I think I overlooked that case :-/
I think limiting the sysctl parameters makes the most sense, as a 0 value
really doesn't.
Signed-off-by: Peter Zijlstra <[email protected]>
---
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 3b4efbe..0f34c91 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -94,6 +94,7 @@ static int two = 2;
static int zero;
static int one_hundred = 100;
+static int int_max = INT_MAX;
/* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
static int maxolduid = 65535;
@@ -239,7 +240,10 @@ static struct ctl_table kern_table[] = {
.data = &sysctl_sched_nr_latency,
.maxlen = sizeof(unsigned int),
.mode = 0644,
- .proc_handler = &proc_dointvec,
+ .proc_handler = &proc_dointvec_minmax,
+ .strategy = &sysctl_intvec,
+ .extra1 = &one,
+ .extra2 = &int_max,
},
{
.ctl_name = CTL_UNNUMBERED,
* Peter Zijlstra <[email protected]> wrote:
> static int one_hundred = 100;
> +static int int_max = INT_MAX;
>
> /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
> static int maxolduid = 65535;
> @@ -239,7 +240,10 @@ static struct ctl_table kern_table[] = {
> .data = &sysctl_sched_nr_latency,
> .maxlen = sizeof(unsigned int),
> .mode = 0644,
> - .proc_handler = &proc_dointvec,
> + .proc_handler = &proc_dointvec_minmax,
> + .strategy = &sysctl_intvec,
> + .extra1 = &one,
> + .extra2 = &int_max,
could we instead just make sched_nr_latency non-tunable, and recalculate
it from the sysctl handler whenever sched_latency or
sched_min_granularity changes? That would avoid not only the division by
zero bug but also other out-of-spec tunings.
Ingo
On Wed, 2007-10-31 at 17:57 +0800, Zhang, Yanmin wrote:
> On Tue, 2007-10-30 at 16:36 +0800, Zhang, Yanmin wrote:
> > On Tue, 2007-10-30 at 08:26 +0100, Ingo Molnar wrote:
> > > * Zhang, Yanmin <[email protected]> wrote:
> > >
> > > > sub-bisecting captured patch
> > > > 38ad464d410dadceda1563f36bdb0be7fe4c8938(sched: uniform tunings)
> > > > caused 20% regression of aim7.
> > > >
> > > > The last 10% should also be related to sched parameters, such as
> > > > sysctl_sched_min_granularity.
> > >
> > > ah, interesting. Since you have CONFIG_SCHED_DEBUG enabled, could you
> > > please try to figure out what the best value for
> > > /proc/sys/kernel_sched_latency, /proc/sys/kernel_sched_nr_latency and
> > > /proc/sys/kernel_sched_min_granularity is?
> > >
> > > there's a tuning constraint for kernel_sched_nr_latency:
> > >
> > > - kernel_sched_nr_latency should always be set to
> > > kernel_sched_latency/kernel_sched_min_granularity. (it's not a free
> > > tunable)
> > >
> > > i suspect a good approach would be to double the value of
> > > kernel_sched_latency and kernel_sched_nr_latency in each tuning
> > > iteration, while keeping kernel_sched_min_granularity unchanged. That
> > > will exercise the tuning values of the 2.6.23 kernel as well.
> > I followed your idea to test 2.6.24-rc1. The improvement is slow.
> > When sched_nr_latency=2560 and sched_latency_ns=640000000, the performance
> > is still about 15% less than 2.6.23.
>
> I got the aim7 30% regression on my new upgraded stoakley machine. I found
> this machine is slower than the old one. Maybe BIOS has issues, or memory(Might not
> be dual-channel?) is slow. So I retested it on the old machine and found on the old
> stoakley machine, the regression is about 6%, quite similar to the regression on tigerton
> machine.
>
> By sched_nr_latency=640 and sched_latency_ns=640000000 on the old stoakley machine,
> the regression becomes about 2%. Other latency has more regression.
>
> On my tulsa machine, by sched_nr_latency=640 and sched_latency_ns=640000000,
> the regression becomes less than 1% (The original regression is about 20%).
I reran SPECjbb with sched_nr_latency=640 and sched_latency_ns=640000000. On tigerton,
the regression is still more than 40%. On the stoakley machine, it becomes worse (26%,
originally 9%). I will do more investigation to make sure the SPECjbb regression is
also caused by the bad default values.
We need a smarter method to calculate the best default values for the key tuning
parameters.
One interesting thing is that sysbench+mysql (read-only) got the same result as 2.6.22 (no
regression). Good job!
-yanmin
Zhang, Yanmin wrote:
> On Wed, 2007-10-31 at 17:57 +0800, Zhang, Yanmin wrote:
>> On Tue, 2007-10-30 at 16:36 +0800, Zhang, Yanmin wrote:
>>> On Tue, 2007-10-30 at 08:26 +0100, Ingo Molnar wrote:
>>>> * Zhang, Yanmin <[email protected]> wrote:
>>>>
>>>>> sub-bisecting captured patch
>>>>> 38ad464d410dadceda1563f36bdb0be7fe4c8938(sched: uniform tunings)
>>>>> caused 20% regression of aim7.
>>>>>
>>>>> The last 10% should also be related to sched parameters, such as
>>>>> sysctl_sched_min_granularity.
>>>> ah, interesting. Since you have CONFIG_SCHED_DEBUG enabled, could you
>>>> please try to figure out what the best value for
>>>> /proc/sys/kernel_sched_latency, /proc/sys/kernel_sched_nr_latency and
>>>> /proc/sys/kernel_sched_min_granularity is?
>>>>
>>>> there's a tuning constraint for kernel_sched_nr_latency:
>>>>
>>>> - kernel_sched_nr_latency should always be set to
>>>> kernel_sched_latency/kernel_sched_min_granularity. (it's not a free
>>>> tunable)
>>>>
>>>> i suspect a good approach would be to double the value of
>>>> kernel_sched_latency and kernel_sched_nr_latency in each tuning
>>>> iteration, while keeping kernel_sched_min_granularity unchanged. That
>>>> will exercise the tuning values of the 2.6.23 kernel as well.
>>> I followed your idea to test 2.6.24-rc1. The improvement is slow.
>>> When sched_nr_latency=2560 and sched_latency_ns=640000000, the performance
>>> is still about 15% less than 2.6.23.
>> I got the aim7 30% regression on my new upgraded stoakley machine. I found
>> this machine is slower than the old one. Maybe BIOS has issues, or memory(Might not
>> be dual-channel?) is slow. So I retested it on the old machine and found on the old
>> stoakley machine, the regression is about 6%, quite similar to the regression on tigerton
>> machine.
>>
>> By sched_nr_latency=640 and sched_latency_ns=640000000 on the old stoakley machine,
>> the regression becomes about 2%. Other latency has more regression.
>>
>> On my tulsa machine, by sched_nr_latency=640 and sched_latency_ns=640000000,
>> the regression becomes less than 1% (The original regression is about 20%).
> I rerun SPECjbb by sched_nr_latency=640 and sched_latency_ns=640000000. On tigerton,
> the regression is still more than 40%. On stoakley machine, it becomes worse (26%,
> original is 9%). I will do more investigation to make sure SPECjbb regression is
> also caused by the bad default values.
>
> We need a smarter method to calculate the best default values for the key tuning
> parameters.
>
> One interesting is sysbench+mysql(readonly) got the same result like 2.6.22 (no
> regression). Good job!
Do you mean you couldn't reproduce the regression which was reported
with 2.6.23 (http://lkml.org/lkml/2007/10/30/53) with 2.6.24-rc1? It
would be nice if you could provide some numbers for 2.6.22, 2.6.23 and
2.6.24-rc1.
> -yanmin
greetings
Cyrus
(restoring CCs which I inadvertently dropped)
On Thu, 2007-11-01 at 16:00 +0100, Ingo Molnar wrote:
> * Peter Zijlstra <[email protected]> wrote:
>
> > > could we instead just make sched_nr_latency non-tunable, and
> > > recalculate it from the sysctl handler whenever sched_latency or
> > > sched_min_granularity changes? That would avoid not only the
> > > division by zero bug but also other out-of-spec tunings.
> >
> > We don't have min_granularity anymore.
>
> i think we should reintroduce it in the SCHED_DEBUG case and make it the
> main tunable item - sched_nr is a nice performance optimization but
> quite unintuitive as a tuning knob.
ok, I don't particularly care either way, could be because I wrote the
stuff :-)
Signed-off-by: Peter Zijlstra <[email protected]>
---
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -1466,12 +1466,16 @@ extern void sched_idle_next(void);
#ifdef CONFIG_SCHED_DEBUG
extern unsigned int sysctl_sched_latency;
-extern unsigned int sysctl_sched_nr_latency;
+extern unsigned int sysctl_sched_min_granularity;
extern unsigned int sysctl_sched_wakeup_granularity;
extern unsigned int sysctl_sched_batch_wakeup_granularity;
extern unsigned int sysctl_sched_child_runs_first;
extern unsigned int sysctl_sched_features;
extern unsigned int sysctl_sched_migration_cost;
+
+int sched_nr_latency_handler(struct ctl_table *table, int write,
+ struct file *file, void __user *buffer, size_t *length,
+ loff_t *ppos);
#endif
extern unsigned int sysctl_sched_compat_yield;
Index: linux-2.6/kernel/sched_debug.c
===================================================================
--- linux-2.6.orig/kernel/sched_debug.c
+++ linux-2.6/kernel/sched_debug.c
@@ -210,7 +210,7 @@ static int sched_debug_show(struct seq_f
#define PN(x) \
SEQ_printf(m, " .%-40s: %Ld.%06ld\n", #x, SPLIT_NS(x))
PN(sysctl_sched_latency);
- PN(sysctl_sched_nr_latency);
+ PN(sysctl_sched_min_granularity);
PN(sysctl_sched_wakeup_granularity);
PN(sysctl_sched_batch_wakeup_granularity);
PN(sysctl_sched_child_runs_first);
Index: linux-2.6/kernel/sched_fair.c
===================================================================
--- linux-2.6.orig/kernel/sched_fair.c
+++ linux-2.6/kernel/sched_fair.c
@@ -35,16 +35,21 @@
const_debug unsigned int sysctl_sched_latency = 20000000ULL;
/*
- * After fork, child runs first. (default) If set to 0 then
- * parent will (try to) run first.
+ * Minimal preemption granularity for CPU-bound tasks:
+ * (default: 1 msec, units: nanoseconds)
*/
-const_debug unsigned int sysctl_sched_child_runs_first = 1;
+const_debug unsigned int sysctl_sched_min_granularity = 1000000ULL;
/*
- * Minimal preemption granularity for CPU-bound tasks:
- * (default: 2 msec, units: nanoseconds)
+ * is kept at sysctl_sched_latency / sysctl_sched_min_granularity
+ */
+const_debug unsigned int sched_nr_latency = 20;
+
+/*
+ * After fork, child runs first. (default) If set to 0 then
+ * parent will (try to) run first.
*/
-const_debug unsigned int sysctl_sched_nr_latency = 20;
+const_debug unsigned int sysctl_sched_child_runs_first = 1;
/*
* sys_sched_yield() compat mode
@@ -301,6 +306,21 @@ static inline struct sched_entity *__pic
* Scheduling class statistics methods:
*/
+#ifdef CONFIG_SCHED_DEBUG
+int sched_nr_latency_handler(struct ctl_table *table, int write,
+ struct file *filp, void __user *buffer, size_t *lenp,
+ loff_t *ppos)
+{
+ int ret = proc_dointvec_minmax(table, write, filp, buffer, lenp, ppos);
+
+ if (!ret && write) {
+ sched_nr_latency =
+ sysctl_sched_latency / sysctl_sched_min_granularity;
+ }
+
+ return ret;
+}
+#endif
/*
* The idea is to set a period in which each task runs once.
@@ -313,7 +333,7 @@ static inline struct sched_entity *__pic
static u64 __sched_period(unsigned long nr_running)
{
u64 period = sysctl_sched_latency;
- unsigned long nr_latency = sysctl_sched_nr_latency;
+ unsigned long nr_latency = sched_nr_latency;
if (unlikely(nr_running > nr_latency)) {
period *= nr_running;
Index: linux-2.6/kernel/sysctl.c
===================================================================
--- linux-2.6.orig/kernel/sysctl.c
+++ linux-2.6/kernel/sysctl.c
@@ -235,11 +235,14 @@ static struct ctl_table kern_table[] = {
#ifdef CONFIG_SCHED_DEBUG
{
.ctl_name = CTL_UNNUMBERED,
- .procname = "sched_nr_latency",
- .data = &sysctl_sched_nr_latency,
+ .procname = "sched_min_granularity_ns",
+ .data = &sysctl_sched_min_granularity,
.maxlen = sizeof(unsigned int),
.mode = 0644,
- .proc_handler = &proc_dointvec,
+ .proc_handler = &sched_nr_latency_handler,
+ .strategy = &sysctl_intvec,
+ .extra1 = &min_sched_granularity_ns,
+ .extra2 = &max_sched_granularity_ns,
},
{
.ctl_name = CTL_UNNUMBERED,
@@ -247,7 +250,7 @@ static struct ctl_table kern_table[] = {
.data = &sysctl_sched_latency,
.maxlen = sizeof(unsigned int),
.mode = 0644,
- .proc_handler = &proc_dointvec_minmax,
+ .proc_handler = &sched_nr_latency_handler,
.strategy = &sysctl_intvec,
.extra1 = &min_sched_granularity_ns,
.extra2 = &max_sched_granularity_ns,
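(A quick sketch of the intended behaviour with this patch applied, assuming CONFIG_SCHED_DEBUG and the default 20ms latency; sched_nr_latency is no longer a sysctl of its own and gets recomputed whenever either remaining knob is written.)

cat /proc/sys/kernel/sched_latency_ns                     # 20000000 by default
echo 2000000 > /proc/sys/kernel/sched_min_granularity_ns
# sched_nr_latency_handler() recomputes the internal sched_nr_latency as
# sysctl_sched_latency / sysctl_sched_min_granularity = 20000000 / 2000000 = 10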
* Peter Zijlstra <[email protected]> wrote:
> > > We don't have min_granularity anymore.
> >
> > i think we should reintroduce it in the SCHED_DEBUG case and make it
> > the main tunable item - sched_nr is a nice performance optimization
> > but quite unintuitive as a tuning knob.
>
> ok, I don't particularly care either way, could be because I wrote the
> stuff :-)
heh :-) I've applied your patch, it looks good to me.
Ingo
On Thu, 2007-11-01 at 11:02 +0100, Cyrus Massoumi wrote:
> Zhang, Yanmin wrote:
> > On Wed, 2007-10-31 at 17:57 +0800, Zhang, Yanmin wrote:
> >> On Tue, 2007-10-30 at 16:36 +0800, Zhang, Yanmin wrote:
> >>> On Tue, 2007-10-30 at 08:26 +0100, Ingo Molnar wrote:
> >>>> * Zhang, Yanmin <[email protected]> wrote:
> >>>>
> >>>>> sub-bisecting captured patch
> >>>>> 38ad464d410dadceda1563f36bdb0be7fe4c8938(sched: uniform tunings)
> >>>>> caused 20% regression of aim7.
> >>>>>
> >>>>> The last 10% should also be related to sched parameters, such as
> >>>>> sysctl_sched_min_granularity.
> >>>> ah, interesting. Since you have CONFIG_SCHED_DEBUG enabled, could you
> >>>> please try to figure out what the best value for
> >>>> /proc/sys/kernel_sched_latency, /proc/sys/kernel_sched_nr_latency and
> >>>> /proc/sys/kernel_sched_min_granularity is?
> >>>>
> >>>> there's a tuning constraint for kernel_sched_nr_latency:
> >>>>
> >>>> - kernel_sched_nr_latency should always be set to
> >>>> kernel_sched_latency/kernel_sched_min_granularity. (it's not a free
> >>>> tunable)
> >>>>
> >>>> i suspect a good approach would be to double the value of
> >>>> kernel_sched_latency and kernel_sched_nr_latency in each tuning
> >>>> iteration, while keeping kernel_sched_min_granularity unchanged. That
> >>>> will exercise the tuning values of the 2.6.23 kernel as well.
> >>> I followed your idea to test 2.6.24-rc1. The improvement is slow.
> >>> When sched_nr_latency=2560 and sched_latency_ns=640000000, the performance
> >>> is still about 15% less than 2.6.23.
> >> I got the aim7 30% regression on my new upgraded stoakley machine. I found
> >> this machine is slower than the old one. Maybe BIOS has issues, or memory(Might not
> >> be dual-channel?) is slow. So I retested it on the old machine and found on the old
> >> stoakley machine, the regression is about 6%, quite similar to the regression on tigerton
> >> machine.
> >>
> >> By sched_nr_latency=640 and sched_latency_ns=640000000 on the old stoakley machine,
> >> the regression becomes about 2%. Other latency has more regression.
> >>
> >> On my tulsa machine, by sched_nr_latency=640 and sched_latency_ns=640000000,
> >> the regression becomes less than 1% (The original regression is about 20%).
> > I rerun SPECjbb by sched_nr_latency=640 and sched_latency_ns=640000000. On tigerton,
> > the regression is still more than 40%. On stoakley machine, it becomes worse (26%,
> > original is 9%). I will do more investigation to make sure SPECjbb regression is
> > also caused by the bad default values.
> >
> > We need a smarter method to calculate the best default values for the key tuning
> > parameters.
> >
> > One interesting is sysbench+mysql(readonly) got the same result like 2.6.22 (no
> > regression). Good job!
>
> Do you mean you couldn't reproduce the regression which was reported
> with 2.6.23 (http://lkml.org/lkml/2007/10/30/53) with 2.6.24-rc1?
It looks like you missed my emails.
Firstly, I reproduced (or just found the same myself :) ) the issue with kernel 2.6.22,
2.6.23-rc and 2.6.23.
Ingo wrote a big patch to fix it and the new patch is in 2.6.24-rc1 now.
Then I retested it with 2.6.24-rc1 on a couple of x86_64 machines. The issue
disappeared. You could test it with 2.6.24-rc1.
> It
> would be nice if you could provide some numbers for 2.6.22, 2.6.23 and
> 2.6.24-rc1.
Sorry. Intel policy doesn't allow me to publish the numbers, because only
specific departments in Intel can do that. But I can talk about the regression
percentages.
-yanmin
Zhang, Yanmin wrote:
> On Thu, 2007-11-01 at 11:02 +0100, Cyrus Massoumi wrote:
>> Zhang, Yanmin wrote:
>>> On Wed, 2007-10-31 at 17:57 +0800, Zhang, Yanmin wrote:
>>>> On Tue, 2007-10-30 at 16:36 +0800, Zhang, Yanmin wrote:
>>>>> On Tue, 2007-10-30 at 08:26 +0100, Ingo Molnar wrote:
>>>>>> * Zhang, Yanmin <[email protected]> wrote:
>>>>>>
>>>>>>> sub-bisecting captured patch
>>>>>>> 38ad464d410dadceda1563f36bdb0be7fe4c8938(sched: uniform tunings)
>>>>>>> caused 20% regression of aim7.
>>>>>>>
>>>>>>> The last 10% should also be related to sched parameters, such as
>>>>>>> sysctl_sched_min_granularity.
>>>>>> ah, interesting. Since you have CONFIG_SCHED_DEBUG enabled, could you
>>>>>> please try to figure out what the best value for
>>>>>> /proc/sys/kernel_sched_latency, /proc/sys/kernel_sched_nr_latency and
>>>>>> /proc/sys/kernel_sched_min_granularity is?
>>>>>>
>>>>>> there's a tuning constraint for kernel_sched_nr_latency:
>>>>>>
>>>>>> - kernel_sched_nr_latency should always be set to
>>>>>> kernel_sched_latency/kernel_sched_min_granularity. (it's not a free
>>>>>> tunable)
>>>>>>
>>>>>> i suspect a good approach would be to double the value of
>>>>>> kernel_sched_latency and kernel_sched_nr_latency in each tuning
>>>>>> iteration, while keeping kernel_sched_min_granularity unchanged. That
>>>>>> will exercise the tuning values of the 2.6.23 kernel as well.
>>>>> I followed your idea to test 2.6.24-rc1. The improvement is slow.
>>>>> When sched_nr_latency=2560 and sched_latency_ns=640000000, the performance
>>>>> is still about 15% less than 2.6.23.
>>>> I got the aim7 30% regression on my new upgraded stoakley machine. I found
>>>> this machine is slower than the old one. Maybe BIOS has issues, or memory(Might not
>>>> be dual-channel?) is slow. So I retested it on the old machine and found on the old
>>>> stoakley machine, the regression is about 6%, quite similar to the regression on tigerton
>>>> machine.
>>>>
>>>> By sched_nr_latency=640 and sched_latency_ns=640000000 on the old stoakley machine,
>>>> the regression becomes about 2%. Other latency has more regression.
>>>>
>>>> On my tulsa machine, by sched_nr_latency=640 and sched_latency_ns=640000000,
>>>> the regression becomes less than 1% (The original regression is about 20%).
>>> I rerun SPECjbb by sched_nr_latency=640 and sched_latency_ns=640000000. On tigerton,
>>> the regression is still more than 40%. On stoakley machine, it becomes worse (26%,
>>> original is 9%). I will do more investigation to make sure SPECjbb regression is
>>> also caused by the bad default values.
>>>
>>> We need a smarter method to calculate the best default values for the key tuning
>>> parameters.
>>>
>>> One interesting is sysbench+mysql(readonly) got the same result like 2.6.22 (no
>>> regression). Good job!
>> Do you mean you couldn't reproduce the regression which was reported
>> with 2.6.23 (http://lkml.org/lkml/2007/10/30/53) with 2.6.24-rc1?
> It looks like you missed my emails.
Yeah :(
> Firstly, I reproduced (or just found the same myself :) ) the issue with kernel 2.6.22,
> 2.6.23-rc and 2.6.23.
>
> Ingo wrote a big patch to fix it and the new patch is in 2.6.24-rc1 now.
That's nice, could you please point me to the commit?
> Then I retested it with 2.6.24-rc1 on a couple of x86_64 machines. The issue
> disappeared. You could test it with 2.6.24-rc1.
Will do!
>> It
>> would be nice if you could provide some numbers for 2.6.22, 2.6.23 and
>> 2.6.24-rc1.
> Sorry. Intel policy doesn't allow me to publish the numbers because only
> specific departments in Intel could do that. But I could talk the regression
> percentage.
Fair enough :)
> -yanmin
greetings
Cyrus
On Mon, 2007-11-05 at 10:37 +0100, Cyrus Massoumi wrote:
> Zhang, Yanmin wrote:
> > On Thu, 2007-11-01 at 11:02 +0100, Cyrus Massoumi wrote:
> >> Zhang, Yanmin wrote:
> >>> On Wed, 2007-10-31 at 17:57 +0800, Zhang, Yanmin wrote:
> >>>> On Tue, 2007-10-30 at 16:36 +0800, Zhang, Yanmin wrote:
> >>>>> On Tue, 2007-10-30 at 08:26 +0100, Ingo Molnar wrote:
> >>>>>> * Zhang, Yanmin <[email protected]> wrote:
> >>>>>>
> >>>>>>> sub-bisecting captured patch
> >>>>>>> 38ad464d410dadceda1563f36bdb0be7fe4c8938(sched: uniform tunings)
> >>>>>>> caused 20% regression of aim7.
> >>>>>>>
> >>>>>>> The last 10% should also be related to sched parameters, such as
> >>>>>>> sysctl_sched_min_granularity.
> >>>>>> ah, interesting. Since you have CONFIG_SCHED_DEBUG enabled, could you
> >>>>>> please try to figure out what the best value for
> >>>>>> /proc/sys/kernel_sched_latency, /proc/sys/kernel_sched_nr_latency and
> >>>>>> /proc/sys/kernel_sched_min_granularity is?
> >>>>>>
> >>>>>> there's a tuning constraint for kernel_sched_nr_latency:
> >>>>>>
> >>>>>> - kernel_sched_nr_latency should always be set to
> >>>>>> kernel_sched_latency/kernel_sched_min_granularity. (it's not a free
> >>>>>> tunable)
> >>>>>>
> >>>>>> i suspect a good approach would be to double the value of
> >>>>>> kernel_sched_latency and kernel_sched_nr_latency in each tuning
> >>>>>> iteration, while keeping kernel_sched_min_granularity unchanged. That
> >>>>>> will exercise the tuning values of the 2.6.23 kernel as well.
> >>>>> I followed your idea to test 2.6.24-rc1. The improvement is slow.
> >>>>> When sched_nr_latency=2560 and sched_latency_ns=640000000, the performance
> >>>>> is still about 15% less than 2.6.23.
> >>>> I got the aim7 30% regression on my new upgraded stoakley machine. I found
> >>>> this machine is slower than the old one. Maybe BIOS has issues, or memory(Might not
> >>>> be dual-channel?) is slow. So I retested it on the old machine and found on the old
> >>>> stoakley machine, the regression is about 6%, quite similar to the regression on tigerton
> >>>> machine.
> >>>>
> >>>> By sched_nr_latency=640 and sched_latency_ns=640000000 on the old stoakley machine,
> >>>> the regression becomes about 2%. Other latency has more regression.
> >>>>
> >>>> On my tulsa machine, by sched_nr_latency=640 and sched_latency_ns=640000000,
> >>>> the regression becomes less than 1% (The original regression is about 20%).
> >>> I rerun SPECjbb by sched_nr_latency=640 and sched_latency_ns=640000000. On tigerton,
> >>> the regression is still more than 40%. On stoakley machine, it becomes worse (26%,
> >>> original is 9%). I will do more investigation to make sure SPECjbb regression is
> >>> also caused by the bad default values.
> >>>
> >>> We need a smarter method to calculate the best default values for the key tuning
> >>> parameters.
> >>>
> >>> One interesting is sysbench+mysql(readonly) got the same result like 2.6.22 (no
> >>> regression). Good job!
> >> Do you mean you couldn't reproduce the regression which was reported
> >> with 2.6.23 (http://lkml.org/lkml/2007/10/30/53) with 2.6.24-rc1?
> > It looks like you missed my emails.
>
> Yeah :(
>
> > Firstly, I reproduced (or just found the same myself :) ) the issue with kernel 2.6.22,
> > 2.6.23-rc and 2.6.23.
> >
> > Ingo wrote a big patch to fix it and the new patch is in 2.6.24-rc1 now.
>
> That's nice, could you please point me to the commit?
The patch is very big.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b5869ce7f68b233ceb81465a7644be0d9a5f3dbb