2014-01-05 09:05:08

by Fengguang Wu

Subject: [sched] 23f0d2093c: -12.6% regression on sparse file copy

Hi Joonsoo,

We noticed the below changes for commit 23f0d2093c ("sched: Factor out
code to should_we_balance()") in test vm-scalability/300s-lru-file-readtwice

95a79b805b935f4 23f0d2093c789e612185180c4
--------------- -------------------------
==> 4.45 ~ 5% +1777.7% 83.60 ~ 5% vm-scalability.stddev
==> 14966511 ~ 0% -12.6% 13084545 ~ 2% vm-scalability.throughput
38 ~ 9% +406.3% 193 ~ 7% proc-vmstat.kswapd_low_wmark_hit_quickly
610823 ~ 0% -41.4% 357990 ~ 0% softirqs.SCHED
5.424e+08 ~ 0% -38.5% 3.338e+08 ~ 6% proc-vmstat.pgdeactivate
4.68e+08 ~ 0% -37.5% 2.924e+08 ~ 6% proc-vmstat.pgrefill_normal
5.549e+08 ~ 0% -37.1% 3.491e+08 ~ 6% proc-vmstat.pgactivate
14938509 ~ 1% +27.0% 18974176 ~ 1% vmstat.memory.free
978771 ~ 1% +23.9% 1212704 ~ 3% numa-vmstat.node2.nr_free_pages
3747434 ~ 0% +21.7% 4560196 ~ 2% proc-vmstat.nr_free_pages
==> 1.353e+08 ~ 0% +18.8% 1.607e+08 ~ 0% proc-vmstat.numa_foreign
1.353e+08 ~ 0% +18.8% 1.607e+08 ~ 0% proc-vmstat.numa_miss
1.353e+08 ~ 0% +18.8% 1.607e+08 ~ 0% proc-vmstat.numa_other
3936842 ~ 1% +22.2% 4812045 ~ 4% numa-meminfo.node2.MemFree
21803812 ~ 0% +17.7% 25661536 ~ 4% numa-vmstat.node3.numa_foreign
73701524 ~ 0% +15.0% 84769542 ~ 0% proc-vmstat.pgscan_direct_dma32
73700683 ~ 0% +15.0% 84768687 ~ 0% proc-vmstat.pgsteal_direct_dma32
3.101e+08 ~ 0% +11.2% 3.448e+08 ~ 0% proc-vmstat.pgsteal_direct_normal
3.103e+08 ~ 0% +11.2% 3.449e+08 ~ 0% proc-vmstat.pgscan_direct_normal
45613907 ~ 0% +12.6% 51342974 ~ 3% numa-vmstat.node0.numa_other
795639 ~ 0% -48.6% 409113 ~13% time.voluntary_context_switches
375 ~ 0% +6.1% 398 ~ 0% time.elapsed_time
9427 ~ 0% -5.8% 8880 ~ 0% time.percent_of_cpu_this_job_got

The test case basically does

for i in `seq 1 $nr_cpu`
do
        create_sparse_file huge-$i
        dd if=huge-$i of=/dev/null &
        dd if=huge-$i of=/dev/null &
done

where nr_cpu=120 (the test box is a 4-socket Ivy Bridge system).
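
For reference, create_sparse_file is a vm-scalability helper whose source is not
shown here; a hypothetical stand-in at the syscall level might look like the
sketch below (the file name and size handling are invented for illustration).
ftruncate() extends the file without allocating data blocks, so the two dd
readers per file pull zero-filled pages through the page cache rather than doing
real disk I/O, which is what drives the LRU/reclaim activity in the table above.

        /*
         * Hypothetical stand-in for create_sparse_file; the real
         * vm-scalability helper may differ.  ftruncate() extends the file
         * to the requested size without writing any data blocks, leaving
         * one big hole.
         */
        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>

        int main(int argc, char **argv)
        {
                if (argc != 3) {
                        fprintf(stderr, "usage: %s <file> <size-in-bytes>\n", argv[0]);
                        return 1;
                }

                int fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
                if (fd < 0) {
                        perror("open");
                        return 1;
                }

                /* The blocks stay unallocated (holes) until something writes them. */
                if (ftruncate(fd, (off_t)atoll(argv[2])) < 0) {
                        perror("ftruncate");
                        return 1;
                }

                close(fd);
                return 0;
        }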

The change looks stable; each point below is a sample run:

vm-scalability.stddev

120 ++-------------------------------------------------------------------+
| |
100 ++ * * |
| *.*** : ** : * * * * * |
** * *.** * : * :*.* :: .* : : * :* * : .* : .* * .**|
80 ++ * * *. : * *: ** : :: : * :.* * * * ** : :* *
| * * : *** * * * :** |
60 ++ * * |
| |
40 ++ |
| |
| |
20 ++ |
| O OO OO OOO O OO O |
0 OO--O--O------OO----OO-----------------------------------------------+


2014-01-06 00:30:45

by Joonsoo Kim

Subject: Re: [sched] 23f0d2093c: -12.6% regression on sparse file copy

On Sun, Jan 05, 2014 at 05:04:56PM +0800, [email protected] wrote:
> Hi Joonsoo,
>
> We noticed the below changes for commit 23f0d2093c ("sched: Factor out
> code to should_we_balance()") in test vm-scalability/300s-lru-file-readtwice

Hello, Fengguang.

There was a mistake in this patch; a fix has already been merged into mainline.

Could you test again with commit b0cff9d ("sched: Fix load balancing
performance regression in should_we_balance()")?

Thanks.
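
For readers skimming the thread, a minimal toy model of what should_we_balance()
is meant to decide may help. This is not the kernel code, and the group/idle
setup is invented purely for illustration: within a sched group, only one
designated CPU (the first idle CPU, or the group's first CPU if none is idle)
should run the load-balancing pass for a given domain, and the regression traced
back to a mistake inside this helper.

        /*
         * Toy model of the should_we_balance() idea: not the kernel code.
         * Intended behaviour: within a sched group, only one designated CPU
         * (the first idle CPU, or the first CPU of the group if none is
         * idle) is eligible to run load balancing at this domain.  The
         * CPU/idle setup below is invented purely for illustration.
         */
        #include <stdbool.h>
        #include <stdio.h>

        #define GROUP_CPUS 4

        static const bool cpu_idle[GROUP_CPUS] = { false, true, false, false };

        /* Return true only on the CPU that should do this balance pass. */
        static bool should_we_balance(int this_cpu)
        {
                int balance_cpu = -1;

                for (int cpu = 0; cpu < GROUP_CPUS; cpu++) {
                        if (cpu_idle[cpu]) {
                                balance_cpu = cpu;      /* first idle CPU wins */
                                break;
                        }
                }
                if (balance_cpu == -1)
                        balance_cpu = 0;                /* else the group's first CPU */

                /* Getting this comparison wrong picks the wrong set of CPUs. */
                return balance_cpu == this_cpu;
        }

        int main(void)
        {
                for (int cpu = 0; cpu < GROUP_CPUS; cpu++)
                        printf("cpu%d: %s\n", cpu,
                               should_we_balance(cpu) ? "balance" : "skip");
                return 0;
        }

Compiled as a standalone program, this prints which of the four model CPUs
would run the balance pass (only cpu1, the idle one, in this setup).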


2014-01-06 07:10:13

by Fengguang Wu

Subject: Re: [sched] 23f0d2093c: -12.6% regression on sparse file copy

Hi Joonsoo,

On Mon, Jan 06, 2014 at 09:30:52AM +0900, Joonsoo Kim wrote:
> Could you test again with commit b0cff9d ("sched: Fix load balancing
> performance regression in should_we_balance()")?

Yes, b0cff9d completely restores the performance. Sorry for the noise!

Thanks,
Fengguang


2014-01-06 07:49:20

by Joonsoo Kim

Subject: Re: [sched] 23f0d2093c: -12.6% regression on sparse file copy

On Mon, Jan 06, 2014 at 03:10:07PM +0800, Fengguang Wu wrote:
> Yes, b0cff9d completely restores the performance. Sorry for the noise!

Thanks for the quick response. :)