2015-04-23 06:56:05

by Huang, Ying

Subject: [LKP] [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5% perf-stat.LLC-load-misses

_______________________________________________
LKP mailing list
[email protected]


Attachments:
job.yaml (3.22 kB)
reproduce (680.00 B)
(No filename) (86.00 B)

2015-04-24 02:16:11

by NeilBrown

Subject: Re: [LKP] [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5% perf-stat.LLC-load-misses

On Thu, 23 Apr 2015 14:55:59 +0800 Huang Ying <[email protected]> wrote:

> FYI, we noticed the below changes on
>
> git://neil.brown.name/md for-next
> commit 878ee6792799e2f88bdcac329845efadb205252f ("RAID5: batch adjacent full stripe write")

Hi,
is there any chance that you could explain what some of this means?
There is lots of data and some very pretty graphs, but no explanation.

Which numbers are "good", which are "bad", and which is "worst"?
What do the graphs really show, and what would we like to see in them?

I think it is really great that you are doing this testing and reporting the
results. It's just so sad that I completely fail to understand them.

Thanks,
NeilBrown

>
>
> testbox/testcase/testparams: lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd
>
> a87d7f782b47e030 878ee6792799e2f88bdcac3298
> ---------------- --------------------------
> %stddev %change %stddev
> \ | \
> 59035 ± 0% +18.4% 69913 ± 1% softirqs.SCHED
> 1330 ± 10% +17.4% 1561 ± 4% slabinfo.kmalloc-512.num_objs
> 1330 ± 10% +17.4% 1561 ± 4% slabinfo.kmalloc-512.active_objs
> 305908 ± 0% -1.8% 300427 ± 0% vmstat.io.bo
> 1 ± 0% +100.0% 2 ± 0% vmstat.procs.r
> 8266 ± 1% -15.7% 6968 ± 0% vmstat.system.cs
> 14819 ± 0% -2.1% 14503 ± 0% vmstat.system.in
> 18.20 ± 6% +10.2% 20.05 ± 4% perf-profile.cpu-cycles.raid_run_ops.handle_stripe.handle_active_stripes.raid5d.md_thread
> 1.94 ± 9% +90.6% 3.70 ± 9% perf-profile.cpu-cycles.async_xor.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> 0.00 ± 0% +Inf% 25.18 ± 3% perf-profile.cpu-cycles.handle_active_stripes.isra.45.raid5d.md_thread.kthread.ret_from_fork
> 0.00 ± 0% +Inf% 14.14 ± 4% perf-profile.cpu-cycles.async_copy_data.isra.42.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> 1.79 ± 7% +102.9% 3.64 ± 9% perf-profile.cpu-cycles.xor_blocks.async_xor.raid_run_ops.handle_stripe.handle_active_stripes
> 3.09 ± 4% -10.8% 2.76 ± 4% perf-profile.cpu-cycles.get_active_stripe.make_request.md_make_request.generic_make_request.submit_bio
> 0.80 ± 14% +28.1% 1.02 ± 10% perf-profile.cpu-cycles.mutex_lock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> 14.78 ± 6% -100.0% 0.00 ± 0% perf-profile.cpu-cycles.async_copy_data.isra.38.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> 25.68 ± 4% -100.0% 0.00 ± 0% perf-profile.cpu-cycles.handle_active_stripes.isra.41.raid5d.md_thread.kthread.ret_from_fork
> 1.23 ± 5% +140.0% 2.96 ± 7% perf-profile.cpu-cycles.xor_sse_5_pf64.xor_blocks.async_xor.raid_run_ops.handle_stripe
> 2.62 ± 6% -95.6% 0.12 ± 33% perf-profile.cpu-cycles.analyse_stripe.handle_stripe.handle_active_stripes.raid5d.md_thread
> 0.96 ± 9% +17.5% 1.12 ± 2% perf-profile.cpu-cycles.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> 1.461e+10 ± 0% -5.3% 1.384e+10 ± 1% perf-stat.L1-dcache-load-misses
> 3.688e+11 ± 0% -2.7% 3.59e+11 ± 0% perf-stat.L1-dcache-loads
> 1.124e+09 ± 0% -27.7% 8.125e+08 ± 0% perf-stat.L1-dcache-prefetches
> 2.767e+10 ± 0% -1.8% 2.717e+10 ± 0% perf-stat.L1-dcache-store-misses
> 2.352e+11 ± 0% -2.8% 2.287e+11 ± 0% perf-stat.L1-dcache-stores
> 6.774e+09 ± 0% -2.3% 6.62e+09 ± 0% perf-stat.L1-icache-load-misses
> 5.571e+08 ± 0% +40.5% 7.826e+08 ± 1% perf-stat.LLC-load-misses
> 6.263e+09 ± 0% -13.7% 5.407e+09 ± 1% perf-stat.LLC-loads
> 1.914e+11 ± 0% -4.2% 1.833e+11 ± 0% perf-stat.branch-instructions
> 1.145e+09 ± 2% -5.6% 1.081e+09 ± 0% perf-stat.branch-load-misses
> 1.911e+11 ± 0% -4.3% 1.829e+11 ± 0% perf-stat.branch-loads
> 1.142e+09 ± 2% -5.1% 1.083e+09 ± 0% perf-stat.branch-misses
> 1.218e+09 ± 0% +19.8% 1.46e+09 ± 0% perf-stat.cache-misses
> 2.118e+10 ± 0% -5.2% 2.007e+10 ± 0% perf-stat.cache-references
> 2510308 ± 1% -15.7% 2115410 ± 0% perf-stat.context-switches
> 39623 ± 0% +22.1% 48370 ± 1% perf-stat.cpu-migrations
> 4.179e+08 ± 40% +165.7% 1.111e+09 ± 35% perf-stat.dTLB-load-misses
> 3.684e+11 ± 0% -2.5% 3.592e+11 ± 0% perf-stat.dTLB-loads
> 1.232e+08 ± 15% +62.5% 2.002e+08 ± 27% perf-stat.dTLB-store-misses
> 2.348e+11 ± 0% -2.5% 2.288e+11 ± 0% perf-stat.dTLB-stores
> 3577297 ± 2% +8.7% 3888986 ± 1% perf-stat.iTLB-load-misses
> 1.035e+12 ± 0% -3.5% 9.988e+11 ± 0% perf-stat.iTLB-loads
> 1.036e+12 ± 0% -3.7% 9.978e+11 ± 0% perf-stat.instructions
> 594 ± 30% +130.3% 1369 ± 13% sched_debug.cfs_rq[0]:/.blocked_load_avg
> 17 ± 10% -28.2% 12 ± 23% sched_debug.cfs_rq[0]:/.nr_spread_over
> 210 ± 21% +42.1% 298 ± 28% sched_debug.cfs_rq[0]:/.tg_runnable_contrib
> 9676 ± 21% +42.1% 13754 ± 28% sched_debug.cfs_rq[0]:/.avg->runnable_avg_sum
> 772 ± 25% +116.5% 1672 ± 9% sched_debug.cfs_rq[0]:/.tg_load_contrib
> 8402 ± 9% +83.3% 15405 ± 11% sched_debug.cfs_rq[0]:/.tg_load_avg
> 8356 ± 9% +82.8% 15272 ± 11% sched_debug.cfs_rq[1]:/.tg_load_avg
> 968 ± 25% +100.8% 1943 ± 14% sched_debug.cfs_rq[1]:/.blocked_load_avg
> 16242 ± 9% -22.2% 12643 ± 14% sched_debug.cfs_rq[1]:/.avg->runnable_avg_sum
> 353 ± 9% -22.1% 275 ± 14% sched_debug.cfs_rq[1]:/.tg_runnable_contrib
> 1183 ± 23% +77.7% 2102 ± 12% sched_debug.cfs_rq[1]:/.tg_load_contrib
> 181 ± 8% -31.4% 124 ± 26% sched_debug.cfs_rq[2]:/.tg_runnable_contrib
> 8364 ± 8% -31.3% 5745 ± 26% sched_debug.cfs_rq[2]:/.avg->runnable_avg_sum
> 8297 ± 9% +81.7% 15079 ± 12% sched_debug.cfs_rq[2]:/.tg_load_avg
> 30439 ± 13% -45.2% 16681 ± 26% sched_debug.cfs_rq[2]:/.exec_clock
> 39735 ± 14% -48.3% 20545 ± 29% sched_debug.cfs_rq[2]:/.min_vruntime
> 8231 ± 10% +82.2% 15000 ± 12% sched_debug.cfs_rq[3]:/.tg_load_avg
> 1210 ± 14% +110.3% 2546 ± 30% sched_debug.cfs_rq[4]:/.tg_load_contrib
> 8188 ± 10% +82.8% 14964 ± 12% sched_debug.cfs_rq[4]:/.tg_load_avg
> 8132 ± 10% +83.1% 14890 ± 12% sched_debug.cfs_rq[5]:/.tg_load_avg
> 749 ± 29% +205.9% 2292 ± 34% sched_debug.cfs_rq[5]:/.blocked_load_avg
> 963 ± 30% +169.9% 2599 ± 33% sched_debug.cfs_rq[5]:/.tg_load_contrib
> 37791 ± 32% -38.6% 23209 ± 13% sched_debug.cfs_rq[6]:/.min_vruntime
> 693 ± 25% +132.2% 1609 ± 29% sched_debug.cfs_rq[6]:/.blocked_load_avg
> 10838 ± 13% -39.2% 6587 ± 13% sched_debug.cfs_rq[6]:/.avg->runnable_avg_sum
> 29329 ± 27% -33.2% 19577 ± 10% sched_debug.cfs_rq[6]:/.exec_clock
> 235 ± 14% -39.7% 142 ± 14% sched_debug.cfs_rq[6]:/.tg_runnable_contrib
> 8085 ± 10% +83.6% 14848 ± 12% sched_debug.cfs_rq[6]:/.tg_load_avg
> 839 ± 25% +128.5% 1917 ± 18% sched_debug.cfs_rq[6]:/.tg_load_contrib
> 8051 ± 10% +83.6% 14779 ± 12% sched_debug.cfs_rq[7]:/.tg_load_avg
> 156 ± 34% +97.9% 309 ± 19% sched_debug.cpu#0.cpu_load[4]
> 160 ± 25% +64.0% 263 ± 16% sched_debug.cpu#0.cpu_load[2]
> 156 ± 32% +83.7% 286 ± 17% sched_debug.cpu#0.cpu_load[3]
> 164 ± 20% -35.1% 106 ± 31% sched_debug.cpu#2.cpu_load[0]
> 249 ± 15% +80.2% 449 ± 10% sched_debug.cpu#4.cpu_load[3]
> 231 ± 11% +101.2% 466 ± 13% sched_debug.cpu#4.cpu_load[2]
> 217 ± 14% +189.9% 630 ± 38% sched_debug.cpu#4.cpu_load[0]
> 71951 ± 5% +21.6% 87526 ± 7% sched_debug.cpu#4.nr_load_updates
> 214 ± 8% +146.1% 527 ± 27% sched_debug.cpu#4.cpu_load[1]
> 256 ± 17% +75.7% 449 ± 13% sched_debug.cpu#4.cpu_load[4]
> 209 ± 23% +98.3% 416 ± 48% sched_debug.cpu#5.cpu_load[2]
> 68024 ± 2% +18.8% 80825 ± 1% sched_debug.cpu#5.nr_load_updates
> 217 ± 26% +74.9% 380 ± 45% sched_debug.cpu#5.cpu_load[3]
> 852 ± 21% -38.3% 526 ± 22% sched_debug.cpu#6.curr->pid
>
> lkp-st02: Core2
> Memory: 8G
>
>
>
>
> perf-stat.cache-misses
>
> 1.6e+09 O+-----O--O---O--O---O--------------------------------------------+
> | O O O O O O O O O O |
> 1.4e+09 ++ |
> 1.2e+09 *+.*...* *..* * *...*..*...*..*...*..*...*..*...*..*
> | : : : : : |
> 1e+09 ++ : : : : : : |
> | : : : : : : |
> 8e+08 ++ : : : : : : |
> | : : : : : : |
> 6e+08 ++ : : : : : : |
> 4e+08 ++ : : : : : : |
> | : : : : : : |
> 2e+08 ++ : : : : : : |
> | : : : |
> 0 ++-O------*----------*------*-------------------------------------+
>
>
> perf-stat.L1-dcache-prefetches
>
> 1.2e+09 ++----------------------------------------------------------------+
> *..*...* *..* * ..*.. ..*..*...*..*...*..*...*..*
> 1e+09 ++ : : : : *. *. |
> | : : : :: : |
> | : : : : : : O |
> 8e+08 O+ O: O :O O: O :O: O :O O O O O O O |
> | : : : : : : |
> 6e+08 ++ : : : : : : |
> | : : : : : : |
> 4e+08 ++ : : : : : : |
> | : : : : : : |
> | : : : : : : |
> 2e+08 ++ :: :: : : |
> | : : : |
> 0 ++-O------*----------*------*-------------------------------------+
>
>
> perf-stat.LLC-load-misses
>
> 1e+09 ++------------------------------------------------------------------+
> 9e+08 O+ O O O O O |
> | O O O O |
> 8e+08 ++ O O O O O O |
> 7e+08 ++ |
> | |
> 6e+08 *+..*..* *...* * *...*..*...*...*..*...*..*...*..*...*
> 5e+08 ++ : : : :: : |
> 4e+08 ++ : : : : : : |
> | : : : : : : |
> 3e+08 ++ : : : : : : |
> 2e+08 ++ : : : : : : |
> | : : : : : : |
> 1e+08 ++ : :: : |
> 0 ++--O------*---------*-------*--------------------------------------+
>
>
> perf-stat.context-switches
>
> 3e+06 ++----------------------------------------------------------------+
> | *...*..*... |
> 2.5e+06 *+.*...* *..* * : *..*... .*...*..*... .*
> | : : : : : *. *. |
> O O: O :O O: O :: : O O O O O O |
> 2e+06 ++ : : : :O: O :O O |
> | : : : : : : |
> 1.5e+06 ++ : : : : : : |
> | : : : : : : |
> 1e+06 ++ : : : : : : |
> | : : : : : : |
> | : : : : : : |
> 500000 ++ :: : : :: |
> | : : : |
> 0 ++-O------*----------*------*-------------------------------------+
>
>
> vmstat.system.cs
>
> 10000 ++------------------------------------------------------------------+
> 9000 ++ *...*.. |
> *...*..* *...* * : *...*...*.. ..*..*...*.. ..*
> 8000 ++ : : : : : *. *. |
> 7000 O+ O: O O O: O : : : O O O O O O |
> | : : : :O: O :O O |
> 6000 ++ : : : : : : |
> 5000 ++ : : : : : : |
> 4000 ++ : : : : : : |
> | : : : : : : |
> 3000 ++ : : : : : : |
> 2000 ++ : : : : : : |
> | : : :: :: |
> 1000 ++ : : : |
> 0 ++--O------*---------*-------*--------------------------------------+
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
> To reproduce:
>
> apt-get install ruby
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> cd lkp-tests
> bin/setup-local job.yaml # the job file attached in this email
> bin/run-local job.yaml
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Ying Huang
>


Attachments:
(No filename) (811.00 B)
OpenPGP digital signature
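
For anyone who wants to script the reproduce steps quoted above, they can be
collected into a single shell script. This is only a minimal sketch: it assumes
a Debian-style test box (apt-get available) and that the job.yaml attached to
the original mail has been saved in the directory the script is run from.

    #!/bin/sh
    # Reproduce sketch for the dd-write job, following the steps in the report.
    # Assumptions: apt-get is available and ./job.yaml is the attached job file.
    set -e

    apt-get install -y ruby
    git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
    cp job.yaml lkp-tests/
    cd lkp-tests
    bin/setup-local job.yaml    # install dependencies needed by the job
    bin/run-local job.yaml      # run the job locally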

2015-04-30 06:23:59

by Yuanhan Liu

Subject: Re: [LKP] [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5% perf-stat.LLC-load-misses

On Fri, Apr 24, 2015 at 12:15:59PM +1000, NeilBrown wrote:
> On Thu, 23 Apr 2015 14:55:59 +0800 Huang Ying <[email protected]> wrote:
>
> > FYI, we noticed the below changes on
> >
> > git://neil.brown.name/md for-next
> > commit 878ee6792799e2f88bdcac329845efadb205252f ("RAID5: batch adjacent full stripe write")
>
> Hi,
> is there any chance that you could explain what some of this means?
> There is lots of data and some very pretty graphs, but no explanation.

Hi Neil,

(Sorry for the late response: Ying is on vacation.)

I guess you can simply ignore this report, as I already reported to you a
month ago that this patch makes fsmark perform better in most cases:
https://lists.01.org/pipermail/lkp/2015-March/002411.html

>
> Which numbers are "good", which are "bad", and which is "worst"?
> What do the graphs really show, and what would we like to see in them?
>
> I think it is really great that you are doing this testing and reporting the
> results. It's just so sad that I completely fail to understand them.

Sorry, it's our fault for making them hard to understand, as well as
for reporting a duplicate one (well, the commit hash is different ;).

We might need to take some time to make those data easier to understand.

--yliu

>
> > [...]