2017-03-07 03:22:54

by kernel test robot

Subject: [lkp-robot] [sched/fair] 4c77b18cf8: hackbench.throughput -14.4% regression

Greeting,

FYI, we noticed a -14.4% regression of hackbench.throughput due to commit:


commit: 4c77b18cf8b7ab37c7d5737b4609010d2ceec5f0 ("sched/fair: Make select_idle_cpu() more aggressive")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: hackbench
on test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 64G memory
with the following parameters:

nr_threads: 50%
mode: process
ipc: pipe
cpufreq_governor: performance

test-description: Hackbench is both a benchmark and a stress test for the Linux kernel scheduler.
test-url: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sched/cfs-scheduler/hackbench.c

In addition to that, the commit also has significant impact on the following tests:

+------------------+-----------------------------------------------------------------------+
| testcase: change | netperf: netperf.Throughput_tps -33.8% regression                     |
| test machine     | 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory |
| test parameters  | cluster=cs-localhost                                                  |
|                  | cpufreq_governor=performance                                          |
|                  | ip=ipv4                                                               |
|                  | nr_threads=200%                                                       |
|                  | runtime=300s                                                          |
|                  | test=SCTP_RR                                                          |
+------------------+-----------------------------------------------------------------------+
| testcase: change | netperf: netperf.Throughput_tps -50.8% regression                     |
| test machine     | 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory |
| test parameters  | cluster=cs-localhost                                                  |
|                  | cpufreq_governor=performance                                          |
|                  | ip=ipv4                                                               |
|                  | nr_threads=200%                                                       |
|                  | runtime=300s                                                          |
|                  | test=TCP_RR                                                           |
+------------------+-----------------------------------------------------------------------+
| testcase: change | netperf: netperf.Throughput_Mbps -8.7% regression                     |
| test machine     | 16 threads Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz with 8G memory       |
| test parameters  | cluster=cs-localhost                                                  |
|                  | cpufreq_governor=performance                                          |
|                  | ip=ipv4                                                               |
|                  | nr_threads=200%                                                       |
|                  | runtime=300s                                                          |
|                  | send_size=10K                                                         |
|                  | test=SCTP_STREAM_MANY                                                 |
+------------------+-----------------------------------------------------------------------+
| testcase: change | hackbench: hackbench.throughput 12.1% improvement                     |
| test machine     | 8 threads Ivy Bridge with 16G memory                                  |
| test parameters  | cpufreq_governor=performance                                          |
|                  | ipc=pipe                                                              |
|                  | mode=process                                                          |
|                  | nr_threads=50%                                                        |
+------------------+-----------------------------------------------------------------------+
| testcase: change | netperf: netperf.Throughput_Mbps -2.5% regression                     |
| test machine     | 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory |
| test parameters  | cluster=cs-localhost                                                  |
|                  | cpufreq_governor=performance                                          |
|                  | ip=ipv4                                                               |
|                  | nr_threads=200%                                                       |
|                  | runtime=300s                                                          |
|                  | send_size=10K                                                         |
|                  | test=SCTP_STREAM_MANY                                                 |
+------------------+-----------------------------------------------------------------------+


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/01org/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

testcase/path_params/tbox_group/run: hackbench/50%-process-pipe-performance/ivb42

4977ab6e92e267af 4c77b18cf8b7ab37c7d5737b46
---------------- --------------------------
179106 -14% 153395 hackbench.throughput
5.036e+08 21% 6.113e+08 hackbench.time.involuntary_context_switches
4523 3% 4675 hackbench.time.percent_of_cpu_this_job_got
27089 3% 27956 hackbench.time.system_time
1394 -10% 1252 hackbench.time.user_time
2.501e+09 -11% 2.223e+09 hackbench.time.voluntary_context_switches
779669 -14% 667478 hackbench.time.minor_page_faults
319399 3% 329894 interrupts.CAL:Function_call_interrupts
884938 -22% 692644 vmstat.system.in
5224554 -9% 4736985 vmstat.system.cs
2880 2955 turbostat.Avg_MHz
96.25 98.77 turbostat.%Busy
6.59 -14% 5.63 turbostat.RAMWatt
2.009e+08 98% 3.986e+08 perf-stat.cpu-migrations
0.67 8% 0.73 perf-stat.branch-miss-rate%
5.046e+11 13% 5.722e+11 perf-stat.cache-references
5.897e+10 5% 6.22e+10 perf-stat.branch-misses
3851 16% 4471 perf-stat.instructions-per-iTLB-miss
38.80 -11% 34.53 perf-stat.node-store-miss-rate%
8.697e+13 8.833e+13 perf-stat.cpu-cycles
1928944 -8% 1777815 perf-stat.page-faults
1928944 -8% 1777789 perf-stat.minor-faults
1.332e+10 ± 3% -18% 1.098e+10 ± 16% perf-stat.dTLB-store-misses
1.87 ± 4% -20% 1.50 ± 19% perf-stat.dTLB-load-miss-rate%
2.654e+11 ± 4% -25% 1.988e+11 ± 20% perf-stat.dTLB-load-misses
0.53 -6% 0.50 perf-stat.ipc
8.738e+12 8.565e+12 perf-stat.branch-instructions
3.299e+09 -10% 2.968e+09 perf-stat.context-switches
4.586e+13 -4% 4.398e+13 perf-stat.instructions
64.05 31% 84.13 perf-stat.iTLB-load-miss-rate%
1.392e+13 -6% 1.306e+13 perf-stat.dTLB-loads
8.613e+12 -10% 7.773e+12 perf-stat.dTLB-stores
1.135e+10 ± 4% -45% 6.254e+09 ± 4% perf-stat.node-loads
1.878e+10 ± 3% -46% 1.016e+10 ± 4% perf-stat.cache-misses
1.1e+10 ± 4% -46% 5.949e+09 ± 4% perf-stat.node-load-misses
1.191e+10 -17% 9.836e+09 perf-stat.iTLB-load-misses
7.431e+09 ± 4% -48% 3.875e+09 ± 4% perf-stat.node-stores
3.72 ± 4% -52% 1.78 ± 4% perf-stat.cache-miss-rate%
4.711e+09 ± 4% -57% 2.044e+09 ± 3% perf-stat.node-store-misses
6.682e+09 -72% 1.856e+09 ± 3% perf-stat.iTLB-loads
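
For readers checking the table by hand: each row lists the value for the parent commit (4977ab6e92e267af), the relative change, and the value for the commit under test (4c77b18cf8b7ab37c7d5737b46). The sketch below is not part of the LKP tooling; the helper names percent_change and ratio_percent are hypothetical. It only shows how the change column and two of the derived rates appear to follow from the raw counters quoted above.

```python
def percent_change(base: float, head: float) -> float:
    """Relative change of `head` versus `base`, in percent."""
    return (head - base) / base * 100.0


def ratio_percent(numerator: float, denominator: float) -> float:
    """Express numerator / denominator as a percentage."""
    return numerator / denominator * 100.0


if __name__ == "__main__":
    # hackbench.throughput: 179106 on the parent, 153395 on the commit.
    print(f"throughput change: {percent_change(179106, 153395):.1f}%")

    # perf-stat.cpu-migrations: 2.009e+08 -> 3.986e+08.
    print(f"cpu-migrations change: {percent_change(2.009e8, 3.986e8):.0f}%")

    # Assumption: cache-miss-rate% is cache-misses / cache-references.
    print(f"cache-miss-rate (parent): {ratio_percent(1.878e10, 5.046e11):.2f}%")
    print(f"cache-miss-rate (commit): {ratio_percent(1.016e10, 5.722e11):.2f}%")
```

With the figures from the table, the throughput change comes out at about -14.4%, matching the subject line, and the two cache-miss rates come out at about 3.72% and 1.78%, matching the perf-stat.cache-miss-rate% row.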


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Ying Huang


Attachments:
(No filename) (8.50 kB)
config-4.10.0-11074-g4c77b18 (153.61 kB)
job-script (6.42 kB)
job.yaml (4.11 kB)
reproduce (969.00 B)

2017-03-07 08:34:35

by Peter Zijlstra

Subject: Re: [lkp-robot] [sched/fair] 4c77b18cf8: hackbench.throughput -14.4% regression

On Tue, Mar 07, 2017 at 11:18:36AM +0800, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed a -14.4% regression of hackbench.throughput due to commit:

Yeah, I know ... this patch is a mixed bag, some like it, some hate it.

But given it was fingered by a human doing desktopy things, that trumps
an artificial benchmark.

Still, I'll try and see if I can fix things once I find a spare moment.