2016-10-09 07:05:59

by kernel test robot

Subject: [lkp] [ipc/sem.c] 0882cba0a0: aim9.shared_memory.ops_per_sec -8.8% regression


FYI, we noticed a -8.8% regression of aim9.shared_memory.ops_per_sec due to commit:

commit 0882cba0a03bca73acd8fab8fb50db04691908e9 ("ipc/sem.c: fix complex_count vs. simple op race")
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master

in testcase: aim9
on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G memory
with the following parameters:

testtime: 300s
test: shared_memory
cpufreq_governor: performance

Suite IX is the "AIM Independent Resource Benchmark", the famous synthetic benchmark.



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
gcc-6/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-ivb-d03/shared_memory/aim9/300s

commit:
bfe738a6c2 ("config: android: enable CONFIG_SECCOMP")
0882cba0a0 ("ipc/sem.c: fix complex_count vs. simple op race")

bfe738a6c274df47 0882cba0a03bca73acd8fab8fb
---------------- --------------------------
fail:runs %reproduction fail:runs
| | |
:4 25% 1:4 kmsg.DHCP/BOOTP:Reply_not_for_us,op[#]xid[#]
%stddev %change %stddev
\ | \
1377511 ± 0% -8.8% 1256420 ± 0% aim9.shared_memory.ops_per_sec
4133531 ± 0% -8.8% 3770259 ± 0% aim9.time.minor_page_faults
9.66 ± 0% -2.0% 9.46 ± 0% turbostat.CorWatt
7.90 ± 26% -30.4% 5.50 ± 37% sched_debug.rt_rq:/.rt_time.avg
31.58 ± 26% -30.4% 21.97 ± 37% sched_debug.rt_rq:/.rt_time.max
13.67 ± 26% -30.5% 9.51 ± 37% sched_debug.rt_rq:/.rt_time.stddev
2374 ± 5% -12.4% 2078 ± 1% slabinfo.kmalloc-256.active_objs
697.75 ± 4% +33.5% 931.25 ± 5% slabinfo.kmalloc-512.active_objs
719.50 ± 3% +32.3% 951.75 ± 5% slabinfo.kmalloc-512.num_objs
5184 ± 2% -12.3% 4544 ± 8% slabinfo.kmalloc-64.active_objs
5184 ± 2% -12.3% 4544 ± 8% slabinfo.kmalloc-64.num_objs
10295 ± 6% -8.9% 9376 ± 1% perf-stat.cpu-migrations
0.04 ± 24% +59.5% 0.06 ± 51% perf-stat.dTLB-load-miss-rate%
5.346e+11 ± 3% -13.1% 4.647e+11 ± 7% perf-stat.dTLB-loads
1.61 ± 1% -9.2% 1.46 ± 1% perf-stat.ipc
4417053 ± 0% -8.2% 4053763 ± 0% perf-stat.minor-faults
4417054 ± 0% -8.2% 4053763 ± 0% perf-stat.page-faults
3.17 ± 9% -26.7% 2.32 ± 13% perf-profile.calltrace.cycles-pp._raw_spin_lock.sys_semop.entry_SYSCALL_64_fastpath
1.16 ± 4% -15.9% 0.98 ± 14% perf-profile.calltrace.cycles-pp.do_munmap.sys_shmdt.entry_SYSCALL_64_fastpath
5.25 ± 8% -15.7% 4.43 ± 4% perf-profile.calltrace.cycles-pp.do_smart_update.SYSC_semtimedop.sys_semop.entry_SYSCALL_64_fastpath
1.50 ± 5% +68.6% 2.53 ± 10% perf-profile.calltrace.cycles-pp.pid_vnr.SYSC_semtimedop.sys_semop.entry_SYSCALL_64_fastpath
1.24 ± 4% -15.3% 1.05 ± 12% perf-profile.calltrace.cycles-pp.sys_shmdt.entry_SYSCALL_64_fastpath
0.99 ± 12% -20.5% 0.79 ± 6% perf-profile.calltrace.cycles-pp.wake_const_ops.do_smart_wakeup_zero.do_smart_update.SYSC_semtimedop.sys_semop
3.45 ± 9% -24.8% 2.60 ± 12% perf-profile.children.cycles-pp._raw_spin_lock
1.18 ± 4% -15.7% 0.99 ± 14% perf-profile.children.cycles-pp.do_munmap
5.61 ± 8% -15.4% 4.75 ± 3% perf-profile.children.cycles-pp.do_smart_update
1.78 ± 5% +56.7% 2.80 ± 11% perf-profile.children.cycles-pp.pid_vnr
1.24 ± 4% -15.3% 1.05 ± 12% perf-profile.children.cycles-pp.sys_shmdt
1.24 ± 13% -24.7% 0.93 ± 8% perf-profile.children.cycles-pp.wake_const_ops
9.86 ± 8% +47.0% 14.49 ± 9% perf-profile.self.cycles-pp.SYSC_semtimedop
3.45 ± 9% -25.2% 2.58 ± 13% perf-profile.self.cycles-pp._raw_spin_lock
2.26 ± 9% -26.8% 1.65 ± 7% perf-profile.self.cycles-pp.do_smart_update
1.78 ± 5% +56.7% 2.80 ± 11% perf-profile.self.cycles-pp.pid_vnr
1.03 ± 2% -15.3% 0.87 ± 8% perf-profile.self.cycles-pp.security_ipc_permission
1.24 ± 13% -24.7% 0.93 ± 8% perf-profile.self.cycles-pp.wake_const_ops



aim9.time.minor_page_faults

[ASCII plot: minor page faults per test run. Bisect-good samples (*)
cluster between roughly 3.95e+06 and 4.15e+06; bisect-bad samples (O)
cluster between roughly 3.7e+06 and 3.8e+06.]

[*] bisect-good sample
[O] bisect-bad sample





Thanks,
Xiaolong


Attachments:
(No filename) (6.18 kB)
config-4.8.0-rc7-00280-g0882cba (148.99 kB)
job-script (6.17 kB)
job.yaml (3.83 kB)
reproduce (102.00 B)

2016-10-09 17:30:20

by Manfred Spraul

Subject: Re: [lkp] [ipc/sem.c] 0882cba0a0: aim9.shared_memory.ops_per_sec -8.8% regression

Hi,

On 10/09/2016 09:05 AM, kernel test robot wrote:
> FYI, we noticed a -8.8% regression of aim9.shared_memory.ops_per_sec due to commit:
>
> commit 0882cba0a03bca73acd8fab8fb50db04691908e9 ("ipc/sem.c: fix complex_count vs. simple op race")
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
>
> in testcase: aim9
> on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G memory
> with the following parameters:
>
> testtime: 300s
> test: shared_memory
> cpufreq_governor: performance
>
> Suite IX is the "AIM Independent Resource Benchmark", the famous synthetic benchmark.
>
>
It is probably caused by this change:
> --- ipc/sem-fast-but-wrong.c 2016-10-09 19:24:47.914825410 +0200
> +++ ipc/sem.c 2016-10-09 19:24:57.960841540 +0200
> @@ -363,6 +363,14 @@
> */
> spin_lock(&sem->lock);
>
> + /*
> + * See 51d7d5205d33
> + * ("powerpc: Add smp_mb() to arch_spin_is_locked()"):
> + * A full barrier is required: the write of sem->lock
> + * must be visible before the read is executed
> + */
> + smp_mb();
> +
> if (!smp_load_acquire(&sma->complex_mode)) {
> /* fast path successful! */
> return sops->sem_num;
Unfortunately, we need the barrier, at least for powerpc.
And I do not want to add a CONFIG_PPC #ifdef to ipc/sem.c.
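
To make the ordering requirement concrete, here is a minimal userspace
sketch of the Dekker-style store/load pattern involved, written with C11
atomics. The names, the spinlock stand-in, and the structure are
illustrative only, not the actual ipc/sem.c code:

#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool complex_mode; /* set while a complex op is in flight */
static atomic_int sem_lock;      /* stands in for the per-semaphore spinlock */

/* Simple-op side: acquire the per-semaphore lock, then check the flag. */
static bool try_fast_path(void)
{
	int expected = 0;

	/* spin until we own "sem->lock" */
	while (!atomic_compare_exchange_weak(&sem_lock, &expected, 1))
		expected = 0;

	/*
	 * Full fence, the analogue of the smp_mb() above: the store that
	 * acquired sem_lock must be globally visible before complex_mode
	 * is read. Without it, the CPU may satisfy the load early, and
	 * both sides can conclude "the other side is idle" and enter
	 * their critical sections at the same time.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	if (!atomic_load(&complex_mode))
		return true;            /* fast path: keep the lock */

	atomic_store(&sem_lock, 0);     /* fall back to the slow path */
	return false;
}

/* Complex-op side: raise the flag, then wait for simple ops to drain. */
static void enter_complex_mode(void)
{
	atomic_store(&complex_mode, true);
	atomic_thread_fence(memory_order_seq_cst);
	while (atomic_load(&sem_lock))  /* analogue of spin_unlock_wait() */
		;
}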

Is it possible to run a test to see what happens with the patches that
avoid spin_unlock_wait()?

https://patchwork.kernel.org/patch/9359365/
https://patchwork.kernel.org/patch/9359371/

(if possible, with both patches together)

Thanks,
Manfred

2016-10-10 06:26:54

by kernel test robot

Subject: Re: [lkp] [ipc/sem.c] 0882cba0a0: aim9.shared_memory.ops_per_sec -8.8% regression

On 10/09, Manfred Spraul wrote:
>Hi,
>
>On 10/09/2016 09:05 AM, kernel test robot wrote:
>>FYI, we noticed a -8.8% regression of aim9.shared_memory.ops_per_sec due to commit:
>>
>>commit 0882cba0a03bca73acd8fab8fb50db04691908e9 ("ipc/sem.c: fix complex_count vs. simple op race")
>>https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
>>
>>in testcase: aim9
>>on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G memory
>>with the following parameters:
>>
>> testtime: 300s
>> test: shared_memory
>> cpufreq_governor: performance
>>
>>Suite IX is the "AIM Independent Resource Benchmark", the famous synthetic benchmark.
>>
>>
>It is probably caused by this change:
>>--- ipc/sem-fast-but-wrong.c 2016-10-09 19:24:47.914825410 +0200
>>+++ ipc/sem.c 2016-10-09 19:24:57.960841540 +0200
>>@@ -363,6 +363,14 @@
>> */
>> spin_lock(&sem->lock);
>>
>>+ /*
>>+ * See 51d7d5205d33
>>+ * ("powerpc: Add smp_mb() to arch_spin_is_locked()"):
>>+ * A full barrier is required: the write of sem->lock
>>+ * must be visible before the read is executed
>>+ */
>>+ smp_mb();
>>+
>> if (!smp_load_acquire(&sma->complex_mode)) {
>> /* fast path successful! */
>> return sops->sem_num;
>Unfortunately, we need the barrier, at least for powerpc.
>And I do not want to add a CONFIG_PPC #ifdef to ipc/sem.c.
>
>Is it possible to run a test to see what happens with the patches that
>avoid spin_unlock_wait()?
>
>https://patchwork.kernel.org/patch/9359365/
>https://patchwork.kernel.org/patch/9359371/
>
>(if possible, with both patches together)

I applied these two patches on top of 0882cba ("ipc/sem.c: fix
complex_count vs. simple op race"), and they do seem to recover the
performance. Here is the comparison:


=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
gcc-6/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-ivb-d03/shared_memory/aim9/300s

commit:
bfe738a6c274df47650d09360babc8c1b71ff2a8
0882cba0a03bca73acd8fab8fb50db04691908e9
cc5f1c2b5da85fdf3de370189765af88c8df7b55 ("ipc/sem: Add hysteresis")

bfe738a6c274df47 0882cba0a03bca73acd8fab8fb cc5f1c2b5da85fdf3de3701897
---------------- -------------------------- --------------------------
fail:runs %reproduction fail:runs %reproduction fail:runs
| | | | |
%stddev %change %stddev %change %stddev
\ | \ | \
1377511 ± 0% -8.8% 1256420 ± 0% -0.5% 1370192 ± 0% aim9.shared_memory.ops_per_sec
4133531 ± 0% -8.8% 3770259 ± 0% -0.5% 4111578 ± 0% aim9.time.minor_page_faults
8106 ± 6% -7.7% 7483 ± 8% +24.5% 10092 ± 7% cpuidle.POLL.usage
9.66 ± 0% -2.0% 9.46 ± 0% +0.5% 9.71 ± 0% turbostat.CorWatt
2374 ± 5% -12.4% 2078 ± 1% -5.5% 2243 ± 5% slabinfo.kmalloc-256.active_objs
697.75 ± 4% +33.5% 931.25 ± 5% +43.5% 1001 ± 4% slabinfo.kmalloc-512.active_objs
719.50 ± 3% +32.3% 951.75 ± 5% +41.8% 1020 ± 4% slabinfo.kmalloc-512.num_objs
5184 ± 2% -12.3% 4544 ± 8% -5.9% 4877 ± 8% slabinfo.kmalloc-64.active_objs
5184 ± 2% -12.3% 4544 ± 8% -5.9% 4877 ± 8% slabinfo.kmalloc-64.num_objs
2.124e+09 ± 3% +0.9% 2.144e+09 ± 9% +14.8% 2.439e+09 ± 9% perf-stat.cache-references
10295 ± 6% -8.9% 9376 ± 1% +6.0% 10912 ± 7% perf-stat.cpu-migrations
0.04 ± 24% +59.5% 0.06 ± 51% +508.5% 0.24 ±123% perf-stat.dTLB-load-miss-rate%
5.346e+11 ± 3% -13.1% 4.647e+11 ± 7% +2.9% 5.501e+11 ± 8% perf-stat.dTLB-loads
1.61 ± 1% -9.2% 1.46 ± 1% -0.6% 1.60 ± 1% perf-stat.ipc
4417053 ± 0% -8.2% 4053763 ± 0% -0.5% 4394973 ± 0% perf-stat.minor-faults
4417054 ± 0% -8.2% 4053763 ± 0% -0.5% 4394973 ± 0% perf-stat.page-faults
4478439 ±144% +83.7% 8227939 ± 90% -87.4% 564387 ± 21% sched_debug.cfs_rq:/.load.max
1912666 ±145% +84.7% 3531796 ± 91% -87.7% 235296 ± 19% sched_debug.cfs_rq:/.load.stddev
755.19 ± 79% +4.3% 787.48 ± 63% -55.2% 338.70 ± 15% sched_debug.cfs_rq:/.load_avg.avg
186.08 ± 9% -10.1% 167.38 ± 15% -39.9% 111.77 ± 35% sched_debug.cfs_rq:/.load_avg.min
86.83 ± 22% -8.6% 79.38 ± 31% -38.0% 53.83 ± 27% sched_debug.cfs_rq:/.util_avg.min
222488 ± 2% +0.2% 222984 ± 6% -22.8% 171691 ± 15% sched_debug.cpu.avg_idle.stddev
62.58 ± 56% -18.7% 50.90 ± 41% -93.8% 3.86 ±168% sched_debug.cpu.clock.stddev
62.58 ± 56% -18.7% 50.90 ± 41% -93.8% 3.86 ±168% sched_debug.cpu.clock_task.stddev
4500146 ±144% +82.7% 8220704 ± 92% -87.5% 564387 ± 21% sched_debug.cpu.load.max
1920553 ±145% +83.6% 3525404 ± 92% -87.7% 235296 ± 19% sched_debug.cpu.load.stddev
1.52 ± 11% -26.7% 1.11 ± 33% -24.9% 1.14 ± 18% sched_debug.cpu.nr_running.avg
18404 ± 42% +39.9% 25755 ± 11% +66.9% 30725 ± 7% sched_debug.cpu.ttwu_count.min
7.90 ± 26% -30.4% 5.50 ± 37% -13.0% 6.88 ± 45% sched_debug.rt_rq:/.rt_time.avg
31.58 ± 26% -30.4% 21.97 ± 37% -13.2% 27.42 ± 45% sched_debug.rt_rq:/.rt_time.max
13.67 ± 26% -30.5% 9.51 ± 37% -13.2% 11.86 ± 45% sched_debug.rt_rq:/.rt_time.stddev
3.17 ± 9% -26.7% 2.32 ± 13% +2.0% 3.23 ± 12% perf-profile.calltrace.cycles-pp._raw_spin_lock.sys_semop.entry_SYSCALL_64_fastpath
1.16 ± 4% -15.9% 0.98 ± 14% +1.0% 1.17 ± 11% perf-profile.calltrace.cycles-pp.do_munmap.sys_shmdt.entry_SYSCALL_64_fastpath
5.25 ± 8% -15.7% 4.43 ± 4% -2.3% 5.13 ± 15% perf-profile.calltrace.cycles-pp.do_smart_update.SYSC_semtimedop.sys_semop.entry_SYSCALL_64_fastpath
1.50 ± 5% +68.6% 2.53 ± 10% -36.2% 0.96 ± 28% perf-profile.calltrace.cycles-pp.pid_vnr.SYSC_semtimedop.sys_semop.entry_SYSCALL_64_fastpath
1.24 ± 4% -15.3% 1.05 ± 12% +0.6% 1.25 ± 11% perf-profile.calltrace.cycles-pp.sys_shmdt.entry_SYSCALL_64_fastpath
0.99 ± 12% -20.5% 0.79 ± 6% -5.7% 0.93 ± 17% perf-profile.calltrace.cycles-pp.wake_const_ops.do_smart_wakeup_zero.do_smart_update.SYSC_semtimedop.sys_semop
3.45 ± 9% -24.8% 2.60 ± 12% +1.7% 3.51 ± 13% perf-profile.children.cycles-pp._raw_spin_lock
1.18 ± 4% -15.7% 0.99 ± 14% +0.9% 1.19 ± 12% perf-profile.children.cycles-pp.do_munmap
5.61 ± 8% -15.4% 4.75 ± 3% -3.7% 5.40 ± 15% perf-profile.children.cycles-pp.do_smart_update
1.78 ± 5% +56.7% 2.80 ± 11% -29.1% 1.27 ± 22% perf-profile.children.cycles-pp.pid_vnr
1.24 ± 4% -15.3% 1.05 ± 12% +0.8% 1.25 ± 11% perf-profile.children.cycles-pp.sys_shmdt
1.24 ± 13% -24.7% 0.93 ± 8% -12.7% 1.08 ± 16% perf-profile.children.cycles-pp.wake_const_ops
9.86 ± 8% +47.0% 14.49 ± 9% -8.5% 9.02 ± 13% perf-profile.self.cycles-pp.SYSC_semtimedop
3.45 ± 9% -25.2% 2.58 ± 13% +1.9% 3.51 ± 13% perf-profile.self.cycles-pp._raw_spin_lock
2.26 ± 9% -26.8% 1.65 ± 7% -11.0% 2.01 ± 18% perf-profile.self.cycles-pp.do_smart_update
1.78 ± 5% +56.7% 2.80 ± 11% -29.1% 1.27 ± 22% perf-profile.self.cycles-pp.pid_vnr
1.03 ± 2% -15.3% 0.87 ± 8% -8.7% 0.94 ± 25% perf-profile.self.cycles-pp.security_ipc_permission
1.32 ± 5% -4.0% 1.27 ± 14% +14.8% 1.51 ± 8% perf-profile.self.cycles-pp.security_sem_semop
1.24 ± 13% -24.7% 0.93 ± 8% -12.7% 1.08 ± 16% perf-profile.self.cycles-pp.wake_const_ops
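
For context on the third column: the "ipc/sem: Add hysteresis" patch
avoids toggling between the two locking modes on every operation. The
sketch below shows only the rough idea, with illustrative names and an
assumed countdown constant rather than the patch's actual code: after a
complex op, simple ops keep using the global lock for the next N
operations, so the mode flag (and the barriers that guard it) is not
flipped back and forth by a mixed workload.

#include <stdbool.h>

#define GLOBAL_LOCK_HYSTERESIS 10 /* assumed value, for illustration */

struct sem_array_sketch {
	int use_global_lock; /* 0: per-semaphore locks; >0: global-lock mode */
};

/* Complex op: force global-lock mode and re-arm the countdown.
 * (Counter updates are assumed to happen under the appropriate lock.) */
static void complex_op_enter(struct sem_array_sketch *sma)
{
	sma->use_global_lock = GLOBAL_LOCK_HYSTERESIS;
}

/* Simple op: stay on the global lock until the countdown expires. */
static bool simple_op_needs_global_lock(struct sem_array_sketch *sma)
{
	if (sma->use_global_lock == 0)
		return false;          /* fast path: per-semaphore lock */
	sma->use_global_lock--;        /* one more op under the global lock */
	return true;
}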

Thanks,
Xiaolong