LinuxLists.cc - [LKP] [driver core] 570d020012: will-it-scale.per_thread

2019-02-18 08:11:57

Subject: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

Greeting,

FYI, we noticed a

commit: 570d0200123fb4f809 In addition to that,
+------------------+------ | testcase: change | test machine | test parameters | | mode=thread | | nr_task=100% | | test=signal1 +------------------+------ | testcase: change | test machine | test parameters | | mode=thread | | nr_task=100% | | test=open1 +------------------+------

Details are as below:
--------------------------

To reproduce:

git clone bin/lkp install job.yaml bin/lkp run job.yaml

========================== compiler/cpufreq_governor/ gcc-7/performance/x86_64
commit:
4bd4e92cfe ("sysfs: 570d020012 ("driver
4bd4e92cfe6d2af7 ---------------- %stddev %change \ | 937.00 906625 269989 17104244 6886 -1.7% 2.088e+08 ± 17% 683051 ± 15% 4.519e+08 ± 6% 648490 ± 14% 0.23 ± 18% -0.1 0.42 ± 9% +0.1 0.41 ± 5% -18.1% 0.30 ± 6% +27.9% 5.22 ± 13% -1.9 5.27 ± 13% -1.9 4.71 ± 14% -1.8 4.58 ± 15% -1.8 4.56 ± 15% -1.8 4.40 ± 15% -1.8 3.18 ± 23% -1.6 1.70 ± 19% -0.7 1.47 ± 20% -0.7 1.43 ± 23% -0.7 1.57 ± 5% -0.5 0.58 -0.1 2.31 ± 16% +0.7 2.04 ± 17% +0.7 2.09 ± 17% +0.7 44.55 +1.7 44.26 +1.8 44.24 +1.8 43.67 +1.8 43.80 +1.9 43.67 +1.9 43.11 +2.0 43.17 +2.0 43.08 +2.1 43.13 +2.1 12.29 -10.0% 8.564e+09 1.59 -0.1 1.356e+08 16.95 +1.1 74122780 4.377e+08 6523 -1.6% 12.02 5778 +6.2% 0.33 -0.0 1.185e+08 -15.6% 3.555e+10 3.563e+10 300.74 0.08 -1.6% 12.28 -10.0% 1.58 -0.1 16.94 +1.1 12.02 5776 +6.2% 0.33 -0.0 300.68 0.08 -1.6% 39542686 8.53e+09 1.351e+08 73835513 4.36e+08 6498 -1.7% 1.18e+08 3.541e+10 3.549e+10 1.067e+13 595740 ± 18% 4833300 ± 8% 15804 ± 7% -28.3% 1096777 108529 ± 4% -27.8% 12.36 ± 3% -11.3% 522.40 ± 8% 35.08 ± 7% -28.4% 595740 ± 18% 4833300 ± 8% 0.12 ± 9% -21.9% 77.95 ± 6% 7.10 ± 2% +20.5% 4.39 -15.0% 468.00 28.56 -35.1% 15797 ± 7% -28.3% 1096533 108522 ± 4% -27.8% 772.59 ± 6% 772.58 ± 6% 4.43 -15.1% 28.58 -33.9% 4.51 ± 3% -16.6% 29.07 ± 6% -35.8% 4.45 ± 2% -15.7% 28.36 ± 4% -34.7% 4.40 -15.0% 27.95 ± 3% -34.1% 4.34 ± 2% -14.3% 456.00 ± 3% 27.74 ± 2% -33.6% 15602 ± 7% -20.4% 106214 ± 3% -19.2% 0.00 ± 4% -24.5% 0.50 ± 26% -54.8% 143.90 ± 26% -54.8% 8.46 ± 26% -54.8%

1000 +-+---------------- | |.. +.. .+ +.. 950 +-+ + +. + .. | + + + | 900 +-+ | 850 +-+ O O | 800 +-+O O O | O | O 750 +-+---------------- [*] bisect-good sample
[O] bisect-bad sample

************************** lkp-knm01: 288 threads ========================== compiler/cpufreq_governor/ gcc-7/performance/x86_64
commit:
4bd4e92cfe ("sysfs: 570d020012 ("driver
4bd4e92cfe6d2af7 ---------------- fail:runs %reproduction | | :4 25% %stddev %change \ | 266.75 ± 2% 131.40 ± 6% -29.3% 76931 ± 2% -29.9% 788152 ± 8% 33255 ± 4% 34856762 0.00 ± 26% +0.0 0.28 ± 3% -0.0 981.00 ± 7% 981.00 ± 7% 714927 ± 11% 1.71 ± 5% +39.9% 1121 ± 2% 20.90 ± 11% +42.0% 54.08 ± 4% +18.2% 349403 ± 73% 860.45 ± 4% 860.45 ± 4% 5355 ± 14% +21.7% 0.00 ± 3% -36.6% 16.16 -0.2 16.14 -0.2 32.16 -0.1 16.12 -0.1 16.13 -0.1 32.17 -0.1 16.11 -0.1 16.12 -0.1 16.63 -0.1 16.49 -0.1 16.57 -0.1 16.58 -0.1 16.58 -0.1 16.57 -0.1 16.52 -0.1 16.63 -0.1 16.65 -0.1 16.52 -0.1 0.59 ± 4% +0.1 0.64 ± 5% +0.1 0.27 ±100% +0.3 4.55 ± 2% -8.7% 0.52 -0.1 47148684 35569495 ± 2% 1.66e+08 12.38 12726 ± 2% 0.09 ± 4% -0.0 32470934 ± 5% 1128 ± 4% +18.6% 4.54 ± 2% -8.7% 0.52 -0.1 12.38 12722 ± 2% 0.09 ± 4% -0.0 1127 ± 4% +18.5% 1.414e+08 ± 2% 46584988 35172756 ± 2% 1.641e+08 32116706 ± 5%

************************** lkp-knm01: 288 threads ========================== compiler/cpufreq_governor/ gcc-7/performance/x86_64
commit:
4bd4e92cfe ("sysfs: 570d020012 ("driver
4bd4e92cfe6d2af7 ---------------- %stddev %change \ | 945.00 568697 ± 7% 303.55 ± 2% 272400 0.08 ± 9% -0.0 192099 5043716 2.31 ± 2% +11.2% 4620 ± 6% -19.2% 5458 +1.4% 5458 +1.4% 7.27 ± 4% -0.8 6.02 ± 5% -0.6 1.22 ± 2% -0.2 1.16 ± 2% -0.1 50.11 -0.1 47.70 +0.2 47.89 +0.2 306780 ± 31% 37943697 ± 12% 3199728 ± 21% 1368 ± 7% -17.0% 10913 ± 25% -29.9% 76190 ± 22% -21.5% 306780 ± 31% 37943697 ± 12% 3199728 ± 21% 10906 ± 25% -29.9% 76186 ± 22% -21.5% 234073 ± 39% +134.5% 1117 ± 11% 1117 ± 11% 0.00 ± 8% -25.6% 2333 ± 3% -14.3% 1323 ± 4% -20.1% 2160 ± 4% -12.5% 1179 ± 5% 1.55 ± 66% -100.0% 912.99 ± 4% 590.45 ± 5% 835.45 ± 5% 561.95 ± 5% 6.47 -8.2% 8.137e+09 0.85 -0.1 69084658 44409176 2.151e+08 4250 ± 7% -19.1% 13.05 9785 0.20 -0.0 67639103 3.324e+10 3.328e+10 492.13 0.08 +1.8% 6.46 -8.3% 0.85 -0.1 13.05 9782 0.20 -0.0 492.04 0.08 +1.8% 36640385 8.103e+09 68811380 44229668 2.142e+08 4233 ± 7% -19.1% 67364890 3.315e+10 9.98e+12

Disclaimer:
Results have been for informational design or configuration

Thanks,
Rong Chen
-12.2% regression of will-it-scale.per_thread_ops due to commit:
aa2f6226e93a458d664d70 ("driver core: move device->knode_class to device_private")
el.org/cgit/linux/kernel/git/next/linux-next.git">https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
288 threads Knights Mill with 80G memory
Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
//github.com/antonblanchard/will-it-scale">https://github.com/antonblanchard/will-it-scale
the commit also has significant impact on the following tests:
---------------------------------------------------------+
| will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
| 288 threads Knights Mill with 80G memory |
| cpufreq_governor=performance |
|
|
|
---------------------------------------------------------+
| will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
| 288 threads Knights Mill with 80G memory |
| cpufreq_governor=performance |
|
|
|
---------------------------------------------------------+
------------------------------------------------------------------------>
href="https://github.com/intel/lkp-tests.git">https://github.com/intel/lkp-tests.git
# job file is attached in this email
===============================================================
kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
-rhel-7.2/thread/100%/debian-x86_64-2018-04-03.cgz/lkp-knm01/unlink2/will-it-scale
fix blank line coding style warning")
core: move device->knode_class to device_private")
570d0200123fb4f809aa2f6226
--------------------------
%stddev
\
-12.2% 823.00 will-it-scale.per_thread_ops
-1.7% 891438 will-it-scale.time.involuntary_context_switches
-12.2% 237096 will-it-scale.workload
-12.3% 15001702 softirqs.RCU
6768 vmstat.system.cs
-56.9% 90059912 ± 57% cpuidle.C1.time
-42.5% 392628 ± 40% cpuidle.C1.usage
+27.8% 5.777e+08 ± 9% cpuidle.C6.time
-42.7% 371724 ± 43% turbostat.C1
0.10 ± 62% turbostat.C1%
0.55 ± 11% turbostat.C6%
0.34 ± 16% turbostat.CPU%c1
0.39 ± 13% turbostat.CPU%c6
3.35 ± 4% perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.40 ± 3% perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe.__GI___libc_open
2.92 ± 4% perf-profile.calltrace.cycles-pp.unlink
2.81 ± 4% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.unlink
2.80 ± 4% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
2.64 ± 5% perf-profile.calltrace.cycles-pp.do_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
1.53 ± 9% perf-profile.calltrace.cycles-pp.shmem_mknod.path_openat.do_filp_open.do_sys_open.do_syscall_64
0.95 ± 14% perf-profile.calltrace.cycles-pp.shmem_get_inode.shmem_mknod.path_openat.do_filp_open.do_sys_open
0.76 ± 18% perf-profile.calltrace.cycles-pp.evict.do_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
0.73 ± 18% perf-profile.calltrace.cycles-pp.new_inode.shmem_get_inode.shmem_mknod.path_openat.do_filp_open
1.04 ± 57% perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
0.52 ± 2% perf-profile.calltrace.cycles-pp.filename_parentat.do_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
3.00 ± 9% perf-profile.calltrace.cycles-pp.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
2.76 ± 7% perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt
2.82 ± 7% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt
46.29 perf-profile.calltrace.cycles-pp.__GI___libc_close
46.07 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__GI___libc_close
46.05 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__GI___libc_close
45.49 perf-profile.calltrace.cycles-pp.__close_fd.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__GI___libc_close
45.67 perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__GI___libc_close
45.61 perf-profile.calltrace.cycles-pp.__alloc_fd.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe.__GI___libc_open
45.13 perf-profile.calltrace.cycles-pp._raw_spin_lock.__close_fd.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
45.22 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__close_fd.__x64_sys_close.do_syscall_64
45.17 perf-profile.calltrace.cycles-pp._raw_spin_lock.__alloc_fd.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
45.26 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__alloc_fd.do_sys_open.do_syscall_64
11.06 ± 2% perf-stat.i.MPKI
-1.7% 8.421e+09 perf-stat.i.branch-instructions
1.46 ± 2% perf-stat.i.branch-miss-rate%
-9.7% 1.225e+08 perf-stat.i.branch-misses
18.02 perf-stat.i.cache-miss-rate%
-6.2% 69544118 perf-stat.i.cache-misses
-11.8% 3.862e+08 perf-stat.i.cache-references
6416 perf-stat.i.context-switches
+1.6% 12.21 perf-stat.i.cpi
6134 perf-stat.i.cycles-between-cache-misses
0.29 ± 2% perf-stat.i.iTLB-load-miss-rate%
1e+08 ± 2% perf-stat.i.iTLB-load-misses
-2.0% 3.484e+10 perf-stat.i.iTLB-loads
-2.0% 3.492e+10 perf-stat.i.instructions
+16.2% 349.51 ± 2% perf-stat.i.instructions-per-iTLB-miss
0.08 perf-stat.i.ipc
11.06 ± 2% perf-stat.overall.MPKI
1.45 ± 2% perf-stat.overall.branch-miss-rate%
18.01 perf-stat.overall.cache-miss-rate%
+1.6% 12.21 perf-stat.overall.cpi
6132 perf-stat.overall.cycles-between-cache-misses
0.29 ± 2% perf-stat.overall.iTLB-load-miss-rate%
+16.2% 349.34 ± 2% perf-stat.overall.instructions-per-iTLB-miss
0.08 perf-stat.overall.ipc
+11.7% 44150058 perf-stat.overall.path-length
-1.7% 8.388e+09 perf-stat.ps.branch-instructions
-9.7% 1.22e+08 perf-stat.ps.branch-misses
-6.2% 69273046 perf-stat.ps.cache-misses
-11.8% 3.847e+08 perf-stat.ps.cache-references
6390 perf-stat.ps.context-switches
-15.6% 99644821 ± 2% perf-stat.ps.iTLB-load-misses
-2.0% 3.47e+10 perf-stat.ps.iTLB-loads
-2.0% 3.479e+10 perf-stat.ps.instructions
-1.9% 1.047e+13 perf-stat.total.instructions
-43.0% 339338 ± 28% sched_debug.cfs_rq:/.MIN_vruntime.avg
-31.2% 3327465 ± 24% sched_debug.cfs_rq:/.MIN_vruntime.stddev
11329 ± 22% sched_debug.cfs_rq:/.load.avg
-24.1% 832275 ± 27% sched_debug.cfs_rq:/.load.max
78378 ± 26% sched_debug.cfs_rq:/.load.stddev
10.96 sched_debug.cfs_rq:/.load_avg.avg
-32.1% 354.80 ± 23% sched_debug.cfs_rq:/.load_avg.max
25.13 ± 21% sched_debug.cfs_rq:/.load_avg.stddev
-43.0% 339338 ± 28% sched_debug.cfs_rq:/.max_vruntime.avg
-31.2% 3327465 ± 24% sched_debug.cfs_rq:/.max_vruntime.stddev
0.10 ± 18% sched_debug.cfs_rq:/.nr_running.stddev
+37.6% 107.25 ± 19% sched_debug.cfs_rq:/.nr_spread_over.max
8.55 ± 15% sched_debug.cfs_rq:/.nr_spread_over.stddev
3.73 ± 8% sched_debug.cfs_rq:/.runnable_load_avg.avg
-35.1% 303.60 ± 30% sched_debug.cfs_rq:/.runnable_load_avg.max
18.54 ± 29% sched_debug.cfs_rq:/.runnable_load_avg.stddev
11323 ± 22% sched_debug.cfs_rq:/.runnable_weight.avg
-24.1% 832050 ± 27% sched_debug.cfs_rq:/.runnable_weight.max
78365 ± 26% sched_debug.cfs_rq:/.runnable_weight.stddev
-28.1% 555.30 ± 5% sched_debug.cpu.clock.stddev
-28.1% 555.30 ± 5% sched_debug.cpu.clock_task.stddev
3.76 ± 8% sched_debug.cpu.cpu_load[0].avg
18.88 ± 27% sched_debug.cpu.cpu_load[0].stddev
3.76 ± 8% sched_debug.cpu.cpu_load[1].avg
18.66 ± 28% sched_debug.cpu.cpu_load[1].stddev
3.75 ± 8% sched_debug.cpu.cpu_load[2].avg
18.53 ± 28% sched_debug.cpu.cpu_load[2].stddev
3.73 ± 8% sched_debug.cpu.cpu_load[3].avg
18.43 ± 28% sched_debug.cpu.cpu_load[3].stddev
3.72 ± 8% sched_debug.cpu.cpu_load[4].avg
-34.2% 300.10 ± 29% sched_debug.cpu.cpu_load[4].max
18.41 ± 28% sched_debug.cpu.cpu_load[4].stddev
12420 ± 19% sched_debug.cpu.load.avg
85804 ± 18% sched_debug.cpu.load.stddev
0.00 ± 3% sched_debug.cpu.next_balance.stddev
0.23 ± 79% sched_debug.rt_rq:/.rt_time.avg
65.01 ± 79% sched_debug.rt_rq:/.rt_time.max
3.82 ± 79% sched_debug.rt_rq:/.rt_time.stddev

will-it-scale.per_thread_ops

--------------------------------------------------+
|
.+.. .+ + |
.+..+.. .+. + + +.. .. |
+ + + + |
+ |
|
O O |
O O O |
O O O O O O O O |
O O O O O O O
|
|
|
--------------------------------------------------+

*************************************************************************
Knights Mill with 80G memory
===============================================================
kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
-rhel-7.2/thread/100%/debian-x86_64-2018-04-03.cgz/lkp-knm01/signal1/will-it-scale
fix blank line coding style warning")
core: move device->knode_class to device_private")
570d0200123fb4f809aa2f6226
--------------------------
fail:runs
|
1:4 dmesg.WARNING:at#for_ip_interrupt_entry/0x
%stddev
\
-29.9% 187.00 will-it-scale.per_thread_ops
92.91 ± 3% will-it-scale.time.user_time
53963 will-it-scale.workload
+17.8% 928405 ± 5% cpuidle.C6.usage
-8.2% 30514 ± 2% numa-meminfo.node1.Slab
+14.5% 39908744 softirqs.TIMER
0.01 ± 67% mpstat.cpu.soft%
0.23 ± 2% mpstat.cpu.usr%
-12.8% 855.00 ± 2% slabinfo.skbuff_ext_cache.active_objs
-12.8% 855.00 ± 2% slabinfo.skbuff_ext_cache.num_objs
+21.5% 868290 ± 5% turbostat.C6
2.39 turbostat.RAMWatt
-23.8% 854.06 ± 4% sched_debug.cfs_rq:/.exec_clock.stddev
29.68 ± 17% sched_debug.cfs_rq:/.nr_spread_over.avg
63.93 ± 7% sched_debug.cfs_rq:/.util_est_enqueued.stddev
+94.9% 680874 ± 2% sched_debug.cpu.avg_idle.min
-37.1% 540.88 ± 7% sched_debug.cpu.clock.stddev
-37.1% 540.88 ± 7% sched_debug.cpu.clock_task.stddev
6517 sched_debug.cpu.curr->pid.max
0.00 ± 6% sched_debug.cpu.next_balance.stddev
16.00 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.__set_current_blocked.signal_setup_done.do_signal.exit_to_usermode_loop
15.99 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.__set_current_blocked.signal_setup_done.do_signal
32.04 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.__set_current_blocked.sigprocmask.__x64_sys_rt_sigprocmask
16.01 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.get_signal.do_signal.exit_to_usermode_loop
16.01 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.get_signal.do_signal.exit_to_usermode_loop.do_syscall_64
32.05 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.__set_current_blocked.sigprocmask.__x64_sys_rt_sigprocmask.do_syscall_64
16.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.__set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64
16.01 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.__set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
16.54 perf-profile.calltrace.cycles-pp.handler
16.40 perf-profile.calltrace.cycles-pp.__set_current_blocked.signal_setup_done.do_signal.exit_to_usermode_loop.do_syscall_64
16.48 perf-profile.calltrace.cycles-pp.do_signal.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.handler
16.49 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.handler
16.49 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.handler
16.49 perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.handler
16.44 perf-profile.calltrace.cycles-pp.signal_setup_done.do_signal.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
16.56 perf-profile.calltrace.cycles-pp.do_signal.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
16.58 perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
16.46 perf-profile.calltrace.cycles-pp.get_signal.do_signal.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.71 ± 2% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.__set_current_blocked.sigprocmask.__x64_sys_rt_sigprocmask
0.77 ± 2% perf-profile.calltrace.cycles-pp.apic_timer_interrupt.__set_current_blocked.sigprocmask.__x64_sys_rt_sigprocmask.do_syscall_64
0.59 ± 2% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.__set_current_blocked.sigprocmask
4.15 perf-stat.i.MPKI
0.47 perf-stat.i.branch-miss-rate%
-8.9% 42963996 perf-stat.i.branch-misses
-4.9% 33832492 perf-stat.i.cache-misses
-7.7% 1.533e+08 perf-stat.i.cache-references
-1.2% 12.24 perf-stat.i.cpi
+5.1% 13374 perf-stat.i.cycles-between-cache-misses
0.07 perf-stat.i.iTLB-load-miss-rate%
-14.8% 27651795 perf-stat.i.iTLB-load-misses
1339 perf-stat.i.instructions-per-iTLB-miss
4.15 perf-stat.overall.MPKI
0.47 ± 2% perf-stat.overall.branch-miss-rate%
-1.2% 12.24 perf-stat.overall.cpi
+5.1% 13367 perf-stat.overall.cycles-between-cache-misses
0.07 perf-stat.overall.iTLB-load-miss-rate%
1336 perf-stat.overall.instructions-per-iTLB-miss
+43.9% 2.035e+08 ± 2% perf-stat.overall.path-length
-9.3% 42262634 perf-stat.ps.branch-misses
-5.0% 33400721 perf-stat.ps.cache-misses
-7.8% 1.513e+08 perf-stat.ps.cache-references
-15.0% 27304782 perf-stat.ps.iTLB-load-misses
*************************************************************************
Knights Mill with 80G memory
===============================================================
kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
-rhel-7.2/thread/100%/debian-x86_64-2018-04-03.cgz/lkp-knm01/open1/will-it-scale
fix blank line coding style warning")
core: move device->knode_class to device_private")
570d0200123fb4f809aa2f6226
--------------------------
%stddev
\
-16.5% 788.75 will-it-scale.per_thread_ops
-21.3% 447754 will-it-scale.time.involuntary_context_switches
-7.2% 281.60 ± 2% will-it-scale.time.user_time
-16.6% 227268 will-it-scale.workload
0.05 ± 5% mpstat.cpu.soft%
-11.9% 169161 ± 3% slabinfo.filp.active_objs
-12.1% 4431600 ± 2% softirqs.RCU
2.58 turbostat.RAMWatt
3732 vmstat.system.cs
5532 proc-vmstat.nr_inactive_anon
5532 proc-vmstat.nr_zone_inactive_anon
6.50 ± 4% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.native_queued_spin_lock_slowpath._raw_spin_lock
5.40 ± 5% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.native_queued_spin_lock_slowpath
1.07 ± 2% perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe.__GI___libc_open
1.02 ± 2% perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
49.98 perf-profile.calltrace.cycles-pp.__GI___libc_open
47.86 perf-profile.calltrace.cycles-pp._raw_spin_lock.__alloc_fd.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
48.12 perf-profile.calltrace.cycles-pp._raw_spin_lock.__close_fd.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
-46.2% 165114 ± 27% sched_debug.cfs_rq:/.MIN_vruntime.avg
-23.6% 28986155 ± 20% sched_debug.cfs_rq:/.MIN_vruntime.max
-33.9% 2114259 ± 22% sched_debug.cfs_rq:/.MIN_vruntime.stddev
1135 ± 9% sched_debug.cfs_rq:/.exec_clock.stddev
7650 ± 10% sched_debug.cfs_rq:/.load.avg
59825 ± 7% sched_debug.cfs_rq:/.load.stddev
-46.2% 165114 ± 27% sched_debug.cfs_rq:/.max_vruntime.avg
-23.6% 28986155 ± 20% sched_debug.cfs_rq:/.max_vruntime.max
-33.9% 2114259 ± 22% sched_debug.cfs_rq:/.max_vruntime.stddev
7642 ± 10% sched_debug.cfs_rq:/.runnable_weight.avg
59800 ± 7% sched_debug.cfs_rq:/.runnable_weight.stddev
548907 ± 71% sched_debug.cfs_rq:/.spread0.avg
-26.5% 821.87 ± 5% sched_debug.cpu.clock.stddev
-26.5% 821.87 ± 5% sched_debug.cpu.clock_task.stddev
0.00 ± 6% sched_debug.cpu.next_balance.stddev
1998 sched_debug.cpu.nr_switches.avg
1057 sched_debug.cpu.nr_switches.min
1891 ± 2% sched_debug.cpu.sched_count.avg
-17.0% 978.15 sched_debug.cpu.sched_count.min
0.00 sched_debug.cpu.sched_goidle.min
-18.1% 747.65 sched_debug.cpu.ttwu_count.avg
-21.5% 463.60 ± 2% sched_debug.cpu.ttwu_count.min
-19.8% 670.09 sched_debug.cpu.ttwu_local.avg
-22.5% 435.70 ± 3% sched_debug.cpu.ttwu_local.min
5.93 ± 3% perf-stat.i.MPKI
+1.8% 8.282e+09 perf-stat.i.branch-instructions
0.77 ± 2% perf-stat.i.branch-miss-rate%
-7.5% 63892585 ± 2% perf-stat.i.branch-misses
-6.8% 41391448 ± 3% perf-stat.i.cache-misses
-6.9% 2.003e+08 ± 3% perf-stat.i.cache-references
3436 perf-stat.i.context-switches
-1.8% 12.82 perf-stat.i.cpi
+7.2% 10490 ± 3% perf-stat.i.cycles-between-cache-misses
0.18 ± 6% perf-stat.i.iTLB-load-miss-rate%
-11.5% 59865979 ± 6% perf-stat.i.iTLB-load-misses
+1.4% 3.371e+10 perf-stat.i.iTLB-loads
+1.5% 3.379e+10 perf-stat.i.instructions
+15.2% 566.96 ± 6% perf-stat.i.instructions-per-iTLB-miss
0.08 perf-stat.i.ipc
5.93 ± 3% perf-stat.overall.MPKI
0.77 ± 2% perf-stat.overall.branch-miss-rate%
-1.8% 12.82 perf-stat.overall.cpi
+7.1% 10480 ± 3% perf-stat.overall.cycles-between-cache-misses
0.18 ± 6% perf-stat.overall.iTLB-load-miss-rate%
+15.2% 566.73 ± 6% perf-stat.overall.instructions-per-iTLB-miss
0.08 perf-stat.overall.ipc
+21.7% 44597360 perf-stat.overall.path-length
+1.8% 8.249e+09 perf-stat.ps.branch-instructions
-7.5% 63646780 ± 2% perf-stat.ps.branch-misses
-6.8% 41229829 ± 3% perf-stat.ps.cache-misses
-6.9% 1.995e+08 ± 3% perf-stat.ps.cache-references
3423 perf-stat.ps.context-switches
-11.5% 59634016 ± 6% perf-stat.ps.iTLB-load-misses
+1.5% 3.365e+10 perf-stat.ps.instructions
+1.5% 1.013e+13 perf-stat.total.instructions
estimated based on internal Intel analysis and are provided
purposes only. Any difference in system hardware or software
may affect actual performance.

Attachments:

(No filename) (29.57 kB)
config-5.0.0-rc2-00005-g570d020 (171.86 kB)
job.yaml (4.76 kB)
reproduce (321.00 B)
Download all attachments

2019-02-19 12:20:15

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
> On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
> >Greeting,
> >
> >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due to commit:
> >
> >
> >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move device->knode_class to device_private")
> >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >
>
> This is interesting.
>
> I didn't expect the move of this field will impact the performance.
>
> The reason is struct device is a hotter memory than device->device_private?
>
> >in testcase: will-it-scale
> >on test machine: 288 threads Knights Mill with 80G memory
> >with following parameters:
> >
> > nr_task: 100%
> > mode: thread
> > test: unlink2
> > cpufreq_governor: performance
> >
> >test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
> >test-url: https://github.com/antonblanchard/will-it-scale
> >
> >In addition to that, the commit also has significant impact on the following tests:
> >
> >+------------------+---------------------------------------------------------------+
> >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
> >| test machine | 288 threads Knights Mill with 80G memory |
> >| test parameters | cpufreq_governor=performance |
> >| | mode=thread |
> >| | nr_task=100% |
> >| | test=signal1 |

Ok, I'm going to blame your testing system, or something here, and not
the above patch.

All this test does is call raise(3). That does not touch the driver
core at all.

> >+------------------+---------------------------------------------------------------+
> >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
> >| test machine | 288 threads Knights Mill with 80G memory |
> >| test parameters | cpufreq_governor=performance |
> >| | mode=thread |
> >| | nr_task=100% |
> >| | test=open1 |
> >+------------------+---------------------------------------------------------------+

Same here, open1 just calls open/close a lot. No driver core
interaction at all there either.

So are you _sure_ this is the offending patch?

thanks,

greg k-h

2019-02-21 03:11:39

by Chen, Rong A

[permalink] [raw]

Subject: Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
> On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
> > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
> > >Greeting,
> > >
> > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due to commit:
> > >
> > >
> > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move device->knode_class to device_private")
> > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > >
> >
> > This is interesting.
> >
> > I didn't expect the move of this field will impact the performance.
> >
> > The reason is struct device is a hotter memory than device->device_private?
> >
> > >in testcase: will-it-scale
> > >on test machine: 288 threads Knights Mill with 80G memory
> > >with following parameters:
> > >
> > > nr_task: 100%
> > > mode: thread
> > > test: unlink2
> > > cpufreq_governor: performance
> > >
> > >test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
> > >test-url: https://github.com/antonblanchard/will-it-scale
> > >
> > >In addition to that, the commit also has significant impact on the following tests:
> > >
> > >+------------------+---------------------------------------------------------------+
> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
> > >| test machine | 288 threads Knights Mill with 80G memory |
> > >| test parameters | cpufreq_governor=performance |
> > >| | mode=thread |
> > >| | nr_task=100% |
> > >| | test=signal1 |
>
> Ok, I'm going to blame your testing system, or something here, and not
> the above patch.
>
> All this test does is call raise(3). That does not touch the driver
> core at all.
>
> > >+------------------+---------------------------------------------------------------+
> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
> > >| test machine | 288 threads Knights Mill with 80G memory |
> > >| test parameters | cpufreq_governor=performance |
> > >| | mode=thread |
> > >| | nr_task=100% |
> > >| | test=open1 |
> > >+------------------+---------------------------------------------------------------+
>
> Same here, open1 just calls open/close a lot. No driver core
> interaction at all there either.
>
> So are you _sure_ this is the offending patch?

Hi Greg,

We did an experiment, recovered the layout of struct device. and we
found the regression is gone. I guess the regession is not from the
patch but related to the struct layout.

tests: 1
testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-unlink2/lkp-knm01

570d0200123fb4f8 a36dc70b810afe9183de2ea18f
---------------- --------------------------
%stddev change %stddev
\ | \
237096 14% 270789 will-it-scale.workload
823 14% 939 will-it-scale.per_thread_ops

tests: 1
testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-signal1/lkp-knm01

570d0200123fb4f8 a36dc70b810afe9183de2ea18f
---------------- --------------------------
%stddev change %stddev
\ | \
93.51 ± 3% 48% 138.53 ± 3% will-it-scale.time.user_time
186 40% 261 will-it-scale.per_thread_ops
53909 40% 75507 will-it-scale.workload

tests: 1
testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-open1/lkp-knm01

570d0200123fb4f8 a36dc70b810afe9183de2ea18f
---------------- --------------------------
%stddev change %stddev
\ | \
447722 22% 546258 ± 10% will-it-scale.time.involuntary_context_switches
226995 19% 269751 will-it-scale.workload
787 19% 936 will-it-scale.per_thread_ops

commit a36dc70b810afe9183de2ea18faa4c0939c139ac
Author: 0day robot <[email protected]>
Date: Wed Feb 20 14:21:19 2019 +0800

backfile klist_node in struct device for debugging

Signed-off-by: 0day robot <[email protected]>

diff --git a/include/linux/device.h b/include/linux/device.h
index d0e452fd0bff2..31666cb72b3ba 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1035,6 +1035,7 @@ struct device {
spinlock_t devres_lock;
struct list_head devres_head;

+ struct klist_node knode_class_test_by_rongc;
struct class *class;
const struct attribute_group **groups; /* optional groups */

Best Regards,
Rong Chen

2019-02-21 03:48:32

by Wei Yang

[permalink] [raw]

Subject: Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
>On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
>> On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
>> > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
>> > >Greeting,
>> > >
>> > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due to commit:
>> > >
>> > >
>> > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move device->knode_class to device_private")
>> > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>> > >
>> >
>> > This is interesting.
>> >
>> > I didn't expect the move of this field will impact the performance.
>> >
>> > The reason is struct device is a hotter memory than device->device_private?
>> >
>> > >in testcase: will-it-scale
>> > >on test machine: 288 threads Knights Mill with 80G memory
>> > >with following parameters:
>> > >
>> > > nr_task: 100%
>> > > mode: thread
>> > > test: unlink2
>> > > cpufreq_governor: performance
>> > >
>> > >test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
>> > >test-url: https://github.com/antonblanchard/will-it-scale
>> > >
>> > >In addition to that, the commit also has significant impact on the following tests:
>> > >
>> > >+------------------+---------------------------------------------------------------+
>> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
>> > >| test machine | 288 threads Knights Mill with 80G memory |
>> > >| test parameters | cpufreq_governor=performance |
>> > >| | mode=thread |
>> > >| | nr_task=100% |
>> > >| | test=signal1 |
>>
>> Ok, I'm going to blame your testing system, or something here, and not
>> the above patch.
>>
>> All this test does is call raise(3). That does not touch the driver
>> core at all.
>>
>> > >+------------------+---------------------------------------------------------------+
>> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
>> > >| test machine | 288 threads Knights Mill with 80G memory |
>> > >| test parameters | cpufreq_governor=performance |
>> > >| | mode=thread |
>> > >| | nr_task=100% |
>> > >| | test=open1 |
>> > >+------------------+---------------------------------------------------------------+
>>
>> Same here, open1 just calls open/close a lot. No driver core
>> interaction at all there either.
>>
>> So are you _sure_ this is the offending patch?
>
>Hi Greg,
>
>We did an experiment, recovered the layout of struct device. and we
>found the regression is gone. I guess the regession is not from the
>patch but related to the struct layout.
>
>
>tests: 1
>testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-unlink2/lkp-knm01
>
>570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>---------------- --------------------------
> %stddev change %stddev
> \ | \
> 237096 14% 270789 will-it-scale.workload
> 823 14% 939 will-it-scale.per_thread_ops
>

Do you have the comparison between a36dc70b810afe9183de2ea18f and the one
before 570d020012?

>
>tests: 1
>testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-signal1/lkp-knm01
>
>570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>---------------- --------------------------
> %stddev change %stddev
> \ | \
> 93.51 ? 3% 48% 138.53 ? 3% will-it-scale.time.user_time
> 186 40% 261 will-it-scale.per_thread_ops
> 53909 40% 75507 will-it-scale.workload
>
>
>tests: 1
>testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-open1/lkp-knm01
>
>570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>---------------- --------------------------
> %stddev change %stddev
> \ | \
> 447722 22% 546258 ? 10% will-it-scale.time.involuntary_context_switches
> 226995 19% 269751 will-it-scale.workload
> 787 19% 936 will-it-scale.per_thread_ops
>
>
>
>commit a36dc70b810afe9183de2ea18faa4c0939c139ac
>Author: 0day robot <[email protected]>
>Date: Wed Feb 20 14:21:19 2019 +0800
>
> backfile klist_node in struct device for debugging
>
> Signed-off-by: 0day robot <[email protected]>
>
>diff --git a/include/linux/device.h b/include/linux/device.h
>index d0e452fd0bff2..31666cb72b3ba 100644
>--- a/include/linux/device.h
>+++ b/include/linux/device.h
>@@ -1035,6 +1035,7 @@ struct device {
> spinlock_t devres_lock;
> struct list_head devres_head;
>
>+ struct klist_node knode_class_test_by_rongc;
> struct class *class;
> const struct attribute_group **groups; /* optional groups */

Hmm... because this is not properly aligned?

struct klist_node {
void *n_klist; /* never access directly */
struct list_head n_node;
struct kref n_ref;
};

Except struct kref has one "int" type, others are pointers.

But... I am still confused.

>
>Best Regards,
>Rong Chen

--
Wei Yang
Help you, Help me

2019-02-21 04:47:11

by Huang, Ying

[permalink] [raw]

Subject: Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

Wei Yang <[email protected]> writes:

> On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
>>On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
>>> On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
>>> > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
>>> > >Greeting,
>>> > >
>>> > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due to commit:
>>> > >
>>> > >
>>> > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move device->knode_class to device_private")
>>> > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>> > >
>>> >
>>> > This is interesting.
>>> >
>>> > I didn't expect the move of this field will impact the performance.
>>> >
>>> > The reason is struct device is a hotter memory than device->device_private?
>>> >
>>> > >in testcase: will-it-scale
>>> > >on test machine: 288 threads Knights Mill with 80G memory
>>> > >with following parameters:
>>> > >
>>> > > nr_task: 100%
>>> > > mode: thread
>>> > > test: unlink2
>>> > > cpufreq_governor: performance
>>> > >
>>> > >test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
>>> > >test-url: https://github.com/antonblanchard/will-it-scale
>>> > >
>>> > >In addition to that, the commit also has significant impact on the following tests:
>>> > >
>>> > >+------------------+---------------------------------------------------------------+
>>> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
>>> > >| test machine | 288 threads Knights Mill with 80G memory |
>>> > >| test parameters | cpufreq_governor=performance |
>>> > >| | mode=thread |
>>> > >| | nr_task=100% |
>>> > >| | test=signal1 |
>>>
>>> Ok, I'm going to blame your testing system, or something here, and not
>>> the above patch.
>>>
>>> All this test does is call raise(3). That does not touch the driver
>>> core at all.
>>>
>>> > >+------------------+---------------------------------------------------------------+
>>> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
>>> > >| test machine | 288 threads Knights Mill with 80G memory |
>>> > >| test parameters | cpufreq_governor=performance |
>>> > >| | mode=thread |
>>> > >| | nr_task=100% |
>>> > >| | test=open1 |
>>> > >+------------------+---------------------------------------------------------------+
>>>
>>> Same here, open1 just calls open/close a lot. No driver core
>>> interaction at all there either.
>>>
>>> So are you _sure_ this is the offending patch?
>>
>>Hi Greg,
>>
>>We did an experiment, recovered the layout of struct device. and we
>>found the regression is gone. I guess the regession is not from the
>>patch but related to the struct layout.
>>
>>
>>tests: 1
>>testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-unlink2/lkp-knm01
>>
>>570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>>---------------- --------------------------
>> %stddev change %stddev
>> \ | \
>> 237096 14% 270789 will-it-scale.workload
>> 823 14% 939 will-it-scale.per_thread_ops
>>
>
> Do you have the comparison between a36dc70b810afe9183de2ea18f and the one
> before 570d020012?
>
>>
>>tests: 1
>>testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-signal1/lkp-knm01
>>
>>570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>>---------------- --------------------------
>> %stddev change %stddev
>> \ | \
>> 93.51 3% 48% 138.53 3% will-it-scale.time.user_time
>> 186 40% 261 will-it-scale.per_thread_ops
>> 53909 40% 75507 will-it-scale.workload
>>
>>
>>tests: 1
>>testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-open1/lkp-knm01
>>
>>570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>>---------------- --------------------------
>> %stddev change %stddev
>> \ | \
>> 447722 22% 546258 10% will-it-scale.time.involuntary_context_switches
>> 226995 19% 269751 will-it-scale.workload
>> 787 19% 936 will-it-scale.per_thread_ops
>>
>>
>>
>>commit a36dc70b810afe9183de2ea18faa4c0939c139ac
>>Author: 0day robot <[email protected]>
>>Date: Wed Feb 20 14:21:19 2019 +0800
>>
>> backfile klist_node in struct device for debugging
>>
>> Signed-off-by: 0day robot <[email protected]>
>>
>>diff --git a/include/linux/device.h b/include/linux/device.h
>>index d0e452fd0bff2..31666cb72b3ba 100644
>>--- a/include/linux/device.h
>>+++ b/include/linux/device.h
>>@@ -1035,6 +1035,7 @@ struct device {
>> spinlock_t devres_lock;
>> struct list_head devres_head;
>>
>>+ struct klist_node knode_class_test_by_rongc;
>> struct class *class;
>> const struct attribute_group **groups; /* optional groups */
>
> Hmm... because this is not properly aligned?
>
> struct klist_node {
> void *n_klist; /* never access directly */
> struct list_head n_node;
> struct kref n_ref;
> };
>
> Except struct kref has one "int" type, others are pointers.
>
> But... I am still confused.

I guess because the size of struct device is changed, it influences some
alignment changes in the system. Thus influence the benchmark score.

Best Regards,
Huang, Ying

>>
>>Best Regards,
>>Rong Chen

2019-02-21 05:47:21

by Chen, Rong A

[permalink] [raw]

Subject: Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

On Thu, Feb 21, 2019 at 11:46:12AM +0800, Wei Yang wrote:
> On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
> >On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
> >> On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
> >> > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
> >> > >Greeting,
> >> > >
> >> > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due to commit:
> >> > >
> >> > >
> >> > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move device->knode_class to device_private")
> >> > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >> > >
> >> >
> >> > This is interesting.
> >> >
> >> > I didn't expect the move of this field will impact the performance.
> >> >
> >> > The reason is struct device is a hotter memory than device->device_private?
> >> >
> >> > >in testcase: will-it-scale
> >> > >on test machine: 288 threads Knights Mill with 80G memory
> >> > >with following parameters:
> >> > >
> >> > > nr_task: 100%
> >> > > mode: thread
> >> > > test: unlink2
> >> > > cpufreq_governor: performance
> >> > >
> >> > >test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
> >> > >test-url: https://github.com/antonblanchard/will-it-scale
> >> > >
> >> > >In addition to that, the commit also has significant impact on the following tests:
> >> > >
> >> > >+------------------+---------------------------------------------------------------+
> >> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
> >> > >| test machine | 288 threads Knights Mill with 80G memory |
> >> > >| test parameters | cpufreq_governor=performance |
> >> > >| | mode=thread |
> >> > >| | nr_task=100% |
> >> > >| | test=signal1 |
> >>
> >> Ok, I'm going to blame your testing system, or something here, and not
> >> the above patch.
> >>
> >> All this test does is call raise(3). That does not touch the driver
> >> core at all.
> >>
> >> > >+------------------+---------------------------------------------------------------+
> >> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
> >> > >| test machine | 288 threads Knights Mill with 80G memory |
> >> > >| test parameters | cpufreq_governor=performance |
> >> > >| | mode=thread |
> >> > >| | nr_task=100% |
> >> > >| | test=open1 |
> >> > >+------------------+---------------------------------------------------------------+
> >>
> >> Same here, open1 just calls open/close a lot. No driver core
> >> interaction at all there either.
> >>
> >> So are you _sure_ this is the offending patch?
> >
> >Hi Greg,
> >
> >We did an experiment, recovered the layout of struct device. and we
> >found the regression is gone. I guess the regession is not from the
> >patch but related to the struct layout.
> >
> >
> >tests: 1
> >testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-unlink2/lkp-knm01
> >
> >570d0200123fb4f8 a36dc70b810afe9183de2ea18f
> >---------------- --------------------------
> > %stddev change %stddev
> > \ | \
> > 237096 14% 270789 will-it-scale.workload
> > 823 14% 939 will-it-scale.per_thread_ops
> >
>
> Do you have the comparison between a36dc70b810afe9183de2ea18f and the one
> before 570d020012?

testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-unlink2/lkp-knm01

4bd4e92cfe6d2af7 a36dc70b810afe9183de2ea18f
---------------- --------------------------
%stddev %change %stddev
\ | \
937.00 +0.2% 939.33 will-it-scale.per_thread_ops
269989 +0.3% 270789 will-it-scale.workload

> >
> >tests: 1
> >testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-signal1/lkp-knm01
> >
> >570d0200123fb4f8 a36dc70b810afe9183de2ea18f
> >---------------- --------------------------
> > %stddev change %stddev
> > \ | \
> > 93.51 ± 3% 48% 138.53 ± 3% will-it-scale.time.user_time
> > 186 40% 261 will-it-scale.per_thread_ops
> > 53909 40% 75507 will-it-scale.workload
> >

testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-signal1/lkp-knm01

4bd4e92cfe6d2af7 a36dc70b810afe9183de2ea18f
---------------- --------------------------
%stddev %change %stddev
\ | \
266.00 ± 2% -1.6% 261.67 will-it-scale.per_thread_ops
76699 ± 2% -1.6% 75507 will-it-scale.workload

> >
> >tests: 1
> >testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-open1/lkp-knm01
> >
> >570d0200123fb4f8 a36dc70b810afe9183de2ea18f
> >---------------- --------------------------
> > %stddev change %stddev
> > \ | \
> > 447722 22% 546258 ± 10% will-it-scale.time.involuntary_context_switches
> > 226995 19% 269751 will-it-scale.workload
> > 787 19% 936 will-it-scale.per_thread_ops
> >
> >

testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-open1/lkp-knm01

4bd4e92cfe6d2af7 a36dc70b810afe9183de2ea18f
---------------- --------------------------
%stddev %change %stddev
\ | \
944.60 -0.9% 936.00 will-it-scale.per_thread_ops
272252 -0.9% 269751 will-it-scale.workload

Best Regards,
Rong Chen

2019-02-21 06:03:25

by Wei Yang

[permalink] [raw]

Subject: Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

On Thu, Feb 21, 2019 at 12:46:18PM +0800, Huang, Ying wrote:
>Wei Yang <[email protected]> writes:
>
>> On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
>>>On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
>>>> On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
>>>> > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
>>>> > >Greeting,
>>>> > >
>>>> > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due to commit:
>>>> > >
>>>> > >
>>>> > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move device->knode_class to device_private")
>>>> > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>> > >
>>>> >
>>>> > This is interesting.
>>>> >
>>>> > I didn't expect the move of this field will impact the performance.
>>>> >
>>>> > The reason is struct device is a hotter memory than device->device_private?
>>>> >
>>>> > >in testcase: will-it-scale
>>>> > >on test machine: 288 threads Knights Mill with 80G memory
>>>> > >with following parameters:
>>>> > >
>>>> > > nr_task: 100%
>>>> > > mode: thread
>>>> > > test: unlink2
>>>> > > cpufreq_governor: performance
>>>> > >
>>>> > >test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
>>>> > >test-url: https://github.com/antonblanchard/will-it-scale
>>>> > >
>>>> > >In addition to that, the commit also has significant impact on the following tests:
>>>> > >
>>>> > >+------------------+---------------------------------------------------------------+
>>>> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
>>>> > >| test machine | 288 threads Knights Mill with 80G memory |
>>>> > >| test parameters | cpufreq_governor=performance |
>>>> > >| | mode=thread |
>>>> > >| | nr_task=100% |
>>>> > >| | test=signal1 |
>>>>
>>>> Ok, I'm going to blame your testing system, or something here, and not
>>>> the above patch.
>>>>
>>>> All this test does is call raise(3). That does not touch the driver
>>>> core at all.
>>>>
>>>> > >+------------------+---------------------------------------------------------------+
>>>> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
>>>> > >| test machine | 288 threads Knights Mill with 80G memory |
>>>> > >| test parameters | cpufreq_governor=performance |
>>>> > >| | mode=thread |
>>>> > >| | nr_task=100% |
>>>> > >| | test=open1 |
>>>> > >+------------------+---------------------------------------------------------------+
>>>>
>>>> Same here, open1 just calls open/close a lot. No driver core
>>>> interaction at all there either.
>>>>
>>>> So are you _sure_ this is the offending patch?
>>>
>>>Hi Greg,
>>>
>>>We did an experiment, recovered the layout of struct device. and we
>>>found the regression is gone. I guess the regession is not from the
>>>patch but related to the struct layout.
>>>
>>>
>>>tests: 1
>>>testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-unlink2/lkp-knm01
>>>
>>>570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>>>---------------- --------------------------
>>> %stddev change %stddev
>>> \ | \
>>> 237096 14% 270789 will-it-scale.workload
>>> 823 14% 939 will-it-scale.per_thread_ops
>>>
>>
>> Do you have the comparison between a36dc70b810afe9183de2ea18f and the one
>> before 570d020012?
>>
>>>
>>>tests: 1
>>>testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-signal1/lkp-knm01
>>>
>>>570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>>>---------------- --------------------------
>>> %stddev change %stddev
>>> \ | \
>>> 93.51 3% 48% 138.53 3% will-it-scale.time.user_time
>>> 186 40% 261 will-it-scale.per_thread_ops
>>> 53909 40% 75507 will-it-scale.workload
>>>
>>>
>>>tests: 1
>>>testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-open1/lkp-knm01
>>>
>>>570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>>>---------------- --------------------------
>>> %stddev change %stddev
>>> \ | \
>>> 447722 22% 546258 10% will-it-scale.time.involuntary_context_switches
>>> 226995 19% 269751 will-it-scale.workload
>>> 787 19% 936 will-it-scale.per_thread_ops
>>>
>>>
>>>
>>>commit a36dc70b810afe9183de2ea18faa4c0939c139ac
>>>Author: 0day robot <[email protected]>
>>>Date: Wed Feb 20 14:21:19 2019 +0800
>>>
>>> backfile klist_node in struct device for debugging
>>>
>>> Signed-off-by: 0day robot <[email protected]>
>>>
>>>diff --git a/include/linux/device.h b/include/linux/device.h
>>>index d0e452fd0bff2..31666cb72b3ba 100644
>>>--- a/include/linux/device.h
>>>+++ b/include/linux/device.h
>>>@@ -1035,6 +1035,7 @@ struct device {
>>> spinlock_t devres_lock;
>>> struct list_head devres_head;
>>>
>>>+ struct klist_node knode_class_test_by_rongc;
>>> struct class *class;
>>> const struct attribute_group **groups; /* optional groups */
>>
>> Hmm... because this is not properly aligned?
>>
>> struct klist_node {
>> void *n_klist; /* never access directly */
>> struct list_head n_node;
>> struct kref n_ref;
>> };
>>
>> Except struct kref has one "int" type, others are pointers.
>>
>> But... I am still confused.
>
>I guess because the size of struct device is changed, it influences some
>alignment changes in the system. Thus influence the benchmark score.
>

That's interesting.

I wrote a module to see the exact size of these two structure on my x86_64.

sizeof(struct device) = 736 = 8 * 92
sizeof(struct device_private) = 160 = 8 * 20
sizeof(struct klist_node) = 32 = 8 * 4

Even klist_node has one 4 byte field, c complier would pack the structure to
make it aligned. Which system alignment it would affect?

After the patch, size would change like this:

struct device 736 -> 704
struce device_private 160 -> 192

Would this size change affect system?

>Best Regards,
>Huang, Ying
>
>>>
>>>Best Regards,
>>>Rong Chen

--
Wei Yang
Help you, Help me

2019-02-21 06:30:22

by Huang, Ying

[permalink] [raw]

Subject: Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

Wei Yang <[email protected]> writes:

> On Thu, Feb 21, 2019 at 12:46:18PM +0800, Huang, Ying wrote:
>>Wei Yang <[email protected]> writes:
>>
>>> On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
>>>>On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
>>>>> On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
>>>>> > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
>>>>> > >Greeting,
>>>>> > >
>>>>> > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due to commit:
>>>>> > >
>>>>> > >
>>>>> > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move device->knode_class to device_private")
>>>>> > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>>> > >
>>>>> >
>>>>> > This is interesting.
>>>>> >
>>>>> > I didn't expect the move of this field will impact the performance.
>>>>> >
>>>>> > The reason is struct device is a hotter memory than device->device_private?
>>>>> >
>>>>> > >in testcase: will-it-scale
>>>>> > >on test machine: 288 threads Knights Mill with 80G memory
>>>>> > >with following parameters:
>>>>> > >
>>>>> > > nr_task: 100%
>>>>> > > mode: thread
>>>>> > > test: unlink2
>>>>> > > cpufreq_governor: performance
>>>>> > >
>>>>> > >test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
>>>>> > >test-url: https://github.com/antonblanchard/will-it-scale
>>>>> > >
>>>>> > >In addition to that, the commit also has significant impact on the following tests:
>>>>> > >
>>>>> > >+------------------+---------------------------------------------------------------+
>>>>> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
>>>>> > >| test machine | 288 threads Knights Mill with 80G memory |
>>>>> > >| test parameters | cpufreq_governor=performance |
>>>>> > >| | mode=thread |
>>>>> > >| | nr_task=100% |
>>>>> > >| | test=signal1 |
>>>>>
>>>>> Ok, I'm going to blame your testing system, or something here, and not
>>>>> the above patch.
>>>>>
>>>>> All this test does is call raise(3). That does not touch the driver
>>>>> core at all.
>>>>>
>>>>> > >+------------------+---------------------------------------------------------------+
>>>>> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
>>>>> > >| test machine | 288 threads Knights Mill with 80G memory |
>>>>> > >| test parameters | cpufreq_governor=performance |
>>>>> > >| | mode=thread |
>>>>> > >| | nr_task=100% |
>>>>> > >| | test=open1 |
>>>>> > >+------------------+---------------------------------------------------------------+
>>>>>
>>>>> Same here, open1 just calls open/close a lot. No driver core
>>>>> interaction at all there either.
>>>>>
>>>>> So are you _sure_ this is the offending patch?
>>>>
>>>>Hi Greg,
>>>>
>>>>We did an experiment, recovered the layout of struct device. and we
>>>>found the regression is gone. I guess the regession is not from the
>>>>patch but related to the struct layout.
>>>>
>>>>
>>>>tests: 1
>>>>testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-unlink2/lkp-knm01
>>>>
>>>>570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>>>>---------------- --------------------------
>>>> %stddev change %stddev
>>>> \ | \
>>>> 237096 14% 270789 will-it-scale.workload
>>>> 823 14% 939 will-it-scale.per_thread_ops
>>>>
>>>
>>> Do you have the comparison between a36dc70b810afe9183de2ea18f and the one
>>> before 570d020012?
>>>
>>>>
>>>>tests: 1
>>>>testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-signal1/lkp-knm01
>>>>
>>>>570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>>>>---------------- --------------------------
>>>> %stddev change %stddev
>>>> \ | \
>>>> 93.51 3% 48% 138.53 3% will-it-scale.time.user_time
>>>> 186 40% 261 will-it-scale.per_thread_ops
>>>> 53909 40% 75507 will-it-scale.workload
>>>>
>>>>
>>>>tests: 1
>>>>testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-open1/lkp-knm01
>>>>
>>>>570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>>>>---------------- --------------------------
>>>> %stddev change %stddev
>>>> \ | \
>>>> 447722 22% 546258 10% will-it-scale.time.involuntary_context_switches
>>>> 226995 19% 269751 will-it-scale.workload
>>>> 787 19% 936 will-it-scale.per_thread_ops
>>>>
>>>>
>>>>
>>>>commit a36dc70b810afe9183de2ea18faa4c0939c139ac
>>>>Author: 0day robot <[email protected]>
>>>>Date: Wed Feb 20 14:21:19 2019 +0800
>>>>
>>>> backfile klist_node in struct device for debugging
>>>>
>>>> Signed-off-by: 0day robot <[email protected]>
>>>>
>>>>diff --git a/include/linux/device.h b/include/linux/device.h
>>>>index d0e452fd0bff2..31666cb72b3ba 100644
>>>>--- a/include/linux/device.h
>>>>+++ b/include/linux/device.h
>>>>@@ -1035,6 +1035,7 @@ struct device {
>>>> spinlock_t devres_lock;
>>>> struct list_head devres_head;
>>>>
>>>>+ struct klist_node knode_class_test_by_rongc;
>>>> struct class *class;
>>>> const struct attribute_group **groups; /* optional groups */
>>>
>>> Hmm... because this is not properly aligned?
>>>
>>> struct klist_node {
>>> void *n_klist; /* never access directly */
>>> struct list_head n_node;
>>> struct kref n_ref;
>>> };
>>>
>>> Except struct kref has one "int" type, others are pointers.
>>>
>>> But... I am still confused.
>>
>>I guess because the size of struct device is changed, it influences some
>>alignment changes in the system. Thus influence the benchmark score.
>>
>
> That's interesting.
>
> I wrote a module to see the exact size of these two structure on my x86_64.
>
> sizeof(struct device) = 736 = 8 * 92
> sizeof(struct device_private) = 160 = 8 * 20
> sizeof(struct klist_node) = 32 = 8 * 4
>
> Even klist_node has one 4 byte field, c complier would pack the structure to
> make it aligned. Which system alignment it would affect?
>
> After the patch, size would change like this:
>
> struct device 736 -> 704
> struce device_private 160 -> 192
>
> Would this size change affect system?

Yes. I guess these size change may affect system performance. Some
other objects may share slab page with these objects.

Best Regards,
Huang, Ying

>>Best Regards,
>>Huang, Ying
>>
>>>>
>>>>Best Regards,
>>>>Rong Chen

2019-02-21 07:11:06

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
> On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
> > On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
> > > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
> > > >Greeting,
> > > >
> > > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due to commit:
> > > >
> > > >
> > > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move device->knode_class to device_private")
> > > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > > >
> > >
> > > This is interesting.
> > >
> > > I didn't expect the move of this field will impact the performance.
> > >
> > > The reason is struct device is a hotter memory than device->device_private?
> > >
> > > >in testcase: will-it-scale
> > > >on test machine: 288 threads Knights Mill with 80G memory
> > > >with following parameters:
> > > >
> > > > nr_task: 100%
> > > > mode: thread
> > > > test: unlink2
> > > > cpufreq_governor: performance
> > > >
> > > >test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
> > > >test-url: https://github.com/antonblanchard/will-it-scale
> > > >
> > > >In addition to that, the commit also has significant impact on the following tests:
> > > >
> > > >+------------------+---------------------------------------------------------------+
> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
> > > >| test machine | 288 threads Knights Mill with 80G memory |
> > > >| test parameters | cpufreq_governor=performance |
> > > >| | mode=thread |
> > > >| | nr_task=100% |
> > > >| | test=signal1 |
> >
> > Ok, I'm going to blame your testing system, or something here, and not
> > the above patch.
> >
> > All this test does is call raise(3). That does not touch the driver
> > core at all.
> >
> > > >+------------------+---------------------------------------------------------------+
> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
> > > >| test machine | 288 threads Knights Mill with 80G memory |
> > > >| test parameters | cpufreq_governor=performance |
> > > >| | mode=thread |
> > > >| | nr_task=100% |
> > > >| | test=open1 |
> > > >+------------------+---------------------------------------------------------------+
> >
> > Same here, open1 just calls open/close a lot. No driver core
> > interaction at all there either.
> >
> > So are you _sure_ this is the offending patch?
>
> Hi Greg,
>
> We did an experiment, recovered the layout of struct device. and we
> found the regression is gone. I guess the regession is not from the
> patch but related to the struct layout.
>
>
> tests: 1
> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-unlink2/lkp-knm01
>
> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f
> ---------------- --------------------------
> %stddev change %stddev
> \ | \
> 237096 14% 270789 will-it-scale.workload
> 823 14% 939 will-it-scale.per_thread_ops
>
>
> tests: 1
> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-signal1/lkp-knm01
>
> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f
> ---------------- --------------------------
> %stddev change %stddev
> \ | \
> 93.51 ? 3% 48% 138.53 ? 3% will-it-scale.time.user_time
> 186 40% 261 will-it-scale.per_thread_ops
> 53909 40% 75507 will-it-scale.workload
>
>
> tests: 1
> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-open1/lkp-knm01
>
> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f
> ---------------- --------------------------
> %stddev change %stddev
> \ | \
> 447722 22% 546258 ? 10% will-it-scale.time.involuntary_context_switches
> 226995 19% 269751 will-it-scale.workload
> 787 19% 936 will-it-scale.per_thread_ops
>
>
>
> commit a36dc70b810afe9183de2ea18faa4c0939c139ac
> Author: 0day robot <[email protected]>
> Date: Wed Feb 20 14:21:19 2019 +0800
>
> backfile klist_node in struct device for debugging
>
> Signed-off-by: 0day robot <[email protected]>
>
> diff --git a/include/linux/device.h b/include/linux/device.h
> index d0e452fd0bff2..31666cb72b3ba 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -1035,6 +1035,7 @@ struct device {
> spinlock_t devres_lock;
> struct list_head devres_head;
>
> + struct klist_node knode_class_test_by_rongc;
> struct class *class;
> const struct attribute_group **groups; /* optional groups */

While this is fun to worry about alignment and structure size of 'struct
device' I find it odd given that the syscalls and userspace load of
those test programs have nothing to do with 'struct device' at all.

So I can work on fixing up the alignment of struct device, as that's a
nice thing to do for systems with 30k of these in memory, but that
shouldn't affect a workload of a constant string of signal calls.

thanks,

greg k-h

2019-02-21 07:19:07

by Huang, Ying

[permalink] [raw]

Subject: Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

Greg Kroah-Hartman <[email protected]> writes:

> On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
>> On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
>> > On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
>> > > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
>> > > >Greeting,
>> > > >
>> > > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due to commit:
>> > > >
>> > > >
>> > > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move device->knode_class to device_private")
>> > > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>> > > >
>> > >
>> > > This is interesting.
>> > >
>> > > I didn't expect the move of this field will impact the performance.
>> > >
>> > > The reason is struct device is a hotter memory than device->device_private?
>> > >
>> > > >in testcase: will-it-scale
>> > > >on test machine: 288 threads Knights Mill with 80G memory
>> > > >with following parameters:
>> > > >
>> > > > nr_task: 100%
>> > > > mode: thread
>> > > > test: unlink2
>> > > > cpufreq_governor: performance
>> > > >
>> > > >test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
>> > > >test-url: https://github.com/antonblanchard/will-it-scale
>> > > >
>> > > >In addition to that, the commit also has significant impact on the following tests:
>> > > >
>> > > >+------------------+---------------------------------------------------------------+
>> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
>> > > >| test machine | 288 threads Knights Mill with 80G memory |
>> > > >| test parameters | cpufreq_governor=performance |
>> > > >| | mode=thread |
>> > > >| | nr_task=100% |
>> > > >| | test=signal1 |
>> >
>> > Ok, I'm going to blame your testing system, or something here, and not
>> > the above patch.
>> >
>> > All this test does is call raise(3). That does not touch the driver
>> > core at all.
>> >
>> > > >+------------------+---------------------------------------------------------------+
>> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
>> > > >| test machine | 288 threads Knights Mill with 80G memory |
>> > > >| test parameters | cpufreq_governor=performance |
>> > > >| | mode=thread |
>> > > >| | nr_task=100% |
>> > > >| | test=open1 |
>> > > >+------------------+---------------------------------------------------------------+
>> >
>> > Same here, open1 just calls open/close a lot. No driver core
>> > interaction at all there either.
>> >
>> > So are you _sure_ this is the offending patch?
>>
>> Hi Greg,
>>
>> We did an experiment, recovered the layout of struct device. and we
>> found the regression is gone. I guess the regession is not from the
>> patch but related to the struct layout.
>>
>>
>> tests: 1
>> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-unlink2/lkp-knm01
>>
>> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>> ---------------- --------------------------
>> %stddev change %stddev
>> \ | \
>> 237096 14% 270789 will-it-scale.workload
>> 823 14% 939 will-it-scale.per_thread_ops
>>
>>
>> tests: 1
>> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-signal1/lkp-knm01
>>
>> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>> ---------------- --------------------------
>> %stddev change %stddev
>> \ | \
>> 93.51 3% 48% 138.53 3% will-it-scale.time.user_time
>> 186 40% 261 will-it-scale.per_thread_ops
>> 53909 40% 75507 will-it-scale.workload
>>
>>
>> tests: 1
>> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-open1/lkp-knm01
>>
>> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>> ---------------- --------------------------
>> %stddev change %stddev
>> \ | \
>> 447722 22% 546258 10% will-it-scale.time.involuntary_context_switches
>> 226995 19% 269751 will-it-scale.workload
>> 787 19% 936 will-it-scale.per_thread_ops
>>
>>
>>
>> commit a36dc70b810afe9183de2ea18faa4c0939c139ac
>> Author: 0day robot <[email protected]>
>> Date: Wed Feb 20 14:21:19 2019 +0800
>>
>> backfile klist_node in struct device for debugging
>>
>> Signed-off-by: 0day robot <[email protected]>
>>
>> diff --git a/include/linux/device.h b/include/linux/device.h
>> index d0e452fd0bff2..31666cb72b3ba 100644
>> --- a/include/linux/device.h
>> +++ b/include/linux/device.h
>> @@ -1035,6 +1035,7 @@ struct device {
>> spinlock_t devres_lock;
>> struct list_head devres_head;
>>
>> + struct klist_node knode_class_test_by_rongc;
>> struct class *class;
>> const struct attribute_group **groups; /* optional groups */
>
> While this is fun to worry about alignment and structure size of 'struct
> device' I find it odd given that the syscalls and userspace load of
> those test programs have nothing to do with 'struct device' at all.
>
> So I can work on fixing up the alignment of struct device, as that's a
> nice thing to do for systems with 30k of these in memory, but that
> shouldn't affect a workload of a constant string of signal calls.

Hi, Greg,

I don't think this is an issues of struct device. As you said, struct
device isn't access much during test. Struct device may share slab page
with some other data structures (signal related, or fd related (as in
some other test cases)), so that the alignment of these data structures
are affected, so caused the performance regression.

Best Regards,
Huang, Ying

> thanks,
>
> greg k-h

2019-02-21 07:36:14

by Greg Kroah-Hartman

[permalink] [raw]

Subject: Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

On Thu, Feb 21, 2019 at 03:18:22PM +0800, Huang, Ying wrote:
> Greg Kroah-Hartman <[email protected]> writes:
>
> > On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
> >> On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
> >> > On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
> >> > > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
> >> > > >Greeting,
> >> > > >
> >> > > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due to commit:
> >> > > >
> >> > > >
> >> > > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move device->knode_class to device_private")
> >> > > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >> > > >
> >> > >
> >> > > This is interesting.
> >> > >
> >> > > I didn't expect the move of this field will impact the performance.
> >> > >
> >> > > The reason is struct device is a hotter memory than device->device_private?
> >> > >
> >> > > >in testcase: will-it-scale
> >> > > >on test machine: 288 threads Knights Mill with 80G memory
> >> > > >with following parameters:
> >> > > >
> >> > > > nr_task: 100%
> >> > > > mode: thread
> >> > > > test: unlink2
> >> > > > cpufreq_governor: performance
> >> > > >
> >> > > >test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
> >> > > >test-url: https://github.com/antonblanchard/will-it-scale
> >> > > >
> >> > > >In addition to that, the commit also has significant impact on the following tests:
> >> > > >
> >> > > >+------------------+---------------------------------------------------------------+
> >> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
> >> > > >| test machine | 288 threads Knights Mill with 80G memory |
> >> > > >| test parameters | cpufreq_governor=performance |
> >> > > >| | mode=thread |
> >> > > >| | nr_task=100% |
> >> > > >| | test=signal1 |
> >> >
> >> > Ok, I'm going to blame your testing system, or something here, and not
> >> > the above patch.
> >> >
> >> > All this test does is call raise(3). That does not touch the driver
> >> > core at all.
> >> >
> >> > > >+------------------+---------------------------------------------------------------+
> >> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
> >> > > >| test machine | 288 threads Knights Mill with 80G memory |
> >> > > >| test parameters | cpufreq_governor=performance |
> >> > > >| | mode=thread |
> >> > > >| | nr_task=100% |
> >> > > >| | test=open1 |
> >> > > >+------------------+---------------------------------------------------------------+
> >> >
> >> > Same here, open1 just calls open/close a lot. No driver core
> >> > interaction at all there either.
> >> >
> >> > So are you _sure_ this is the offending patch?
> >>
> >> Hi Greg,
> >>
> >> We did an experiment, recovered the layout of struct device. and we
> >> found the regression is gone. I guess the regession is not from the
> >> patch but related to the struct layout.
> >>
> >>
> >> tests: 1
> >> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-unlink2/lkp-knm01
> >>
> >> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f
> >> ---------------- --------------------------
> >> %stddev change %stddev
> >> \ | \
> >> 237096 14% 270789 will-it-scale.workload
> >> 823 14% 939 will-it-scale.per_thread_ops
> >>
> >>
> >> tests: 1
> >> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-signal1/lkp-knm01
> >>
> >> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f
> >> ---------------- --------------------------
> >> %stddev change %stddev
> >> \ | \
> >> 93.51 3% 48% 138.53 3% will-it-scale.time.user_time
> >> 186 40% 261 will-it-scale.per_thread_ops
> >> 53909 40% 75507 will-it-scale.workload
> >>
> >>
> >> tests: 1
> >> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-open1/lkp-knm01
> >>
> >> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f
> >> ---------------- --------------------------
> >> %stddev change %stddev
> >> \ | \
> >> 447722 22% 546258 10% will-it-scale.time.involuntary_context_switches
> >> 226995 19% 269751 will-it-scale.workload
> >> 787 19% 936 will-it-scale.per_thread_ops
> >>
> >>
> >>
> >> commit a36dc70b810afe9183de2ea18faa4c0939c139ac
> >> Author: 0day robot <[email protected]>
> >> Date: Wed Feb 20 14:21:19 2019 +0800
> >>
> >> backfile klist_node in struct device for debugging
> >>
> >> Signed-off-by: 0day robot <[email protected]>
> >>
> >> diff --git a/include/linux/device.h b/include/linux/device.h
> >> index d0e452fd0bff2..31666cb72b3ba 100644
> >> --- a/include/linux/device.h
> >> +++ b/include/linux/device.h
> >> @@ -1035,6 +1035,7 @@ struct device {
> >> spinlock_t devres_lock;
> >> struct list_head devres_head;
> >>
> >> + struct klist_node knode_class_test_by_rongc;
> >> struct class *class;
> >> const struct attribute_group **groups; /* optional groups */
> >
> > While this is fun to worry about alignment and structure size of 'struct
> > device' I find it odd given that the syscalls and userspace load of
> > those test programs have nothing to do with 'struct device' at all.
> >
> > So I can work on fixing up the alignment of struct device, as that's a
> > nice thing to do for systems with 30k of these in memory, but that
> > shouldn't affect a workload of a constant string of signal calls.
>
> Hi, Greg,
>
> I don't think this is an issues of struct device. As you said, struct
> device isn't access much during test. Struct device may share slab page
> with some other data structures (signal related, or fd related (as in
> some other test cases)), so that the alignment of these data structures
> are affected, so caused the performance regression.

But allocation of a structure should always be "properly" aligned, no
matter what something else did in the system as that is what kmalloc
ensures. If not, then we have problems in our memory allocator :)

So something is odd here, but I don't think that is it...

greg k-h

2019-02-21 07:54:23

by Wei Yang

[permalink] [raw]

Subject: Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

On Thu, Feb 21, 2019 at 03:18:22PM +0800, Huang, Ying wrote:
>Greg Kroah-Hartman <[email protected]> writes:
>
>> On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
>>> On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
>>> > On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
>>> > > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
>>> > > >Greeting,
>>> > > >
>>> > > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due to commit:
>>> > > >
>>> > > >
>>> > > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move device->knode_class to device_private")
>>> > > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>> > > >
>>> > >
>>> > > This is interesting.
>>> > >
>>> > > I didn't expect the move of this field will impact the performance.
>>> > >
>>> > > The reason is struct device is a hotter memory than device->device_private?
>>> > >
>>> > > >in testcase: will-it-scale
>>> > > >on test machine: 288 threads Knights Mill with 80G memory
>>> > > >with following parameters:
>>> > > >
>>> > > > nr_task: 100%
>>> > > > mode: thread
>>> > > > test: unlink2
>>> > > > cpufreq_governor: performance
>>> > > >
>>> > > >test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
>>> > > >test-url: https://github.com/antonblanchard/will-it-scale
>>> > > >
>>> > > >In addition to that, the commit also has significant impact on the following tests:
>>> > > >
>>> > > >+------------------+---------------------------------------------------------------+
>>> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
>>> > > >| test machine | 288 threads Knights Mill with 80G memory |
>>> > > >| test parameters | cpufreq_governor=performance |
>>> > > >| | mode=thread |
>>> > > >| | nr_task=100% |
>>> > > >| | test=signal1 |
>>> >
>>> > Ok, I'm going to blame your testing system, or something here, and not
>>> > the above patch.
>>> >
>>> > All this test does is call raise(3). That does not touch the driver
>>> > core at all.
>>> >
>>> > > >+------------------+---------------------------------------------------------------+
>>> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
>>> > > >| test machine | 288 threads Knights Mill with 80G memory |
>>> > > >| test parameters | cpufreq_governor=performance |
>>> > > >| | mode=thread |
>>> > > >| | nr_task=100% |
>>> > > >| | test=open1 |
>>> > > >+------------------+---------------------------------------------------------------+
>>> >
>>> > Same here, open1 just calls open/close a lot. No driver core
>>> > interaction at all there either.
>>> >
>>> > So are you _sure_ this is the offending patch?
>>>
>>> Hi Greg,
>>>
>>> We did an experiment, recovered the layout of struct device. and we
>>> found the regression is gone. I guess the regession is not from the
>>> patch but related to the struct layout.
>>>
>>>
>>> tests: 1
>>> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-unlink2/lkp-knm01
>>>
>>> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>>> ---------------- --------------------------
>>> %stddev change %stddev
>>> \ | \
>>> 237096 14% 270789 will-it-scale.workload
>>> 823 14% 939 will-it-scale.per_thread_ops
>>>
>>>
>>> tests: 1
>>> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-signal1/lkp-knm01
>>>
>>> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>>> ---------------- --------------------------
>>> %stddev change %stddev
>>> \ | \
>>> 93.51 3% 48% 138.53 3% will-it-scale.time.user_time
>>> 186 40% 261 will-it-scale.per_thread_ops
>>> 53909 40% 75507 will-it-scale.workload
>>>
>>>
>>> tests: 1
>>> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-open1/lkp-knm01
>>>
>>> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>>> ---------------- --------------------------
>>> %stddev change %stddev
>>> \ | \
>>> 447722 22% 546258 10% will-it-scale.time.involuntary_context_switches
>>> 226995 19% 269751 will-it-scale.workload
>>> 787 19% 936 will-it-scale.per_thread_ops
>>>
>>>
>>>
>>> commit a36dc70b810afe9183de2ea18faa4c0939c139ac
>>> Author: 0day robot <[email protected]>
>>> Date: Wed Feb 20 14:21:19 2019 +0800
>>>
>>> backfile klist_node in struct device for debugging
>>>
>>> Signed-off-by: 0day robot <[email protected]>
>>>
>>> diff --git a/include/linux/device.h b/include/linux/device.h
>>> index d0e452fd0bff2..31666cb72b3ba 100644
>>> --- a/include/linux/device.h
>>> +++ b/include/linux/device.h
>>> @@ -1035,6 +1035,7 @@ struct device {
>>> spinlock_t devres_lock;
>>> struct list_head devres_head;
>>>
>>> + struct klist_node knode_class_test_by_rongc;
>>> struct class *class;
>>> const struct attribute_group **groups; /* optional groups */
>>
>> While this is fun to worry about alignment and structure size of 'struct
>> device' I find it odd given that the syscalls and userspace load of
>> those test programs have nothing to do with 'struct device' at all.
>>
>> So I can work on fixing up the alignment of struct device, as that's a
>> nice thing to do for systems with 30k of these in memory, but that
>> shouldn't affect a workload of a constant string of signal calls.
>
>Hi, Greg,
>
>I don't think this is an issues of struct device. As you said, struct
>device isn't access much during test. Struct device may share slab page
>with some other data structures (signal related, or fd related (as in
>some other test cases)), so that the alignment of these data structures
>are affected, so caused the performance regression.
>

I didn't get the point here neither.

slab allocator ask memory from page allocator Page by Page and split the page
into pre-defined size. For example, 128B, 512B... Just as shown in
/proc/slabinfo.

Per my understanding, each struct device / device_private will sits in its own
aligned space. struct device would sits in 1K slab and struct device_private
would sits in 256B slab, both before and after this patch if I am correct.

Hmm... I am just curious about how this alignment is affected. Maybe I lost
some point?

>Best Regards,
>Huang, Ying
>
>> thanks,
>>
>> greg k-h

--
Wei Yang
Help you, Help me

2019-02-21 08:31:12

by Huang, Ying

[permalink] [raw]

Subject: Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

Greg Kroah-Hartman <[email protected]> writes:

> On Thu, Feb 21, 2019 at 03:18:22PM +0800, Huang, Ying wrote:
>> Greg Kroah-Hartman <[email protected]> writes:
>>
>> > On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
>> >> On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
>> >> > On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
>> >> > > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
>> >> > > >Greeting,
>> >> > > >
>> >> > > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due to commit:
>> >> > > >
>> >> > > >
>> >> > > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move device->knode_class to device_private")
>> >> > > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>> >> > > >
>> >> > >
>> >> > > This is interesting.
>> >> > >
>> >> > > I didn't expect the move of this field will impact the performance.
>> >> > >
>> >> > > The reason is struct device is a hotter memory than device->device_private?
>> >> > >
>> >> > > >in testcase: will-it-scale
>> >> > > >on test machine: 288 threads Knights Mill with 80G memory
>> >> > > >with following parameters:
>> >> > > >
>> >> > > > nr_task: 100%
>> >> > > > mode: thread
>> >> > > > test: unlink2
>> >> > > > cpufreq_governor: performance
>> >> > > >
>> >> > > >test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
>> >> > > >test-url: https://github.com/antonblanchard/will-it-scale
>> >> > > >
>> >> > > >In addition to that, the commit also has significant impact on the following tests:
>> >> > > >
>> >> > > >+------------------+---------------------------------------------------------------+
>> >> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
>> >> > > >| test machine | 288 threads Knights Mill with 80G memory |
>> >> > > >| test parameters | cpufreq_governor=performance |
>> >> > > >| | mode=thread |
>> >> > > >| | nr_task=100% |
>> >> > > >| | test=signal1 |
>> >> >
>> >> > Ok, I'm going to blame your testing system, or something here, and not
>> >> > the above patch.
>> >> >
>> >> > All this test does is call raise(3). That does not touch the driver
>> >> > core at all.
>> >> >
>> >> > > >+------------------+---------------------------------------------------------------+
>> >> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
>> >> > > >| test machine | 288 threads Knights Mill with 80G memory |
>> >> > > >| test parameters | cpufreq_governor=performance |
>> >> > > >| | mode=thread |
>> >> > > >| | nr_task=100% |
>> >> > > >| | test=open1 |
>> >> > > >+------------------+---------------------------------------------------------------+
>> >> >
>> >> > Same here, open1 just calls open/close a lot. No driver core
>> >> > interaction at all there either.
>> >> >
>> >> > So are you _sure_ this is the offending patch?
>> >>
>> >> Hi Greg,
>> >>
>> >> We did an experiment, recovered the layout of struct device. and we
>> >> found the regression is gone. I guess the regession is not from the
>> >> patch but related to the struct layout.
>> >>
>> >>
>> >> tests: 1
>> >> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-unlink2/lkp-knm01
>> >>
>> >> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>> >> ---------------- --------------------------
>> >> %stddev change %stddev
>> >> \ | \
>> >> 237096 14% 270789 will-it-scale.workload
>> >> 823 14% 939 will-it-scale.per_thread_ops
>> >>
>> >>
>> >> tests: 1
>> >> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-signal1/lkp-knm01
>> >>
>> >> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>> >> ---------------- --------------------------
>> >> %stddev change %stddev
>> >> \ | \
>> >> 93.51 3% 48% 138.53 3% will-it-scale.time.user_time
>> >> 186 40% 261 will-it-scale.per_thread_ops
>> >> 53909 40% 75507 will-it-scale.workload
>> >>
>> >>
>> >> tests: 1
>> >> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-open1/lkp-knm01
>> >>
>> >> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>> >> ---------------- --------------------------
>> >> %stddev change %stddev
>> >> \ | \
>> >> 447722 22% 546258 10% will-it-scale.time.involuntary_context_switches
>> >> 226995 19% 269751 will-it-scale.workload
>> >> 787 19% 936 will-it-scale.per_thread_ops
>> >>
>> >>
>> >>
>> >> commit a36dc70b810afe9183de2ea18faa4c0939c139ac
>> >> Author: 0day robot <[email protected]>
>> >> Date: Wed Feb 20 14:21:19 2019 +0800
>> >>
>> >> backfile klist_node in struct device for debugging
>> >>
>> >> Signed-off-by: 0day robot <[email protected]>
>> >>
>> >> diff --git a/include/linux/device.h b/include/linux/device.h
>> >> index d0e452fd0bff2..31666cb72b3ba 100644
>> >> --- a/include/linux/device.h
>> >> +++ b/include/linux/device.h
>> >> @@ -1035,6 +1035,7 @@ struct device {
>> >> spinlock_t devres_lock;
>> >> struct list_head devres_head;
>> >>
>> >> + struct klist_node knode_class_test_by_rongc;
>> >> struct class *class;
>> >> const struct attribute_group **groups; /* optional groups */
>> >
>> > While this is fun to worry about alignment and structure size of 'struct
>> > device' I find it odd given that the syscalls and userspace load of
>> > those test programs have nothing to do with 'struct device' at all.
>> >
>> > So I can work on fixing up the alignment of struct device, as that's a
>> > nice thing to do for systems with 30k of these in memory, but that
>> > shouldn't affect a workload of a constant string of signal calls.
>>
>> Hi, Greg,
>>
>> I don't think this is an issues of struct device. As you said, struct
>> device isn't access much during test. Struct device may share slab page
>> with some other data structures (signal related, or fd related (as in
>> some other test cases)), so that the alignment of these data structures
>> are affected, so caused the performance regression.
>
> But allocation of a structure should always be "properly" aligned, no
> matter what something else did in the system as that is what kmalloc
> ensures. If not, then we have problems in our memory allocator :)
>
> So something is odd here, but I don't think that is it...

If all these data structure are allocated with kmalloc() instead of
kmem_cache_alloc(), then my guessing above seems incorrect ...

Best Regards,
Huang, Ying

2019-02-21 22:33:26

by Wei Yang

[permalink] [raw]

Subject: Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

On Thu, Feb 21, 2019 at 03:53:13PM +0800, Wei Yang wrote:
>On Thu, Feb 21, 2019 at 03:18:22PM +0800, Huang, Ying wrote:
>>Greg Kroah-Hartman <[email protected]> writes:
>>
>>> On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
>>>> On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
>>>> > On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
>>>> > > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
>>>> > > >Greeting,
>>>> > > >
>>>> > > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due to commit:
>>>> > > >
>>>> > > >
>>>> > > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move device->knode_class to device_private")
>>>> > > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>> > > >
>>>> > >
>>>> > > This is interesting.
>>>> > >
>>>> > > I didn't expect the move of this field will impact the performance.
>>>> > >
>>>> > > The reason is struct device is a hotter memory than device->device_private?
>>>> > >
>>>> > > >in testcase: will-it-scale
>>>> > > >on test machine: 288 threads Knights Mill with 80G memory
>>>> > > >with following parameters:
>>>> > > >
>>>> > > > nr_task: 100%
>>>> > > > mode: thread
>>>> > > > test: unlink2
>>>> > > > cpufreq_governor: performance
>>>> > > >
>>>> > > >test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
>>>> > > >test-url: https://github.com/antonblanchard/will-it-scale
>>>> > > >
>>>> > > >In addition to that, the commit also has significant impact on the following tests:
>>>> > > >
>>>> > > >+------------------+---------------------------------------------------------------+
>>>> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
>>>> > > >| test machine | 288 threads Knights Mill with 80G memory |
>>>> > > >| test parameters | cpufreq_governor=performance |
>>>> > > >| | mode=thread |
>>>> > > >| | nr_task=100% |
>>>> > > >| | test=signal1 |
>>>> >
>>>> > Ok, I'm going to blame your testing system, or something here, and not
>>>> > the above patch.
>>>> >
>>>> > All this test does is call raise(3). That does not touch the driver
>>>> > core at all.
>>>> >
>>>> > > >+------------------+---------------------------------------------------------------+
>>>> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
>>>> > > >| test machine | 288 threads Knights Mill with 80G memory |
>>>> > > >| test parameters | cpufreq_governor=performance |
>>>> > > >| | mode=thread |
>>>> > > >| | nr_task=100% |
>>>> > > >| | test=open1 |
>>>> > > >+------------------+---------------------------------------------------------------+
>>>> >
>>>> > Same here, open1 just calls open/close a lot. No driver core
>>>> > interaction at all there either.
>>>> >
>>>> > So are you _sure_ this is the offending patch?
>>>>
>>>> Hi Greg,
>>>>
>>>> We did an experiment, recovered the layout of struct device. and we
>>>> found the regression is gone. I guess the regession is not from the
>>>> patch but related to the struct layout.
>>>>
>>>>
>>>> tests: 1
>>>> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-unlink2/lkp-knm01
>>>>
>>>> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>>>> ---------------- --------------------------
>>>> %stddev change %stddev
>>>> \ | \
>>>> 237096 14% 270789 will-it-scale.workload
>>>> 823 14% 939 will-it-scale.per_thread_ops
>>>>
>>>>
>>>> tests: 1
>>>> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-signal1/lkp-knm01
>>>>
>>>> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>>>> ---------------- --------------------------
>>>> %stddev change %stddev
>>>> \ | \
>>>> 93.51 3% 48% 138.53 3% will-it-scale.time.user_time
>>>> 186 40% 261 will-it-scale.per_thread_ops
>>>> 53909 40% 75507 will-it-scale.workload
>>>>
>>>>
>>>> tests: 1
>>>> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-open1/lkp-knm01
>>>>
>>>> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f
>>>> ---------------- --------------------------
>>>> %stddev change %stddev
>>>> \ | \
>>>> 447722 22% 546258 10% will-it-scale.time.involuntary_context_switches
>>>> 226995 19% 269751 will-it-scale.workload
>>>> 787 19% 936 will-it-scale.per_thread_ops
>>>>
>>>>
>>>>
>>>> commit a36dc70b810afe9183de2ea18faa4c0939c139ac
>>>> Author: 0day robot <[email protected]>
>>>> Date: Wed Feb 20 14:21:19 2019 +0800
>>>>
>>>> backfile klist_node in struct device for debugging
>>>>
>>>> Signed-off-by: 0day robot <[email protected]>
>>>>
>>>> diff --git a/include/linux/device.h b/include/linux/device.h
>>>> index d0e452fd0bff2..31666cb72b3ba 100644
>>>> --- a/include/linux/device.h
>>>> +++ b/include/linux/device.h
>>>> @@ -1035,6 +1035,7 @@ struct device {
>>>> spinlock_t devres_lock;
>>>> struct list_head devres_head;
>>>>
>>>> + struct klist_node knode_class_test_by_rongc;
>>>> struct class *class;
>>>> const struct attribute_group **groups; /* optional groups */
>>>
>>> While this is fun to worry about alignment and structure size of 'struct
>>> device' I find it odd given that the syscalls and userspace load of
>>> those test programs have nothing to do with 'struct device' at all.
>>>
>>> So I can work on fixing up the alignment of struct device, as that's a
>>> nice thing to do for systems with 30k of these in memory, but that
>>> shouldn't affect a workload of a constant string of signal calls.
>>
>>Hi, Greg,
>>
>>I don't think this is an issues of struct device. As you said, struct
>>device isn't access much during test. Struct device may share slab page
>>with some other data structures (signal related, or fd related (as in
>>some other test cases)), so that the alignment of these data structures
>>are affected, so caused the performance regression.
>>
>
>I didn't get the point here neither.
>
>slab allocator ask memory from page allocator Page by Page and split the page
>into pre-defined size. For example, 128B, 512B... Just as shown in
>/proc/slabinfo.
>
>Per my understanding, each struct device / device_private will sits in its own
>aligned space. struct device would sits in 1K slab and struct device_private
>would sits in 256B slab, both before and after this patch if I am correct.
>

As Greg mentioned, device is embedded in other structure. My analysis
here is not correct. The change in size of device, may affect the size
of struct who wraps device.

In case this struct is allocated by kmem_cache, this may affect the
number of objects allocated each time, or even the number of pages
allocated each time.

This means with this patch may affect the system in the following two
aspects:

* the times for struct allocation
* the order of page it asks from page allocator

One place we may take a look is the /proc/slabinfo. To see whether the
change in size of device affect kmem_cach objects.

Hmm... while as Greg mentioned, those cases will not involve struct
device allocation. So the above aspects may not take effect?

>Hmm... I am just curious about how this alignment is affected. Maybe I lost
>some point?
>
>>Best Regards,
>>Huang, Ying
>>
>>> thanks,
>>>
>>> greg k-h
>
>--
>Wei Yang
>Help you, Help me

--
Wei Yang
Help you, Help me