2020-02-05 12:34:05

Subject: [perf x86] b77491648e: will-it-scale.per_process_ops -2.1% regression

Greeting,

FYI, we noticed a -2.1% regression of will-it-scale.per_process_ops due to commit:

commit: b77491648e6eb2f26b6edf5eaea859adc17f4dcc ("perf x86: Infrastructure for exposing an Uncore unit to PMON mapping")
https://github.com/0day-ci/linux/commits/roman-sudarikov-linux-intel-com/perf-x86-Exposing-IO-stack-to-IO-PMON-mapping-through-sysfs/20200118-075508

on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>

Details are as below:
------------------------------------------------------------------------>

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.6/process/100%/debian-x86_64-20191114.cgz/lkp-bdw-ep6/signal1/will-it-scale/0xb000038

commit:
  v5.4
  b77491648e ("perf x86: Infrastructure for exposing an Uncore unit to PMON mapping")

            v5.4 b77491648e6eb2f26b6edf5eaea
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     47986            -2.1%      46989        will-it-scale.per_process_ops
   4222852            -2.1%    4135110        will-it-scale.workload
    427194 ±  9%     +13.8%     486344 ±  4%  numa-vmstat.node1.numa_local
     12.88 ±  2%      -8.5%      11.79 ±  4%  turbostat.RAMWatt
      8846 ± 10%     +23.9%      10964 ±  9%  softirqs.CPU0.SCHED
     14442 ±  4%      -5.2%      13697 ±  5%  softirqs.CPU71.RCU
     78696 ±  9%     +14.4%      89993 ±  8%  sched_debug.cfs_rq:/.min_vruntime.stddev
     78411 ±  9%     +14.5%      89817 ±  8%  sched_debug.cfs_rq:/.spread0.stddev
      9.77 ±  4%     +15.0%      11.23 ±  3%  sched_debug.cpu.clock.stddev
      9.77 ±  4%     +15.0%      11.23 ±  3%  sched_debug.cpu.clock_task.stddev
 4.072e+09            -1.9%  3.996e+09        perf-stat.i.branch-instructions
  44948352            -1.8%   44159252        perf-stat.i.branch-misses
     35.25             +4.3      39.56        perf-stat.i.cache-miss-rate%
  12569960            +5.2%   13223444        perf-stat.i.cache-misses
  35888855 ±  2%      -6.2%   33680305 ±  2%  perf-stat.i.cache-references
     11.75            +1.8%      11.96        perf-stat.i.cpi
     19377            -5.0%      18403        perf-stat.i.cycles-between-cache-misses
  27157347            -2.1%   26595986        perf-stat.i.dTLB-load-misses
 6.739e+09            -2.0%  6.602e+09        perf-stat.i.dTLB-loads
  27809165            -1.9%   27268405        perf-stat.i.dTLB-store-misses
 5.461e+09            -1.9%  5.356e+09        perf-stat.i.dTLB-stores
 2.072e+10            -1.9%  2.034e+10        perf-stat.i.instructions
      0.09            -1.7%       0.08        perf-stat.i.ipc
    917994            +2.6%     941599        perf-stat.i.node-load-misses
     96.93             -1.1      95.81        perf-stat.i.node-store-miss-rate%
   5499191            +5.0%    5774707        perf-stat.i.node-store-misses
    169716 ±  8%     +45.2%     246479 ±  6%  perf-stat.i.node-stores
      1.73 ±  2%      -4.4%       1.66 ±  2%  perf-stat.overall.MPKI
     35.03             +4.2      39.27        perf-stat.overall.cache-miss-rate%
     11.77            +1.8%      11.98        perf-stat.overall.cpi
     19401            -5.0%      18428        perf-stat.overall.cycles-between-cache-misses
      0.08            -1.8%       0.08        perf-stat.overall.ipc
     97.01             -1.1      95.91        perf-stat.overall.node-store-miss-rate%
 4.058e+09            -1.8%  3.983e+09        perf-stat.ps.branch-instructions
  44798305            -1.7%   44014351        perf-stat.ps.branch-misses
  12526500            +5.2%   13178368        perf-stat.ps.cache-misses
  35771706 ±  2%      -6.2%   33569906 ±  2%  perf-stat.ps.cache-references
  27063288            -2.1%   26505363        perf-stat.ps.dTLB-load-misses
 6.716e+09            -2.0%   6.58e+09        perf-stat.ps.dTLB-loads
  27712662            -1.9%   27175399        perf-stat.ps.dTLB-store-misses
 5.442e+09            -1.9%  5.338e+09        perf-stat.ps.dTLB-stores
 2.065e+10            -1.9%  2.027e+10        perf-stat.ps.instructions
    914841            +2.6%     938399        perf-stat.ps.node-load-misses
   5480102            +5.0%    5754996        perf-stat.ps.node-store-misses
    169148 ±  8%     +45.2%     245649 ±  6%  perf-stat.ps.node-stores
 6.242e+12            -1.6%  6.142e+12        perf-stat.total.instructions
    481.50 ± 26%     -41.7%     280.75 ± 28%  interrupts.37:IR-PCI-MSI.1572868-edge.eth0-TxRx-3
    772.75 ± 63%     -70.0%     231.75 ± 28%  interrupts.CPU1.RES:Rescheduling_interrupts
    481.50 ± 26%     -41.7%     280.75 ± 28%  interrupts.CPU16.37:IR-PCI-MSI.1572868-edge.eth0-TxRx-3
    954.25 ± 10%     -71.8%     269.50 ± 76%  interrupts.CPU19.RES:Rescheduling_interrupts
    932.50 ± 48%     -68.4%     294.75 ± 72%  interrupts.CPU20.RES:Rescheduling_interrupts
    583.75 ± 59%     -79.5%     119.75 ± 54%  interrupts.CPU21.RES:Rescheduling_interrupts
    513.00 ± 42%    +145.8%       1261 ± 17%  interrupts.CPU22.RES:Rescheduling_interrupts
    256.25 ± 40%    +253.9%     906.75 ± 39%  interrupts.CPU24.RES:Rescheduling_interrupts
    475.25 ± 19%    +133.5%       1109 ± 41%  interrupts.CPU26.RES:Rescheduling_interrupts
    734.50 ± 36%     +99.1%       1462 ± 26%  interrupts.CPU27.RES:Rescheduling_interrupts
    905.75 ± 48%     -64.9%     318.00 ± 85%  interrupts.CPU3.RES:Rescheduling_interrupts
    363.00 ± 35%    +114.3%     777.75 ± 26%  interrupts.CPU30.RES:Rescheduling_interrupts
      6915 ± 24%     -29.1%       4904 ± 34%  interrupts.CPU37.NMI:Non-maskable_interrupts
      6915 ± 24%     -29.1%       4904 ± 34%  interrupts.CPU37.PMI:Performance_monitoring_interrupts
    436.50 ± 48%    +166.7%       1164 ± 41%  interrupts.CPU38.RES:Rescheduling_interrupts
      6950 ± 24%     -29.1%       4926 ± 34%  interrupts.CPU39.NMI:Non-maskable_interrupts
      6950 ± 24%     -29.1%       4926 ± 34%  interrupts.CPU39.PMI:Performance_monitoring_interrupts
      6906 ± 24%     -28.9%       4910 ± 35%  interrupts.CPU41.NMI:Non-maskable_interrupts
      6906 ± 24%     -28.9%       4910 ± 35%  interrupts.CPU41.PMI:Performance_monitoring_interrupts
    216.00 ± 70%     -76.6%      50.50 ± 22%  interrupts.CPU46.RES:Rescheduling_interrupts
      2607 ± 47%     +51.4%       3948 ±  8%  interrupts.CPU50.CAL:Function_call_interrupts
      3220 ± 10%     +22.4%       3940 ±  8%  interrupts.CPU51.CAL:Function_call_interrupts
      4914 ± 34%     +59.9%       7855        interrupts.CPU56.NMI:Non-maskable_interrupts
      4914 ± 34%     +59.9%       7855        interrupts.CPU56.PMI:Performance_monitoring_interrupts
      4937 ± 34%     +59.7%       7885        interrupts.CPU58.NMI:Non-maskable_interrupts
      4937 ± 34%     +59.7%       7885        interrupts.CPU58.PMI:Performance_monitoring_interrupts
      4919 ± 34%     +59.6%       7849        interrupts.CPU59.NMI:Non-maskable_interrupts
      4919 ± 34%     +59.6%       7849        interrupts.CPU59.PMI:Performance_monitoring_interrupts
      4925 ± 34%     +59.9%       7878        interrupts.CPU61.NMI:Non-maskable_interrupts
      4925 ± 34%     +59.9%       7878        interrupts.CPU61.PMI:Performance_monitoring_interrupts
      4906 ± 33%     +60.3%       7867        interrupts.CPU63.NMI:Non-maskable_interrupts
      4906 ± 33%     +60.3%       7867        interrupts.CPU63.PMI:Performance_monitoring_interrupts
    890.00 ± 75%     -82.0%     160.00 ± 46%  interrupts.CPU63.RES:Rescheduling_interrupts
    135.00 ± 52%    +911.7%       1365 ± 76%  interrupts.CPU70.RES:Rescheduling_interrupts
    110.25 ± 14%    +388.7%     538.75 ± 30%  interrupts.CPU71.RES:Rescheduling_interrupts
      3285 ±  3%     +15.4%       3791 ±  3%  interrupts.CPU73.CAL:Function_call_interrupts
    186.50 ± 60%    +274.4%     698.25 ± 77%  interrupts.CPU81.RES:Rescheduling_interrupts
      1.22 ±  2%       -0.2       1.02        perf-profile.calltrace.cycles-pp.recalc_sigpending.dequeue_signal.get_signal.do_signal.exit_to_usermode_loop
      3.95             -0.2       3.79        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
      4.07             -0.2       3.92        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
      1.93             -0.1       1.79        perf-profile.calltrace.cycles-pp.fpu__clear.do_signal.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.66             -0.1       0.59 ±  3%  perf-profile.calltrace.cycles-pp.__set_task_blocked.__set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.65 ±  2%       -0.1       0.57 ±  2%  perf-profile.calltrace.cycles-pp.recalc_sigpending.__set_task_blocked.__set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64
      0.85             -0.1       0.79 ±  2%  perf-profile.calltrace.cycles-pp.__set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.03             -0.0       0.98        perf-profile.calltrace.cycles-pp.signal_setup_done.do_signal.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.81             -0.0       0.76 ±  2%  perf-profile.calltrace.cycles-pp.fpregs_mark_activate.fpu__clear.do_signal.exit_to_usermode_loop.do_syscall_64
      0.98             -0.0       0.94        perf-profile.calltrace.cycles-pp.__set_current_blocked.signal_setup_done.do_signal.exit_to_usermode_loop.do_syscall_64
      1.10             -0.0       1.07        perf-profile.calltrace.cycles-pp.copy_fpstate_to_sigframe.do_signal.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.52             +0.0       0.55 ±  3%  perf-profile.calltrace.cycles-pp.fpregs_mark_activate.__fpu__restore_sig.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.87             +0.1       1.95        perf-profile.calltrace.cycles-pp.__fpu__restore_sig.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
     23.79             +0.3      24.06        perf-profile.calltrace.cycles-pp.__sigqueue_alloc.__send_signal.do_send_sig_info.do_send_specific.do_tkill
     24.02             +0.3      24.29        perf-profile.calltrace.cycles-pp.__send_signal.do_send_sig_info.do_send_specific.do_tkill.__x64_sys_tgkill
     89.84             +0.3      90.14        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
     89.46             +0.3      89.78        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
     36.84             +0.4      37.20        perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
     36.43             +0.4      36.80        perf-profile.calltrace.cycles-pp.do_signal.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
     25.64             +0.4      26.09        perf-profile.calltrace.cycles-pp.__x64_sys_tgkill.do_syscall_64.entry_SYSCALL_64_after_hwframe
     25.58             +0.5      26.04        perf-profile.calltrace.cycles-pp.do_tkill.__x64_sys_tgkill.do_syscall_64.entry_SYSCALL_64_after_hwframe
     25.35             +0.5      25.82        perf-profile.calltrace.cycles-pp.do_send_specific.do_tkill.__x64_sys_tgkill.do_syscall_64.entry_SYSCALL_64_after_hwframe
     24.66             +0.5      25.18        perf-profile.calltrace.cycles-pp.do_send_sig_info.do_send_specific.do_tkill.__x64_sys_tgkill.do_syscall_64
     30.40             +0.6      30.97        perf-profile.calltrace.cycles-pp.dequeue_signal.get_signal.do_signal.exit_to_usermode_loop.do_syscall_64
     31.58             +0.6      32.18        perf-profile.calltrace.cycles-pp.get_signal.do_signal.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
     29.13             +0.8      29.91        perf-profile.calltrace.cycles-pp.__dequeue_signal.dequeue_signal.get_signal.do_signal.exit_to_usermode_loop
     28.90             +0.8      29.68        perf-profile.calltrace.cycles-pp.__sigqueue_free.__dequeue_signal.dequeue_signal.get_signal.do_signal
      3.46             -0.2       3.21 ±  2%  perf-profile.children.cycles-pp.recalc_sigpending
      3.95             -0.2       3.79        perf-profile.children.cycles-pp.entry_SYSCALL_64
      4.42             -0.2       4.26        perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.93             -0.1       1.80 ±  2%  perf-profile.children.cycles-pp.fpu__clear
      3.62             -0.1       3.54        perf-profile.children.cycles-pp.__set_current_blocked
      0.27             -0.1       0.21 ±  3%  perf-profile.children.cycles-pp.fpregs_assert_state_consistent
      0.84             -0.0       0.79 ±  2%  perf-profile.children.cycles-pp._copy_from_user
      1.03             -0.0       0.99        perf-profile.children.cycles-pp.signal_setup_done
      0.34             -0.0       0.30 ±  5%  perf-profile.children.cycles-pp.restore_altstack
      0.73             -0.0       0.70        perf-profile.children.cycles-pp.__might_fault
      1.11             -0.0       1.08        perf-profile.children.cycles-pp.copy_fpstate_to_sigframe
      0.37 ±  2%       -0.0       0.35        perf-profile.children.cycles-pp.___might_sleep
      0.27             -0.0       0.26        perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
      1.89             +0.1       1.96        perf-profile.children.cycles-pp.__fpu__restore_sig
      0.29 ±  7%       +0.2       0.53 ±  6%  perf-profile.children.cycles-pp.__lock_task_sighand
      0.29 ±  7%       +0.2       0.53 ±  5%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     24.03             +0.3      24.29        perf-profile.children.cycles-pp.__send_signal
     23.80             +0.3      24.06        perf-profile.children.cycles-pp.__sigqueue_alloc
     90.00             +0.3      90.30        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     89.60             +0.3      89.92        perf-profile.children.cycles-pp.do_syscall_64
     36.86             +0.4      37.22        perf-profile.children.cycles-pp.exit_to_usermode_loop
     36.45             +0.4      36.81        perf-profile.children.cycles-pp.do_signal
     25.65             +0.4      26.10        perf-profile.children.cycles-pp.__x64_sys_tgkill
     25.59             +0.5      26.04        perf-profile.children.cycles-pp.do_tkill
     25.36             +0.5      25.82        perf-profile.children.cycles-pp.do_send_specific
     24.67             +0.5      25.19        perf-profile.children.cycles-pp.do_send_sig_info
     30.41             +0.6      30.98        perf-profile.children.cycles-pp.dequeue_signal
     31.60             +0.6      32.20        perf-profile.children.cycles-pp.get_signal
     29.14             +0.8      29.92        perf-profile.children.cycles-pp.__dequeue_signal
     28.90             +0.8      29.69        perf-profile.children.cycles-pp.__sigqueue_free
     19.11             -0.4      18.75        perf-profile.self.cycles-pp.do_syscall_64
      2.58             -0.2       2.34        perf-profile.self.cycles-pp.recalc_sigpending
      3.95             -0.2       3.79        perf-profile.self.cycles-pp.entry_SYSCALL_64
      4.41             -0.2       4.25        perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.95             -0.1       0.86 ±  2%  perf-profile.self.cycles-pp.fpu__clear
      0.25             -0.1       0.19 ±  2%  perf-profile.self.cycles-pp.fpregs_assert_state_consistent
      0.15 ±  2%       -0.0       0.12 ±  6%  perf-profile.self.cycles-pp._copy_from_user
      0.74             -0.0       0.71        perf-profile.self.cycles-pp.copy_fpstate_to_sigframe
      0.34             -0.0       0.31        perf-profile.self.cycles-pp.__x64_sys_rt_sigprocmask
      0.46 ±  2%       -0.0       0.44 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.36 ±  3%       -0.0       0.34        perf-profile.self.cycles-pp.___might_sleep
      0.26             -0.0       0.24 ±  2%  perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
      1.10             +0.0       1.15        perf-profile.self.cycles-pp.__fpu__restore_sig
      0.28 ±  6%       +0.2       0.53 ±  5%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
     23.71             +0.3      23.96        perf-profile.self.cycles-pp.__sigqueue_alloc
     12.65             +0.6      13.24        perf-profile.self.cycles-pp.__sigqueue_free

                        will-it-scale.per_process_ops

[ASCII plot not recoverable from the archive; y-axis spans 44000 to 52000]

[*] bisect-good sample
[O] bisect-bad sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Thanks,
Rong Chen

Attachments:

(No filename) (19.16 kB)
config-5.4.0-00001-gb77491648e6eb (203.97 kB)
job-script (7.88 kB)
job.yaml (5.50 kB)
reproduce (321.00 B)

2020-02-05 20:48:16

by Andi Kleen

Subject: Re: [perf x86] b77491648e: will-it-scale.per_process_ops -2.1% regression

kernel test robot <[email protected]> writes:

> Greeting,
>
> FYI, we noticed a -2.1% regression of will-it-scale.per_process_ops due to commit:
>
>
> commit: b77491648e6eb2f26b6edf5eaea859adc17f4dcc ("perf x86: Infrastructure for exposing an Uncore unit to PMON mapping")
> https://github.com/0day-ci/linux/commits/roman-sudarikov-linux-intel-com/perf-x86-Exposing-IO-stack-to-IO-PMON-mapping-through-sysfs/20200118-075508

Seems to be spurious bisect. I don't think that commit could change
anything performance related.

-Andi

2020-02-06 03:21:39

by Philip Li

Subject: RE: [LKP] Re: [perf x86] b77491648e: will-it-scale.per_process_ops -2.1% regression

> Subject: [LKP] Re: [perf x86] b77491648e: will-it-scale.per_process_ops -2.1%
> regression
>
> kernel test robot <[email protected]> writes:
>
> > Greeting,
> >
> > FYI, we noticed a -2.1% regression of will-it-scale.per_process_ops due to
> commit:
> >
> >
> > commit: b77491648e6eb2f26b6edf5eaea859adc17f4dcc ("perf x86: Infrastructure
> for exposing an Uncore unit to PMON mapping")
> > https://github.com/0day-ci/linux/commits/roman-sudarikov-linux-intel-com/perf-
> x86-Exposing-IO-stack-to-IO-PMON-mapping-through-sysfs/20200118-075508
>
> Seems to be spurious bisect. I don't think that commit could change
> anything performance related.

Hi Andi, we will look into this as soon as possible. We also received a report
from Peter Z that he got a false positive for a will-it-scale.per_process_ops
performance regression. We will investigate both.

>
> -Andi
> _______________________________________________
> LKP mailing list -- [email protected]
> To unsubscribe send an email to [email protected]

2020-02-12 10:58:32

by Chen, Rong A

Subject: Re: [LKP] Re: [perf x86] b77491648e: will-it-scale.per_process_ops -2.1% regression

On 2/6/2020 4:47 AM, Andi Kleen wrote:
> kernel test robot <[email protected]> writes:
>
>> Greeting,
>>
>> FYI, we noticed a -2.1% regression of will-it-scale.per_process_ops due to commit:
>>
>>
>> commit: b77491648e6eb2f26b6edf5eaea859adc17f4dcc ("perf x86: Infrastructure for exposing an Uncore unit to PMON mapping")
>> https://github.com/0day-ci/linux/commits/roman-sudarikov-linux-intel-com/perf-x86-Exposing-IO-stack-to-IO-PMON-mapping-through-sysfs/20200118-075508
> Seems to be spurious bisect. I don't think that commit could change
> anything performance related.

Hi Andi,

I commented out some lines in arch/x86/events/intel/uncore.c and
will-it-scale.per_process_ops increased.

commit:
v5.4
b77491648e ("perf x86: Infrastructure for exposing an Uncore unit to PMON mapping")
f33fe1b258 ("test")

            v5.4 b77491648e6eb2f26b6edf5eae f33fe1b258b2a4b2fc97600b2b testcase/testparams/testbox
---------------- -------------------------- -------------------------- ---------------------------
         %stddev      change         %stddev      change         %stddev
             \          |                \          |                \
     47983                       47004                      47647        will-it-scale/performance-process-100%-signal1-ucode=0xb000038/lkp-bdw-ep6
     47983                       47004                      47647        GEO-MEAN will-it-scale.per_process_ops

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 55201bfde2c84c..0dc9c455423d99 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -887,7 +887,7 @@ static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
                pmu->pmu.attr_groups = pmu->type->attr_groups;
        }

-       pmu->pmu.attr_update = attr_update;
+       // pmu->pmu.attr_update = attr_update;

        if (pmu->type->num_boxes == 1) {
                if (strlen(pmu->type->name) > 0)
@@ -903,7 +903,7 @@ static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
         * Exposing mapping of Uncore units to corresponding Uncore PMUs
         * through /sys/devices/uncore_<type>_<idx>/mapping
         */
-       uncore_platform_mapping(pmu->type);
+       // uncore_platform_mapping(pmu->type);

        ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
        if (!ret)

Best Regards,
Rong Chen

>
> -Andi
> _______________________________________________
> LKP mailing list -- [email protected]
> To unsubscribe send an email to [email protected]

2020-02-12 15:16:19

by Liang, Kan

Subject: Re: [LKP] Re: [perf x86] b77491648e: will-it-scale.per_process_ops -2.1% regression

On 2/12/2020 5:56 AM, Chen, Rong A wrote:
>
>
> On 2/6/2020 4:47 AM, Andi Kleen wrote:
>> kernel test robot <[email protected]> writes:
>>
>>> Greeting,
>>>
>>> FYI, we noticed a -2.1% regression of will-it-scale.per_process_ops
>>> due to commit:
>>>
>>>
>>> commit: b77491648e6eb2f26b6edf5eaea859adc17f4dcc ("perf x86:
>>> Infrastructure for exposing an Uncore unit to PMON mapping")
>>> https://github.com/0day-ci/linux/commits/roman-sudarikov-linux-intel-com/perf-x86-Exposing-IO-stack-to-IO-PMON-mapping-through-sysfs/20200118-075508
>>>
>> Seems to be spurious bisect. I don't think that commit could change
>> anything performance related.
>
> Hi Andi,
>
> I commented out some lines in arch/x86/events/intel/uncore.c and
> will-it-scale.per_process_ops increased.
>
> commit:
> v5.4
> b77491648e ("perf x86: Infrastructure for exposing an Uncore unit to
> PMON mapping")
> f33fe1b258 ("test")
>
>
>             v5.4 b77491648e6eb2f26b6edf5eae
> f33fe1b258b2a4b2fc97600b2b testcase/testparams/testbox
> ---------------- -------------------------- --------------------------
> ---------------------------
>          %stddev      change         %stddev      change %stddev
>              \          |                \          | \
>      47983                       47004 47647
> will-it-scale/performance-process-100%-signal1-ucode=0xb000038/lkp-bdw-ep6
>      47983                       47004 47647        GEO-MEAN
> will-it-scale.per_process_ops
>
> diff --git a/arch/x86/events/intel/uncore.c
> b/arch/x86/events/intel/uncore.c
> index 55201bfde2c84c..0dc9c455423d99 100644
> --- a/arch/x86/events/intel/uncore.c
> +++ b/arch/x86/events/intel/uncore.c
> @@ -887,7 +887,7 @@ static int uncore_pmu_register(struct
> intel_uncore_pmu *pmu)
>                 pmu->pmu.attr_groups = pmu->type->attr_groups;
>         }
>
> -       pmu->pmu.attr_update = attr_update;
> +       // pmu->pmu.attr_update = attr_update;
>
>         if (pmu->type->num_boxes == 1) {
>                 if (strlen(pmu->type->name) > 0)
> @@ -903,7 +903,7 @@ static int uncore_pmu_register(struct
> intel_uncore_pmu *pmu)
>          * Exposing mapping of Uncore units to corresponding Uncore PMUs
>          * through /sys/devices/uncore_<type>_<idx>/mapping
>          */
> -       uncore_platform_mapping(pmu->type);
> +       // uncore_platform_mapping(pmu->type);

The patch is for SKX uncore. The test machine looks like a BDX.
So the mapping_group should always be invisible.
The attr_update should not update.
I think there should be no performance impact.

static void uncore_platform_mapping(struct intel_uncore_type *t)
{
        if (t->get_topology && t->set_mapping &&
            !t->get_topology(t, max_dies) && !t->set_mapping(t, max_dies))
                mapping_group.is_visible = NULL;
        else
                mapping_group.is_visible = not_visible;
}

Kan

>
>         ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
>         if (!ret)
>
> Best Regards,
> Rong Chen
>
>>
>> -Andi
>> _______________________________________________
>> LKP mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
>