2020-02-05 12:34:05

by Chen, Rong A

Subject: [perf x86] b77491648e: will-it-scale.per_process_ops -2.1% regression

Greetings,

FYI, we noticed a -2.1% regression of will-it-scale.per_process_ops due to commit:


commit: b77491648e6eb2f26b6edf5eaea859adc17f4dcc ("perf x86: Infrastructure for exposing an Uncore unit to PMON mapping")
https://github.com/0day-ci/linux/commits/roman-sudarikov-linux-intel-com/perf-x86-Exposing-IO-stack-to-IO-PMON-mapping-through-sysfs/20200118-075508

in testcase: will-it-scale
on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory
with the following parameters:

nr_task: 100%
mode: process
test: signal1
cpufreq_governor: performance
ucode: 0xb000038

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
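
For orientation: the perf-profile rows in the table below are dominated by the tgkill, signal delivery, and rt_sigreturn paths, which is the pattern a tight loop of self-directed signals produces. The following is a minimal sketch of such a loop and is not taken from the will-it-scale source. The function name run_signal_loop, the use of SIGUSR1, and the iteration count parameter are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Number of signals the handler has seen (hypothetical counter for this sketch). */
static volatile sig_atomic_t handled;

static void on_signal(int signo)
{
	(void)signo;
	handled++;
}

/*
 * Send n signals to the calling thread and return how many the handler saw.
 * raise() does not return until the handler has run, so each iteration
 * covers one send, one delivery, and one return from the handler.
 */
long run_signal_loop(long n)
{
	struct sigaction act;

	memset(&act, 0, sizeof(act));
	act.sa_handler = on_signal;
	sigemptyset(&act.sa_mask);
	if (sigaction(SIGUSR1, &act, NULL) != 0)
		return -1;

	handled = 0;
	for (long i = 0; i < n; i++)
		raise(SIGUSR1);

	return (long)handled;
}
```

Running one copy of such a loop per CPU and counting iterations per process would give a figure comparable in spirit to per_process_ops, though the real benchmark harness differs.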



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-7/performance/x86_64-rhel-7.6/process/100%/debian-x86_64-20191114.cgz/lkp-bdw-ep6/signal1/will-it-scale/0xb000038

commit:
v5.4
b77491648e ("perf x86: Infrastructure for exposing an Uncore unit to PMON mapping")

v5.4 b77491648e6eb2f26b6edf5eaea
---------------- ---------------------------
%stddev %change %stddev
\ | \
47986 -2.1% 46989 will-it-scale.per_process_ops
4222852 -2.1% 4135110 will-it-scale.workload
427194 ± 9% +13.8% 486344 ± 4% numa-vmstat.node1.numa_local
12.88 ± 2% -8.5% 11.79 ± 4% turbostat.RAMWatt
8846 ± 10% +23.9% 10964 ± 9% softirqs.CPU0.SCHED
14442 ± 4% -5.2% 13697 ± 5% softirqs.CPU71.RCU
78696 ± 9% +14.4% 89993 ± 8% sched_debug.cfs_rq:/.min_vruntime.stddev
78411 ± 9% +14.5% 89817 ± 8% sched_debug.cfs_rq:/.spread0.stddev
9.77 ± 4% +15.0% 11.23 ± 3% sched_debug.cpu.clock.stddev
9.77 ± 4% +15.0% 11.23 ± 3% sched_debug.cpu.clock_task.stddev
4.072e+09 -1.9% 3.996e+09 perf-stat.i.branch-instructions
44948352 -1.8% 44159252 perf-stat.i.branch-misses
35.25 +4.3 39.56 perf-stat.i.cache-miss-rate%
12569960 +5.2% 13223444 perf-stat.i.cache-misses
35888855 ± 2% -6.2% 33680305 ± 2% perf-stat.i.cache-references
11.75 +1.8% 11.96 perf-stat.i.cpi
19377 -5.0% 18403 perf-stat.i.cycles-between-cache-misses
27157347 -2.1% 26595986 perf-stat.i.dTLB-load-misses
6.739e+09 -2.0% 6.602e+09 perf-stat.i.dTLB-loads
27809165 -1.9% 27268405 perf-stat.i.dTLB-store-misses
5.461e+09 -1.9% 5.356e+09 perf-stat.i.dTLB-stores
2.072e+10 -1.9% 2.034e+10 perf-stat.i.instructions
0.09 -1.7% 0.08 perf-stat.i.ipc
917994 +2.6% 941599 perf-stat.i.node-load-misses
96.93 -1.1 95.81 perf-stat.i.node-store-miss-rate%
5499191 +5.0% 5774707 perf-stat.i.node-store-misses
169716 ± 8% +45.2% 246479 ± 6% perf-stat.i.node-stores
1.73 ± 2% -4.4% 1.66 ± 2% perf-stat.overall.MPKI
35.03 +4.2 39.27 perf-stat.overall.cache-miss-rate%
11.77 +1.8% 11.98 perf-stat.overall.cpi
19401 -5.0% 18428 perf-stat.overall.cycles-between-cache-misses
0.08 -1.8% 0.08 perf-stat.overall.ipc
97.01 -1.1 95.91 perf-stat.overall.node-store-miss-rate%
4.058e+09 -1.8% 3.983e+09 perf-stat.ps.branch-instructions
44798305 -1.7% 44014351 perf-stat.ps.branch-misses
12526500 +5.2% 13178368 perf-stat.ps.cache-misses
35771706 ± 2% -6.2% 33569906 ± 2% perf-stat.ps.cache-references
27063288 -2.1% 26505363 perf-stat.ps.dTLB-load-misses
6.716e+09 -2.0% 6.58e+09 perf-stat.ps.dTLB-loads
27712662 -1.9% 27175399 perf-stat.ps.dTLB-store-misses
5.442e+09 -1.9% 5.338e+09 perf-stat.ps.dTLB-stores
2.065e+10 -1.9% 2.027e+10 perf-stat.ps.instructions
914841 +2.6% 938399 perf-stat.ps.node-load-misses
5480102 +5.0% 5754996 perf-stat.ps.node-store-misses
169148 ± 8% +45.2% 245649 ± 6% perf-stat.ps.node-stores
6.242e+12 -1.6% 6.142e+12 perf-stat.total.instructions
481.50 ± 26% -41.7% 280.75 ± 28% interrupts.37:IR-PCI-MSI.1572868-edge.eth0-TxRx-3
772.75 ± 63% -70.0% 231.75 ± 28% interrupts.CPU1.RES:Rescheduling_interrupts
481.50 ± 26% -41.7% 280.75 ± 28% interrupts.CPU16.37:IR-PCI-MSI.1572868-edge.eth0-TxRx-3
954.25 ± 10% -71.8% 269.50 ± 76% interrupts.CPU19.RES:Rescheduling_interrupts
932.50 ± 48% -68.4% 294.75 ± 72% interrupts.CPU20.RES:Rescheduling_interrupts
583.75 ± 59% -79.5% 119.75 ± 54% interrupts.CPU21.RES:Rescheduling_interrupts
513.00 ± 42% +145.8% 1261 ± 17% interrupts.CPU22.RES:Rescheduling_interrupts
256.25 ± 40% +253.9% 906.75 ± 39% interrupts.CPU24.RES:Rescheduling_interrupts
475.25 ± 19% +133.5% 1109 ± 41% interrupts.CPU26.RES:Rescheduling_interrupts
734.50 ± 36% +99.1% 1462 ± 26% interrupts.CPU27.RES:Rescheduling_interrupts
905.75 ± 48% -64.9% 318.00 ± 85% interrupts.CPU3.RES:Rescheduling_interrupts
363.00 ± 35% +114.3% 777.75 ± 26% interrupts.CPU30.RES:Rescheduling_interrupts
6915 ± 24% -29.1% 4904 ± 34% interrupts.CPU37.NMI:Non-maskable_interrupts
6915 ± 24% -29.1% 4904 ± 34% interrupts.CPU37.PMI:Performance_monitoring_interrupts
436.50 ± 48% +166.7% 1164 ± 41% interrupts.CPU38.RES:Rescheduling_interrupts
6950 ± 24% -29.1% 4926 ± 34% interrupts.CPU39.NMI:Non-maskable_interrupts
6950 ± 24% -29.1% 4926 ± 34% interrupts.CPU39.PMI:Performance_monitoring_interrupts
6906 ± 24% -28.9% 4910 ± 35% interrupts.CPU41.NMI:Non-maskable_interrupts
6906 ± 24% -28.9% 4910 ± 35% interrupts.CPU41.PMI:Performance_monitoring_interrupts
216.00 ± 70% -76.6% 50.50 ± 22% interrupts.CPU46.RES:Rescheduling_interrupts
2607 ± 47% +51.4% 3948 ± 8% interrupts.CPU50.CAL:Function_call_interrupts
3220 ± 10% +22.4% 3940 ± 8% interrupts.CPU51.CAL:Function_call_interrupts
4914 ± 34% +59.9% 7855 interrupts.CPU56.NMI:Non-maskable_interrupts
4914 ± 34% +59.9% 7855 interrupts.CPU56.PMI:Performance_monitoring_interrupts
4937 ± 34% +59.7% 7885 interrupts.CPU58.NMI:Non-maskable_interrupts
4937 ± 34% +59.7% 7885 interrupts.CPU58.PMI:Performance_monitoring_interrupts
4919 ± 34% +59.6% 7849 interrupts.CPU59.NMI:Non-maskable_interrupts
4919 ± 34% +59.6% 7849 interrupts.CPU59.PMI:Performance_monitoring_interrupts
4925 ± 34% +59.9% 7878 interrupts.CPU61.NMI:Non-maskable_interrupts
4925 ± 34% +59.9% 7878 interrupts.CPU61.PMI:Performance_monitoring_interrupts
4906 ± 33% +60.3% 7867 interrupts.CPU63.NMI:Non-maskable_interrupts
4906 ± 33% +60.3% 7867 interrupts.CPU63.PMI:Performance_monitoring_interrupts
890.00 ± 75% -82.0% 160.00 ± 46% interrupts.CPU63.RES:Rescheduling_interrupts
135.00 ± 52% +911.7% 1365 ± 76% interrupts.CPU70.RES:Rescheduling_interrupts
110.25 ± 14% +388.7% 538.75 ± 30% interrupts.CPU71.RES:Rescheduling_interrupts
3285 ± 3% +15.4% 3791 ± 3% interrupts.CPU73.CAL:Function_call_interrupts
186.50 ± 60% +274.4% 698.25 ± 77% interrupts.CPU81.RES:Rescheduling_interrupts
1.22 ± 2% -0.2 1.02 perf-profile.calltrace.cycles-pp.recalc_sigpending.dequeue_signal.get_signal.do_signal.exit_to_usermode_loop
3.95 -0.2 3.79 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
4.07 -0.2 3.92 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
1.93 -0.1 1.79 perf-profile.calltrace.cycles-pp.fpu__clear.do_signal.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.66 -0.1 0.59 ± 3% perf-profile.calltrace.cycles-pp.__set_task_blocked.__set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.65 ± 2% -0.1 0.57 ± 2% perf-profile.calltrace.cycles-pp.recalc_sigpending.__set_task_blocked.__set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64
0.85 -0.1 0.79 ± 2% perf-profile.calltrace.cycles-pp.__set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.03 -0.0 0.98 perf-profile.calltrace.cycles-pp.signal_setup_done.do_signal.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.81 -0.0 0.76 ± 2% perf-profile.calltrace.cycles-pp.fpregs_mark_activate.fpu__clear.do_signal.exit_to_usermode_loop.do_syscall_64
0.98 -0.0 0.94 perf-profile.calltrace.cycles-pp.__set_current_blocked.signal_setup_done.do_signal.exit_to_usermode_loop.do_syscall_64
1.10 -0.0 1.07 perf-profile.calltrace.cycles-pp.copy_fpstate_to_sigframe.do_signal.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.52 +0.0 0.55 ± 3% perf-profile.calltrace.cycles-pp.fpregs_mark_activate.__fpu__restore_sig.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.87 +0.1 1.95 perf-profile.calltrace.cycles-pp.__fpu__restore_sig.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
23.79 +0.3 24.06 perf-profile.calltrace.cycles-pp.__sigqueue_alloc.__send_signal.do_send_sig_info.do_send_specific.do_tkill
24.02 +0.3 24.29 perf-profile.calltrace.cycles-pp.__send_signal.do_send_sig_info.do_send_specific.do_tkill.__x64_sys_tgkill
89.84 +0.3 90.14 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
89.46 +0.3 89.78 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
36.84 +0.4 37.20 perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
36.43 +0.4 36.80 perf-profile.calltrace.cycles-pp.do_signal.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
25.64 +0.4 26.09 perf-profile.calltrace.cycles-pp.__x64_sys_tgkill.do_syscall_64.entry_SYSCALL_64_after_hwframe
25.58 +0.5 26.04 perf-profile.calltrace.cycles-pp.do_tkill.__x64_sys_tgkill.do_syscall_64.entry_SYSCALL_64_after_hwframe
25.35 +0.5 25.82 perf-profile.calltrace.cycles-pp.do_send_specific.do_tkill.__x64_sys_tgkill.do_syscall_64.entry_SYSCALL_64_after_hwframe
24.66 +0.5 25.18 perf-profile.calltrace.cycles-pp.do_send_sig_info.do_send_specific.do_tkill.__x64_sys_tgkill.do_syscall_64
30.40 +0.6 30.97 perf-profile.calltrace.cycles-pp.dequeue_signal.get_signal.do_signal.exit_to_usermode_loop.do_syscall_64
31.58 +0.6 32.18 perf-profile.calltrace.cycles-pp.get_signal.do_signal.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
29.13 +0.8 29.91 perf-profile.calltrace.cycles-pp.__dequeue_signal.dequeue_signal.get_signal.do_signal.exit_to_usermode_loop
28.90 +0.8 29.68 perf-profile.calltrace.cycles-pp.__sigqueue_free.__dequeue_signal.dequeue_signal.get_signal.do_signal
3.46 -0.2 3.21 ± 2% perf-profile.children.cycles-pp.recalc_sigpending
3.95 -0.2 3.79 perf-profile.children.cycles-pp.entry_SYSCALL_64
4.42 -0.2 4.26 perf-profile.children.cycles-pp.syscall_return_via_sysret
1.93 -0.1 1.80 ± 2% perf-profile.children.cycles-pp.fpu__clear
3.62 -0.1 3.54 perf-profile.children.cycles-pp.__set_current_blocked
0.27 -0.1 0.21 ± 3% perf-profile.children.cycles-pp.fpregs_assert_state_consistent
0.84 -0.0 0.79 ± 2% perf-profile.children.cycles-pp._copy_from_user
1.03 -0.0 0.99 perf-profile.children.cycles-pp.signal_setup_done
0.34 -0.0 0.30 ± 5% perf-profile.children.cycles-pp.restore_altstack
0.73 -0.0 0.70 perf-profile.children.cycles-pp.__might_fault
1.11 -0.0 1.08 perf-profile.children.cycles-pp.copy_fpstate_to_sigframe
0.37 ± 2% -0.0 0.35 perf-profile.children.cycles-pp.___might_sleep
0.27 -0.0 0.26 perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
1.89 +0.1 1.96 perf-profile.children.cycles-pp.__fpu__restore_sig
0.29 ± 7% +0.2 0.53 ± 6% perf-profile.children.cycles-pp.__lock_task_sighand
0.29 ± 7% +0.2 0.53 ± 5% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
24.03 +0.3 24.29 perf-profile.children.cycles-pp.__send_signal
23.80 +0.3 24.06 perf-profile.children.cycles-pp.__sigqueue_alloc
90.00 +0.3 90.30 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
89.60 +0.3 89.92 perf-profile.children.cycles-pp.do_syscall_64
36.86 +0.4 37.22 perf-profile.children.cycles-pp.exit_to_usermode_loop
36.45 +0.4 36.81 perf-profile.children.cycles-pp.do_signal
25.65 +0.4 26.10 perf-profile.children.cycles-pp.__x64_sys_tgkill
25.59 +0.5 26.04 perf-profile.children.cycles-pp.do_tkill
25.36 +0.5 25.82 perf-profile.children.cycles-pp.do_send_specific
24.67 +0.5 25.19 perf-profile.children.cycles-pp.do_send_sig_info
30.41 +0.6 30.98 perf-profile.children.cycles-pp.dequeue_signal
31.60 +0.6 32.20 perf-profile.children.cycles-pp.get_signal
29.14 +0.8 29.92 perf-profile.children.cycles-pp.__dequeue_signal
28.90 +0.8 29.69 perf-profile.children.cycles-pp.__sigqueue_free
19.11 -0.4 18.75 perf-profile.self.cycles-pp.do_syscall_64
2.58 -0.2 2.34 perf-profile.self.cycles-pp.recalc_sigpending
3.95 -0.2 3.79 perf-profile.self.cycles-pp.entry_SYSCALL_64
4.41 -0.2 4.25 perf-profile.self.cycles-pp.syscall_return_via_sysret
0.95 -0.1 0.86 ± 2% perf-profile.self.cycles-pp.fpu__clear
0.25 -0.1 0.19 ± 2% perf-profile.self.cycles-pp.fpregs_assert_state_consistent
0.15 ± 2% -0.0 0.12 ± 6% perf-profile.self.cycles-pp._copy_from_user
0.74 -0.0 0.71 perf-profile.self.cycles-pp.copy_fpstate_to_sigframe
0.34 -0.0 0.31 perf-profile.self.cycles-pp.__x64_sys_rt_sigprocmask
0.46 ± 2% -0.0 0.44 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.36 ± 3% -0.0 0.34 perf-profile.self.cycles-pp.___might_sleep
0.26 -0.0 0.24 ± 2% perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
1.10 +0.0 1.15 perf-profile.self.cycles-pp.__fpu__restore_sig
0.28 ± 6% +0.2 0.53 ± 5% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
23.71 +0.3 23.96 perf-profile.self.cycles-pp.__sigqueue_alloc
12.65 +0.6 13.24 perf-profile.self.cycles-pp.__sigqueue_free
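
As a reading aid for the table above: rows whose change carries a % sign appear to be relative changes, while rows without one (the rate% and perf-profile rows) appear to be plain differences in percentage points. A small sketch of both calculations follows. The helper names pct_change and point_change are hypothetical and are not part of lkp-tests.

```c
/* Relative change from base to head, in percent. */
double pct_change(double base, double head)
{
	return (head - base) / base * 100.0;
}

/* Difference between two values that are already percentages (percentage points). */
double point_change(double base_pct, double head_pct)
{
	return head_pct - base_pct;
}
```

With the per_process_ops means from the table, pct_change(47986, 46989) comes out at about -2.08, which rounds to the reported -2.1%. For perf-stat.i.cache-miss-rate%, point_change(35.25, 39.56) is about 4.31, matching the reported +4.3.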



will-it-scale.per_process_ops

[ASCII plot: y-axis 44000 to 52000; bisect-good samples at roughly 48000 to 51000, bisect-bad samples at roughly 44500 to 47500]


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


Attachments:
(No filename) (19.16 kB)
config-5.4.0-00001-gb77491648e6eb (203.97 kB)
job-script (7.88 kB)
job.yaml (5.50 kB)
reproduce (321.00 B)

2020-02-05 20:48:16

by Andi Kleen

Subject: Re: [perf x86] b77491648e: will-it-scale.per_process_ops -2.1% regression

kernel test robot <[email protected]> writes:

> Greeting,
>
> FYI, we noticed a -2.1% regression of will-it-scale.per_process_ops due to commit:
>
>
> commit: b77491648e6eb2f26b6edf5eaea859adc17f4dcc ("perf x86: Infrastructure for exposing an Uncore unit to PMON mapping")
> https://github.com/0day-ci/linux/commits/roman-sudarikov-linux-intel-com/perf-x86-Exposing-IO-stack-to-IO-PMON-mapping-through-sysfs/20200118-075508

Seems to be spurious bisect. I don't think that commit could change
anything performance related.

-Andi

2020-02-06 03:21:39

by Philip Li

Subject: RE: [LKP] Re: [perf x86] b77491648e: will-it-scale.per_process_ops -2.1% regression

> Subject: [LKP] Re: [perf x86] b77491648e: will-it-scale.per_process_ops -2.1%
> regression
>
> kernel test robot <[email protected]> writes:
>
> > Greeting,
> >
> > FYI, we noticed a -2.1% regression of will-it-scale.per_process_ops due to
> commit:
> >
> >
> > commit: b77491648e6eb2f26b6edf5eaea859adc17f4dcc ("perf x86: Infrastructure
> for exposing an Uncore unit to PMON mapping")
> > https://github.com/0day-ci/linux/commits/roman-sudarikov-linux-intel-com/perf-
> x86-Exposing-IO-stack-to-IO-PMON-mapping-through-sysfs/20200118-075508
>
> Seems to be spurious bisect. I don't think that commit could change
> anything performance related.
Hi Andi, we will look into this as soon as possible. We also received a report from
Peter Z that he got a false positive for a will-it-scale.per_process_ops performance
regression. We will investigate both.

>
> -Andi
> _______________________________________________
> LKP mailing list -- [email protected]
> To unsubscribe send an email to [email protected]

2020-02-12 10:58:32

by Chen, Rong A

Subject: Re: [LKP] Re: [perf x86] b77491648e: will-it-scale.per_process_ops -2.1% regression



On 2/6/2020 4:47 AM, Andi Kleen wrote:
> kernel test robot <[email protected]> writes:
>
>> Greeting,
>>
>> FYI, we noticed a -2.1% regression of will-it-scale.per_process_ops due to commit:
>>
>>
>> commit: b77491648e6eb2f26b6edf5eaea859adc17f4dcc ("perf x86: Infrastructure for exposing an Uncore unit to PMON mapping")
>> https://github.com/0day-ci/linux/commits/roman-sudarikov-linux-intel-com/perf-x86-Exposing-IO-stack-to-IO-PMON-mapping-through-sysfs/20200118-075508
> Seems to be spurious bisect. I don't think that commit could change
> anything performance related.

Hi Andi,

I commented out some lines in arch/x86/events/intel/uncore.c and
will-it-scale.per_process_ops increased.

commit:
v5.4
b77491648e ("perf x86: Infrastructure for exposing an Uncore unit to PMON mapping")
f33fe1b258 ("test")


            v5.4  b77491648e6eb2f26b6edf5eae  f33fe1b258b2a4b2fc97600b2b  testcase/testparams/testbox
----------------  --------------------------  --------------------------  ---------------------------
         %stddev      change         %stddev      change         %stddev
             \          |                \          |                \
     47983                       47004                       47647        will-it-scale/performance-process-100%-signal1-ucode=0xb000038/lkp-bdw-ep6
     47983                       47004                       47647        GEO-MEAN will-it-scale.per_process_ops
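
To put the three per_process_ops figures above in proportion, the sketch below computes how much of the drop the test commit wins back and how far it still sits from v5.4. The helper names recovered_fraction and rel_change_pct are hypothetical and used only for this illustration.

```c
/* Fraction of the drop from base to bad that test wins back (0 = none, 1 = all). */
double recovered_fraction(double base, double bad, double test)
{
	return (test - bad) / (base - bad);
}

/* Relative change from base to head, in percent. */
double rel_change_pct(double base, double head)
{
	return (head - base) / base * 100.0;
}
```

With the values above, recovered_fraction(47983, 47004, 47647) is about 0.66 and rel_change_pct(47983, 47647) is about -0.7, so the test commit recovers roughly two thirds of the drop. The table lists no %stddev values, so these numbers alone do not show whether the remaining gap is noise.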

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 55201bfde2c84c..0dc9c455423d99 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -887,7 +887,7 @@ static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
                pmu->pmu.attr_groups = pmu->type->attr_groups;
        }

-       pmu->pmu.attr_update = attr_update;
+       // pmu->pmu.attr_update = attr_update;

        if (pmu->type->num_boxes == 1) {
                if (strlen(pmu->type->name) > 0)
@@ -903,7 +903,7 @@ static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
         * Exposing mapping of Uncore units to corresponding Uncore PMUs
         * through /sys/devices/uncore_<type>_<idx>/mapping
         */
-       uncore_platform_mapping(pmu->type);
+       // uncore_platform_mapping(pmu->type);

        ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
        if (!ret)

Best Regards,
Rong Chen

>
> -Andi

2020-02-12 15:16:19

by Liang, Kan

Subject: Re: [LKP] Re: [perf x86] b77491648e: will-it-scale.per_process_ops -2.1% regression



On 2/12/2020 5:56 AM, Chen, Rong A wrote:
>
>
> On 2/6/2020 4:47 AM, Andi Kleen wrote:
>> kernel test robot <[email protected]> writes:
>>
>>> Greeting,
>>>
>>> FYI, we noticed a -2.1% regression of will-it-scale.per_process_ops
>>> due to commit:
>>>
>>>
>>> commit: b77491648e6eb2f26b6edf5eaea859adc17f4dcc ("perf x86:
>>> Infrastructure for exposing an Uncore unit to PMON mapping")
>>> https://github.com/0day-ci/linux/commits/roman-sudarikov-linux-intel-com/perf-x86-Exposing-IO-stack-to-IO-PMON-mapping-through-sysfs/20200118-075508
>>>
>> Seems to be spurious bisect. I don't think that commit could change
>> anything performance related.
>
> Hi Andi,
>
> I commented out some lines in arch/x86/events/intel/uncore.c and
> will-it-scale.per_process_ops increased.
>
> commit:
>   v5.4
>   b77491648e ("perf x86: Infrastructure for exposing an Uncore unit to PMON mapping")
>   f33fe1b258 ("test")
>
>
>             v5.4  b77491648e6eb2f26b6edf5eae  f33fe1b258b2a4b2fc97600b2b  testcase/testparams/testbox
> ----------------  --------------------------  --------------------------  ---------------------------
>          %stddev      change         %stddev      change         %stddev
>              \          |                \          |                \
>      47983                       47004                       47647        will-it-scale/performance-process-100%-signal1-ucode=0xb000038/lkp-bdw-ep6
>      47983                       47004                       47647        GEO-MEAN will-it-scale.per_process_ops
>
> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
> index 55201bfde2c84c..0dc9c455423d99 100644
> --- a/arch/x86/events/intel/uncore.c
> +++ b/arch/x86/events/intel/uncore.c
> @@ -887,7 +887,7 @@ static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
>                 pmu->pmu.attr_groups = pmu->type->attr_groups;
>         }
>
> -       pmu->pmu.attr_update = attr_update;
> +       // pmu->pmu.attr_update = attr_update;
>
>         if (pmu->type->num_boxes == 1) {
>                 if (strlen(pmu->type->name) > 0)
> @@ -903,7 +903,7 @@ static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
>          * Exposing mapping of Uncore units to corresponding Uncore PMUs
>          * through /sys/devices/uncore_<type>_<idx>/mapping
>          */
> -       uncore_platform_mapping(pmu->type);
> +       // uncore_platform_mapping(pmu->type);

The patch is for SKX uncore, but the test machine looks like a BDX.
So the mapping_group should always be invisible, and attr_update
should not change anything.
I think there should be no performance impact.

static void uncore_platform_mapping(struct intel_uncore_type *t)
{
        if (t->get_topology && t->set_mapping &&
            !t->get_topology(t, max_dies) && !t->set_mapping(t, max_dies))
                mapping_group.is_visible = NULL;
        else
                mapping_group.is_visible = not_visible;
}
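
The visibility logic above can be exercised outside the kernel with a small userspace sketch. Everything except the body of uncore_platform_mapping() is a hypothetical stand-in: the struct layouts, the simplified is_visible signature, the stub callbacks, and the two example types are assumptions made for illustration, not kernel definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel type; only the fields used here. */
struct intel_uncore_type {
	int (*get_topology)(struct intel_uncore_type *t, int max_dies);
	int (*set_mapping)(struct intel_uncore_type *t, int max_dies);
};

/* Hypothetical stand-in for an attribute group with a simplified callback. */
struct attribute_group_stub {
	/* NULL means every attribute is shown; a callback returning 0 hides them. */
	unsigned short (*is_visible)(void);
};

static int max_dies = 1;
struct attribute_group_stub mapping_group;

unsigned short not_visible(void)
{
	return 0;
}

void uncore_platform_mapping(struct intel_uncore_type *t)
{
	if (t->get_topology && t->set_mapping &&
	    !t->get_topology(t, max_dies) && !t->set_mapping(t, max_dies))
		mapping_group.is_visible = NULL;
	else
		mapping_group.is_visible = not_visible;
}

/* Stub callbacks for a platform that supports the mapping (assumed to succeed). */
static int topology_ok(struct intel_uncore_type *t, int dies)
{
	(void)t;
	(void)dies;
	return 0;
}

static int mapping_ok(struct intel_uncore_type *t, int dies)
{
	(void)t;
	(void)dies;
	return 0;
}

/* A type with no callbacks, as assumed for hardware the patch does not cover. */
struct intel_uncore_type type_without_mapping = { NULL, NULL };

/* A type with both callbacks set, as assumed for supported hardware. */
struct intel_uncore_type type_with_mapping = { topology_ok, mapping_ok };
```

With these stubs, calling uncore_platform_mapping() on type_without_mapping leaves mapping_group.is_visible pointing at not_visible, which is the case expected on BDX. Only a type that provides both callbacks, and for which both return 0, clears is_visible and exposes the group.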

Kan

>
>         ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
>         if (!ret)
>
> Best Regards,
> Rong Chen
>
>>
>> -Andi
>