LinuxLists.cc - [bpf] af7ec13833: will-it-scale.per_process

2020-06-28 09:54:49

Subject: [bpf] af7ec13833: will-it-scale.per_process_ops -2.5% regression

Greeting,

FYI, we noticed a -2.5% regression of will-it-scale.per_process_ops due to commit:

commit: af7ec13833619e17f03aa73a785a2f871da6d66b ("bpf: Add bpf_skc_to_tcp6_sock() helper")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with following parameters:

nr_task: 16
mode: process
test: mmap1
cpufreq_governor: performance
ucode: 0x5002f01

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-7.6/process/16/debian-x86_64-20191114.cgz/lkp-csl-2ap2/mmap1/will-it-scale/0x5002f01

commit:
72e2b2b66f ("bpf: Allow tracing programs to use bpf_jiffies64() helper")
af7ec13833 ("bpf: Add bpf_skc_to_tcp6_sock() helper")

Columns, left to right: value at 72e2b2b66f9c1225 (± %stddev where reported),
%change, value at af7ec13833619e17f03aa73a785 (± %stddev where reported), metric.
124208 -2.5% 121095 will-it-scale.per_process_ops
1987348 -2.5% 1937535 will-it-scale.workload
0.10 -0.0 0.08 mpstat.cpu.all.usr%
76456 ±133% +332.5% 330696 ± 65% numa-numastat.node3.local_node
107479 ± 95% +229.4% 354010 ± 58% numa-numastat.node3.numa_hit
7491 ± 4% -12.3% 6570 numa-vmstat.node1.nr_kernel_stack
1512 +52.2% 2303 ± 26% numa-vmstat.node1.nr_mapped
6256 ± 6% +155.0% 15951 ± 41% numa-vmstat.node3.nr_active_anon
1827 ± 13% +551.0% 11893 ± 57% numa-vmstat.node3.nr_anon_pages
6256 ± 6% +155.0% 15951 ± 41% numa-vmstat.node3.nr_zone_active_anon
32772 ± 15% -32.9% 21975 ± 7% sched_debug.cfs_rq:/.exec_clock.avg
53516 ± 13% -19.2% 43219 ± 9% sched_debug.cfs_rq:/.exec_clock.stddev
16.36 ± 6% -12.4% 14.33 ± 7% sched_debug.cfs_rq:/.load_avg.avg
565215 ± 15% -31.6% 386808 ± 8% sched_debug.cfs_rq:/.min_vruntime.avg
1795645 ± 29% -30.5% 1248436 ± 10% sched_debug.cpu.avg_idle.max
7492 ± 4% -12.3% 6567 numa-meminfo.node1.KernelStack
6048 +48.1% 8958 ± 29% numa-meminfo.node1.Mapped
810760 ± 3% -6.2% 760309 ± 6% numa-meminfo.node2.MemUsed
25061 ± 6% +158.8% 64865 ± 41% numa-meminfo.node3.Active
25053 ± 6% +154.7% 63821 ± 41% numa-meminfo.node3.Active(anon)
1.50 ±173% +1.7e+06% 25830 ± 96% numa-meminfo.node3.AnonHugePages
7307 ± 13% +550.9% 47561 ± 57% numa-meminfo.node3.AnonPages
3546 ±144% -87.2% 454.50 ± 10% interrupts.CPU0.CAL:Function_call_interrupts
813.25 +70.0% 1382 ± 6% interrupts.CPU1.CAL:Function_call_interrupts
71.00 ± 26% +43.0% 101.50 ± 5% interrupts.CPU112.NMI:Non-maskable_interrupts
71.00 ± 26% +43.0% 101.50 ± 5% interrupts.CPU112.PMI:Performance_monitoring_interrupts
799.00 +21.2% 968.75 ± 20% interrupts.CPU150.CAL:Function_call_interrupts
514.75 ± 31% +42.1% 731.50 ± 23% interrupts.CPU169.CAL:Function_call_interrupts
177.25 ± 10% -38.9% 108.25 ± 33% interrupts.CPU177.NMI:Non-maskable_interrupts
177.25 ± 10% -38.9% 108.25 ± 33% interrupts.CPU177.PMI:Performance_monitoring_interrupts
165.25 -22.8% 127.50 ± 19% interrupts.CPU81.NMI:Non-maskable_interrupts
165.25 -22.8% 127.50 ± 19% interrupts.CPU81.PMI:Performance_monitoring_interrupts
3.05 ± 6% +0.5 3.52 ± 11% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
1.15 ± 17% +0.6 1.80 ± 19% perf-profile.calltrace.cycles-pp.menu_select.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
4.44 ± 5% +0.7 5.15 ± 3% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
4.90 ± 6% +0.7 5.63 ± 3% perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
0.10 ± 5% -0.0 0.07 ± 6% perf-profile.children.cycles-pp.__prepare_exit_to_usermode
0.35 ± 7% +0.1 0.40 ± 4% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.09 ± 64% +0.1 0.17 ± 31% perf-profile.children.cycles-pp.tick_nohz_irq_exit
0.29 ± 14% +0.1 0.39 ± 9% perf-profile.children.cycles-pp.clockevents_program_event
1.16 ± 17% +0.6 1.81 ± 19% perf-profile.children.cycles-pp.menu_select
4.91 ± 6% +0.7 5.64 ± 3% perf-profile.children.cycles-pp.unmap_vmas
4.79 ± 5% +0.7 5.52 ± 3% perf-profile.children.cycles-pp.unmap_page_range
2.01 ± 6% +0.5 2.50 ± 3% perf-profile.self.cycles-pp.unmap_page_range
0.49 ± 53% +0.5 1.02 ± 34% perf-profile.self.cycles-pp.cpuidle_enter_state
0.56 ± 42% +0.6 1.15 ± 35% perf-profile.self.cycles-pp.menu_select
5.756e+09 -2.0% 5.644e+09 perf-stat.i.branch-instructions
0.76 ± 23% -0.2 0.55 ± 32% perf-stat.i.branch-miss-rate%
43877584 ± 23% -29.8% 30789677 ± 32% perf-stat.i.branch-misses
7.017e+09 -1.8% 6.89e+09 perf-stat.i.dTLB-loads
2.747e+09 -2.3% 2.684e+09 perf-stat.i.dTLB-stores
2.46e+10 -1.9% 2.412e+10 perf-stat.i.instructions
82.00 -2.2% 80.20 perf-stat.i.metric.M/sec
0.76 ± 23% -0.2 0.55 ± 32% perf-stat.overall.branch-miss-rate%
5.737e+09 -2.0% 5.625e+09 perf-stat.ps.branch-instructions
43726188 ± 23% -29.8% 30693805 ± 32% perf-stat.ps.branch-misses
6.994e+09 -1.8% 6.867e+09 perf-stat.ps.dTLB-loads
2.738e+09 -2.3% 2.675e+09 perf-stat.ps.dTLB-stores
2.451e+10 -1.9% 2.404e+10 perf-stat.ps.instructions
7.409e+12 -2.0% 7.263e+12 perf-stat.total.instructions
161023 -15.8% 135597 ± 14% softirqs.CPU0.TIMER
156141 ± 2% -17.9% 128125 ± 15% softirqs.CPU10.TIMER
157518 -14.0% 135498 ± 11% softirqs.CPU100.TIMER
154277 ± 2% -17.7% 126919 ± 15% softirqs.CPU101.TIMER
157440 -14.3% 134970 ± 10% softirqs.CPU102.TIMER
154378 ± 2% -16.3% 129169 ± 16% softirqs.CPU103.TIMER
41427 -43.4% 23450 ± 54% softirqs.CPU104.SCHED
154175 ± 2% -17.7% 126885 ± 15% softirqs.CPU105.TIMER
157484 -14.5% 134616 ± 11% softirqs.CPU106.TIMER
155021 -18.1% 126948 ± 15% softirqs.CPU107.TIMER
41420 -55.7% 18328 ± 80% softirqs.CPU108.SCHED
157475 -15.2% 133509 ± 11% softirqs.CPU108.TIMER
154380 ± 2% -17.8% 126972 ± 15% softirqs.CPU109.TIMER
158325 ± 3% -15.1% 134489 ± 11% softirqs.CPU11.TIMER
157174 -14.2% 134838 ± 11% softirqs.CPU110.TIMER
154171 ± 2% -17.6% 126978 ± 15% softirqs.CPU111.TIMER
157383 -16.5% 131351 ± 15% softirqs.CPU112.TIMER
157651 -17.0% 130902 ± 15% softirqs.CPU113.TIMER
40573 ± 2% -32.7% 27323 ± 25% softirqs.CPU114.SCHED
157205 -16.9% 130703 ± 15% softirqs.CPU114.TIMER
40295 ± 2% -29.9% 28240 ± 33% softirqs.CPU115.SCHED
157494 -16.8% 131103 ± 17% softirqs.CPU115.TIMER
158789 -17.6% 130918 ± 15% softirqs.CPU116.TIMER
157181 -16.2% 131773 ± 15% softirqs.CPU117.TIMER
40307 ± 2% -29.9% 28267 ± 15% softirqs.CPU118.SCHED
157772 -17.1% 130860 ± 16% softirqs.CPU118.TIMER
40343 ± 2% -25.6% 30028 ± 18% softirqs.CPU119.SCHED
157242 -16.7% 130914 ± 16% softirqs.CPU119.TIMER
156122 ± 3% -17.3% 129101 ± 16% softirqs.CPU12.TIMER
38457 ± 4% -9.0% 34980 ± 9% softirqs.CPU122.SCHED
39836 ± 2% -12.2% 34986 ± 5% softirqs.CPU123.SCHED
161083 ± 2% -15.2% 136601 ± 11% softirqs.CPU13.TIMER
155511 ± 2% -17.9% 127705 ± 15% softirqs.CPU14.TIMER
94832 ± 6% +32.1% 125303 ± 16% softirqs.CPU144.TIMER
91430 ± 7% +35.6% 123962 ± 16% softirqs.CPU145.TIMER
159196 -14.1% 136742 ± 11% softirqs.CPU15.TIMER
101913 ± 4% +21.8% 124180 ± 16% softirqs.CPU150.TIMER
92878 ± 6% +32.1% 122715 ± 17% softirqs.CPU151.TIMER
97986 ± 4% +26.2% 123633 ± 16% softirqs.CPU153.TIMER
95855 ± 5% +29.5% 124173 ± 16% softirqs.CPU155.TIMER
92790 ± 6% +32.7% 123146 ± 17% softirqs.CPU156.TIMER
97758 ± 6% +26.2% 123397 ± 17% softirqs.CPU157.TIMER
93450 ± 6% +32.0% 123341 ± 17% softirqs.CPU159.TIMER
92632 ± 6% +34.7% 124781 ± 17% softirqs.CPU162.TIMER
102121 ± 7% +21.0% 123517 ± 16% softirqs.CPU163.TIMER
104450 ± 4% +19.0% 124312 ± 16% softirqs.CPU165.TIMER
97581 ± 8% +26.3% 123253 ± 17% softirqs.CPU167.TIMER
155698 ± 2% -17.4% 128615 ± 16% softirqs.CPU2.TIMER
39566 ± 6% -16.0% 33236 ± 7% softirqs.CPU21.SCHED
40546 ± 2% -12.4% 35536 ± 8% softirqs.CPU27.SCHED
159665 -15.4% 135050 ± 11% softirqs.CPU3.TIMER
39524 ± 3% -16.8% 32901 ± 10% softirqs.CPU31.SCHED
3836 +221.3% 12323 ±115% softirqs.CPU4.SCHED
155133 ± 2% -17.5% 127978 ± 16% softirqs.CPU4.TIMER
95925 ± 6% +25.0% 119877 ± 19% softirqs.CPU48.TIMER
158122 -14.5% 135135 ± 10% softirqs.CPU5.TIMER
96117 ± 7% +24.8% 119976 ± 19% softirqs.CPU50.TIMER
94315 ± 7% +27.2% 120006 ± 19% softirqs.CPU55.TIMER
96872 ± 6% +23.4% 119544 ± 19% softirqs.CPU59.TIMER
93665 ± 7% +27.4% 119361 ± 20% softirqs.CPU60.TIMER
94927 ± 7% +26.4% 120029 ± 19% softirqs.CPU63.TIMER
94113 ± 7% +27.0% 119562 ± 19% softirqs.CPU66.TIMER
154828 ± 2% -17.4% 127819 ± 15% softirqs.CPU8.TIMER
159303 -14.6% 136018 ± 11% softirqs.CPU9.TIMER
155029 -15.3% 131284 ± 10% softirqs.CPU96.TIMER
154115 ± 2% -17.7% 126912 ± 15% softirqs.CPU97.TIMER
41358 -44.5% 22973 ± 57% softirqs.CPU98.SCHED
157639 -14.5% 134799 ± 11% softirqs.CPU98.TIMER
154157 ± 2% -17.7% 126852 ± 15% softirqs.CPU99.TIMER
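
The headline -2.5% figures can be cross-checked from the two means in the will-it-scale rows at the top of the table. A minimal Python sketch, assuming the usual relative-change formula (the helper name pct_change is only illustrative and is not part of lkp-tests):

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0


# Means copied from the table: (base 72e2b2b66f, patched af7ec13833).
rows = {
    "will-it-scale.per_process_ops": (124208, 121095),
    "will-it-scale.workload": (1987348, 1937535),
}

for name, (base, new) in rows.items():
    # Both rows round to -2.5%, matching the reported %change column.
    print(f"{name}: {pct_change(base, new):+.1f}%")
```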

will-it-scale.per_process_ops

[Plot: will-it-scale.per_process_ops per sample, y-axis 118000 to 132000. The bisect-bad samples ([O]) lie between about 120000 and 122000, below the other plotted series at about 124000 and 130000.]

[*] bisect-good sample
[O] bisect-bad sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Thanks,
Rong Chen

Attachments:

(No filename) (13.72 kB)
config-5.8.0-rc1-00160-gaf7ec13833619 (209.49 kB)
job-script (7.50 kB)
job.yaml (5.09 kB)
reproduce (347.00 B)

2020-06-29 19:08:04

by Yonghong Song

Subject: Re: [bpf] af7ec13833: will-it-scale.per_process_ops -2.5% regression

On 6/28/20 1:50 AM, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed a -2.5% regression of will-it-scale.per_process_ops due to commit:
>
>
> commit: af7ec13833619e17f03aa73a785a2f871da6d66b ("bpf: Add bpf_skc_to_tcp6_sock() helper")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

One of the previous emails claims that
commit: 492e639f0c222784e2e0f121966375f641c61b15 ("bpf: Add
bpf_seq_printf and bpf_seq_write helpers")
is responsible for a 2.5% improvement in will-it-scale.per_process_ops,
which I believe is false.

This commit should not cause a regression.

The performance variation is probably caused by the test environment,
which you may want to investigate further to reduce false alarms.
Thanks!

>
> in testcase: will-it-scale
> on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
> with following parameters:
>
> nr_task: 16
> mode: process
> test: mmap1
> cpufreq_governor: performance
> ucode: 0x5002f01
>
> test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
> test-url: https://github.com/antonblanchard/will-it-scale
>
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <[email protected]>
>
>
> Details are as below:
[...]

2020-07-03 05:57:54

by Chen, Rong A

Subject: Re: [bpf] af7ec13833: will-it-scale.per_process_ops -2.5% regression

On 6/29/20 11:10 PM, Yonghong Song wrote:
>
>
> On 6/28/20 1:50 AM, kernel test robot wrote:
>> Greeting,
>>
>> FYI, we noticed a -2.5% regression of will-it-scale.per_process_ops
>> due to commit:
>>
>>
>> commit: af7ec13833619e17f03aa73a785a2f871da6d66b ("bpf: Add
>> bpf_skc_to_tcp6_sock() helper")
>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> One of the previous emails claims that
>     commit: 492e639f0c222784e2e0f121966375f641c61b15 ("bpf: Add
> bpf_seq_printf and bpf_seq_write helpers")
> is responsible for a 2.5% improvement in will-it-scale.per_process_ops,
> which I believe is false.
>
> This commit should not cause a regression.
>
> The performance variation is probably caused by the test environment,
> which you may want to investigate further to reduce false alarms.
> Thanks!

Hi Yonghong,

It's a function alignment issue: the commit affects the alignment of
functions, which causes a small regression. When we force
-falign-functions=32 in KBUILD_CFLAGS, the regression is gone:

diff --git a/Makefile b/Makefile
index 70def4907036c..9746afa4edc21 100644
--- a/Makefile
+++ b/Makefile
@@ -476,7 +476,7 @@ LINUXINCLUDE    := \
                 $(USERINCLUDE)

 KBUILD_AFLAGS   := -D__ASSEMBLY__ -fno-PIE
-KBUILD_CFLAGS   := -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs \
+KBUILD_CFLAGS   := -Wall -Wundef -falign-functions=32 -Werror=strict-prototypes -Wno-trigraphs \
                    -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE \
                    -Werror=implicit-function-declaration -Werror=implicit-int \
                    -Wno-format-security \
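
As a rough illustration of why a commit that only adds code elsewhere can move the alignment of unrelated functions, and why forcing 32-byte alignment hides the effect, here is a simplified Python model. It is an assumption-laden sketch: functions are simply placed back to back, the sizes are made up, and 16 is used only as a stand-in for a smaller default alignment. The real layout of vmlinux involves sections and the linker and is not modelled.

```python
def align_up(addr: int, align: int) -> int:
    """Round addr up to the next multiple of align (align is a power of two)."""
    return (addr + align - 1) & ~(align - 1)


def layout(sizes, align):
    """Place functions back to back, rounding each start up to `align`.

    Returns the list of start addresses.
    """
    starts, addr = [], 0
    for size in sizes:
        addr = align_up(addr, align)
        starts.append(addr)
        addr += size
    return starts


# Hypothetical function sizes in bytes. `after` grows the first function
# by 8 bytes, standing in for an unrelated code change.
before = [44, 100, 60, 200]
after = [52, 100, 60, 200]

for align in (16, 32):
    off_before = [s % 32 for s in layout(before, align)]
    off_after = [s % 32 for s in layout(after, align)]
    print(f"align={align}: start % 32 before={off_before} after={off_after}")
```

In this model, with 16-byte alignment the later functions change their offset within a 32-byte block when the first function grows, while with 32-byte alignment every start stays at offset 0 before and after the change.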

Best Regards,
Rong Chen

>
>>
>> in testcase: will-it-scale
>> on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @
>> 2.30GHz with 192G memory
>> with following parameters:
>>
>>     nr_task: 16
>>     mode: process
>>     test: mmap1
>>     cpufreq_governor: performance
>>     ucode: 0x5002f01
>>
>> test-description: Will It Scale takes a testcase and runs it from 1
>> through to n parallel copies to see if the testcase will scale. It
>> builds both a process and threads based test in order to see any
>> differences between the two.
>> test-url: https://github.com/antonblanchard/will-it-scale
>>
>>
>>
>> If you fix the issue, kindly add following tag
>> Reported-by: kernel test robot <[email protected]>
>>
>>
>> Details are as below:
> [...]

2020-07-13 09:55:28

by Feng Tang

Subject: Re: [LKP] Re: [bpf] af7ec13833: will-it-scale.per_process_ops -2.5% regression

On Fri, Jul 03, 2020 at 01:54:39PM +0800, Rong Chen wrote:
> > This commit should not cause a regression.
> >
> > The performance variation is probably caused by the test environment,
> > which you may want to investigate further to reduce false alarms.
> > Thanks!
>
> Hi Yonghong,
>
> It's a function alignment issue: the commit affects the alignment of
> functions, which causes a small regression. When we force
> -falign-functions=32 in KBUILD_CFLAGS, the regression is gone:
>
> diff --git a/Makefile b/Makefile
> index 70def4907036c..9746afa4edc21 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -476,7 +476,7 @@ LINUXINCLUDE    := \
>                  $(USERINCLUDE)
>
>  KBUILD_AFLAGS   := -D__ASSEMBLY__ -fno-PIE
> -KBUILD_CFLAGS   := -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs \
> +KBUILD_CFLAGS   := -Wall -Wundef -falign-functions=32 -Werror=strict-prototypes -Wno-trigraphs \
>                     -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE \
>                     -Werror=implicit-function-declaration -Werror=implicit-int \
>                     -Wno-format-security \

For these strange performance changes caused by a seemingly
unrelated commit, we have used this function alignment patch to
show that some of them are caused by a rearrangement of text (code)
alignment [1][2].

So one bold thought: could we merge this option into mainline under
a kernel config option in the 'kernel hacking' category, so that both
developers and 0day can more easily filter out cases that are related
to text alignment changes?
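
To check whether a given report is such a case, one could also compare the symbol tables of the two builds directly. The following Python sketch assumes input lines in the "address type name" form printed by nm (System.map uses the same form). The helper names and the sample symbols are made up for illustration, and this is not an existing lkp-tests tool.

```python
def parse_symbols(text):
    """Parse nm-style lines ("<hex addr> <type> <name>") into {name: addr}.

    Only text-section symbols (type t or T) are kept.
    """
    syms = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1] in ("t", "T"):
            syms[parts[2]] = int(parts[0], 16)
    return syms


def alignment_shifts(base_text, new_text, modulus=32):
    """Sorted names of functions in both tables whose addr % modulus differs."""
    base, new = parse_symbols(base_text), parse_symbols(new_text)
    return sorted(
        name
        for name in base.keys() & new.keys()
        if base[name] % modulus != new[name] % modulus
    )


# Made-up sample data; real input would come from the two kernels' symbol tables.
BASE = """\
ffffffff81000000 T func_a
ffffffff81000030 T func_b
ffffffff810000a0 t func_c
ffffffff82000000 D some_data
"""
NEW = """\
ffffffff81000000 T func_a
ffffffff81000040 T func_b
ffffffff810000b0 t func_c
ffffffff82000000 D some_data
"""

# Lists the functions whose offset within a 32-byte block changed.
print(alignment_shifts(BASE, NEW))
```

If the hot functions of the benchmark show up in such a list, rebuilding both commits with forced function alignment, as in the patch above, is a cheap way to confirm or rule out an alignment effect.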

[1] [LKP][mm] fd4d9c7d0c: stress-ng.switch.ops_per_sec -30.5% regression
https://lore.kernel.org/lkml/20200330011254.GA14393@feng-iot/
[2] [mm/hugetlb] c77c0a8ac4: will-it-scale.per_process_ops 15.9% improvement
https://lore.kernel.org/lkml/20200114085637.GA29297@shao2-debian/
(this patch was not used in that discussion, but we later used
it to confirm it's a text alignment case)

Thanks,
Feng