Greetings,
FYI, we noticed a -5.1% regression of unixbench.score due to commit:
commit: f1a7941243c102a44e8847e3b94ff4ff3ec56f25 ("mm: convert mm's rss stats into percpu_counter")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
in testcase: unixbench
on test machine: 128 threads 4 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
with following parameters:
runtime: 300s
nr_task: 1
test: shell8
cpufreq_governor: performance
test-description: UnixBench is the original BYTE UNIX benchmark suite, which aims to test the performance of Unix-like systems.
test-url: https://github.com/kdlucas/byte-unixbench
Details are as below:
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/300s/lkp-icl-2sp2/shell8/unixbench
commit:
9cd6ffa602 ("selftests/damon: add tests for DAMON_LRU_SORT's enabled parameter")
f1a7941243 ("mm: convert mm's rss stats into percpu_counter")
9cd6ffa60256e931 f1a7941243c102a44e8847e3b94
---------------- ---------------------------
%stddev %change %stddev
\ | \
7917 -5.1% 7509 unixbench.score
10485 -12.1% 9216 unixbench.time.maximum_resident_set_size
37236706 -5.1% 35324104 unixbench.time.minor_page_faults
567.67 -1.3% 560.50 unixbench.time.percent_of_cpu_this_job_got
215.51 +3.7% 223.41 unixbench.time.system_time
143.81 -8.0% 132.23 unixbench.time.user_time
1075861 -4.0% 1033025 unixbench.time.voluntary_context_switches
299268 -4.9% 284627 unixbench.workload
13368320 ± 7% +18.1% 15788032 ± 6% meminfo.DirectMap2M
44104 -3.2% 42701 vmstat.system.cs
0.98 ± 20% +0.7 1.64 ± 38% perf-profile.calltrace.cycles-pp.link_path_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
2.12 ± 19% +0.8 2.96 ± 13% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
2.35 ± 13% +0.9 3.28 ± 13% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.14 ± 74% +0.4 0.55 ± 32% perf-profile.children.cycles-pp.do_task_dead
0.04 ±223% +0.4 0.47 ± 49% perf-profile.children.cycles-pp.__mmdrop
25909747 -5.1% 24594282 proc-vmstat.numa_hit
25735224 -5.1% 24419554 proc-vmstat.numa_local
46218815 -5.1% 43858849 proc-vmstat.pgalloc_normal
37577543 -5.1% 35663315 proc-vmstat.pgfault
46035593 -5.1% 43669821 proc-vmstat.pgfree
1592335 -3.9% 1529709 proc-vmstat.pgreuse
38041 -5.1% 36089 proc-vmstat.thp_fault_alloc
532220 -5.1% 504893 proc-vmstat.unevictable_pgs_culled
11.43 +7.9% 12.34 perf-stat.i.MPKI
4.415e+09 -3.2% 4.272e+09 perf-stat.i.branch-instructions
1.71 -0.0 1.67 perf-stat.i.branch-miss-rate%
74672466 -5.3% 70687323 perf-stat.i.branch-misses
14.90 +1.3 16.18 perf-stat.i.cache-miss-rate%
38201955 +14.5% 43725695 perf-stat.i.cache-misses
2.517e+08 +5.0% 2.644e+08 perf-stat.i.cache-references
46317 -3.8% 44553 perf-stat.i.context-switches
0.94 +1.7% 0.96 perf-stat.i.cpi
2.032e+10 -1.5% 2.002e+10 perf-stat.i.cpu-cycles
2760 -6.3% 2585 perf-stat.i.cpu-migrations
0.05 +0.0 0.07 ± 34% perf-stat.i.dTLB-load-miss-rate%
2687620 +37.3% 3690868 ± 35% perf-stat.i.dTLB-load-misses
5.592e+09 -3.0% 5.422e+09 perf-stat.i.dTLB-loads
0.09 +0.0 0.09 ± 8% perf-stat.i.dTLB-store-miss-rate%
3.295e+09 -3.6% 3.175e+09 perf-stat.i.dTLB-stores
2.162e+10 -2.9% 2.1e+10 perf-stat.i.instructions
1.06 -1.5% 1.05 perf-stat.i.ipc
0.16 -1.5% 0.16 perf-stat.i.metric.GHz
188.29 +21.3% 228.40 perf-stat.i.metric.K/sec
105.85 -3.1% 102.57 perf-stat.i.metric.M/sec
580192 -5.4% 548596 perf-stat.i.minor-faults
7332887 +14.4% 8388008 perf-stat.i.node-load-misses
197858 +2.4% 202553 perf-stat.i.node-loads
77.77 +4.8 82.56 perf-stat.i.node-store-miss-rate%
8653164 +45.4% 12581116 perf-stat.i.node-store-misses
2454964 +5.3% 2585143 perf-stat.i.node-stores
580247 -5.4% 548650 perf-stat.i.page-faults
11.64 +8.1% 12.59 perf-stat.overall.MPKI
1.69 -0.0 1.65 perf-stat.overall.branch-miss-rate%
15.18 +1.4 16.54 perf-stat.overall.cache-miss-rate%
0.94 +1.4% 0.95 perf-stat.overall.cpi
531.91 -13.9% 457.79 perf-stat.overall.cycles-between-cache-misses
0.05 +0.0 0.07 ± 34% perf-stat.overall.dTLB-load-miss-rate%
0.09 +0.0 0.10 ± 8% perf-stat.overall.dTLB-store-miss-rate%
1.06 -1.4% 1.05 perf-stat.overall.ipc
77.90 +5.1 82.95 perf-stat.overall.node-store-miss-rate%
4574083 +2.5% 4688385 perf-stat.overall.path-length
4.344e+09 -3.2% 4.203e+09 perf-stat.ps.branch-instructions
73479648 -5.3% 69558968 perf-stat.ps.branch-misses
37586020 +14.5% 43024951 perf-stat.ps.cache-misses
2.477e+08 +5.0% 2.601e+08 perf-stat.ps.cache-references
45573 -3.8% 43840 perf-stat.ps.context-switches
1.999e+10 -1.5% 1.97e+10 perf-stat.ps.cpu-cycles
2715 -6.3% 2543 perf-stat.ps.cpu-migrations
2644325 +37.3% 3631630 ± 35% perf-stat.ps.dTLB-load-misses
5.502e+09 -3.0% 5.336e+09 perf-stat.ps.dTLB-loads
3.242e+09 -3.6% 3.124e+09 perf-stat.ps.dTLB-stores
2.127e+10 -2.9% 2.066e+10 perf-stat.ps.instructions
570836 -5.4% 539805 perf-stat.ps.minor-faults
7214650 +14.4% 8253560 perf-stat.ps.node-load-misses
194668 +2.4% 199305 perf-stat.ps.node-loads
8513681 +45.4% 12379540 perf-stat.ps.node-store-misses
2415376 +5.3% 2543709 perf-stat.ps.node-stores
570889 -5.4% 539858 perf-stat.ps.page-faults
1.369e+12 -2.5% 1.334e+12 perf-stat.total.instructions
If you fix the issue, kindly add the following tags:
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-lkp/[email protected]
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://01.org/lkp
On Sun, Dec 4, 2022 at 9:56 PM kernel test robot <[email protected]> wrote:
>
> Greetings,
>
> FYI, we noticed a -5.1% regression of unixbench.score due to commit:
>
[...]
> 9cd6ffa60256e931 f1a7941243c102a44e8847e3b94
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 7917 -5.1% 7509 unixbench.score
What is unixbench.score?
> 10485 -12.1% 9216 unixbench.time.maximum_resident_set_size
> 37236706 -5.1% 35324104 unixbench.time.minor_page_faults
For the above two, is a negative change good or bad?
> 0.98 ± 20% +0.7 1.64 ± 38% perf-profile.calltrace.cycles-pp.link_path_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
> 2.12 ± 19% +0.8 2.96 ± 13% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
> 2.35 ± 13% +0.9 3.28 ± 13% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
> 0.14 ± 74% +0.4 0.55 ± 32% perf-profile.children.cycles-pp.do_task_dead
> 0.04 ±223% +0.4 0.47 ± 49% perf-profile.children.cycles-pp.__mmdrop
Also how should I interpret the above perf-profiles?
Shakeel Butt <[email protected]> writes:
> On Sun, Dec 4, 2022 at 9:56 PM kernel test robot <[email protected]> wrote:
>>
>> Greetings,
>>
>> FYI, we noticed a -5.1% regression of unixbench.score due to commit:
>>
> [...]
>> 9cd6ffa60256e931 f1a7941243c102a44e8847e3b94
>> ---------------- ---------------------------
>> %stddev %change %stddev
>> \ | \
>> 7917 -5.1% 7509 unixbench.score
>
> What is unixbench.score?
It should be the benchmark throughput.
>> 10485 -12.1% 9216 unixbench.time.maximum_resident_set_size
This should reflect the accuracy change of percpu_counter.
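A percpu_counter keeps a per-CPU delta on each CPU and folds it into a shared total only once the delta exceeds a batch size, so a cheap read of the total can lag the exact value by up to roughly batch * nr_cpus. Below is a minimal userspace model of that idea (an illustrative sketch only: the struct, helper names, and constants are invented here, not the kernel implementation):

#include <stdio.h>
#include <stdlib.h>

#define NR_CPUS 4
#define BATCH   32

struct pcpu_counter {
	long count;           /* shared, folded-in total */
	long local[NR_CPUS];  /* per-CPU deltas not yet folded in */
};

static void pcpu_add(struct pcpu_counter *c, int cpu, long amount)
{
	c->local[cpu] += amount;
	if (labs(c->local[cpu]) >= BATCH) {  /* fold into the shared total */
		c->count += c->local[cpu];
		c->local[cpu] = 0;
	}
}

/* Cheap read: shared total only; may be off by up to BATCH * NR_CPUS. */
static long pcpu_read(const struct pcpu_counter *c)
{
	return c->count;
}

/* Exact read: fold in every CPU's pending delta (expensive in the kernel). */
static long pcpu_sum(const struct pcpu_counter *c)
{
	long sum = c->count;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += c->local[cpu];
	return sum;
}

int main(void)
{
	struct pcpu_counter rss = { 0 };

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		pcpu_add(&rss, cpu, 20);  /* 20 pages faulted in per CPU */

	printf("approximate read: %ld, exact sum: %ld\n",
	       pcpu_read(&rss), pcpu_sum(&rss));  /* prints 0 vs 80 */
	return 0;
}

If maximum_resident_set_size is now derived from such approximate reads, the -12.1% shift above may be an accounting artifact rather than a real change in memory usage.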
>> 37236706 -5.1% 35324104 unixbench.time.minor_page_faults
The reduction is the same as that of the benchmark score, so I think this
reflects the nature of time-bound testing (instead of workload-bound testing).
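Normalizing by the amount of work done supports this: 37236706 / 299268 ≈ 124.4 minor faults per workload unit before the commit vs. 35324104 / 284627 ≈ 124.1 after, so the per-unit fault rate is essentially unchanged and the absolute drop just tracks the lower throughput.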
> For the above two, is a negative change good or bad?
>
>> 0.98 ± 20% +0.7 1.64 ± 38% perf-profile.calltrace.cycles-pp.link_path_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
>> 2.12 ± 19% +0.8 2.96 ± 13% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>> 2.35 ± 13% +0.9 3.28 ± 13% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>> 0.14 ± 74% +0.4 0.55 ± 32% perf-profile.children.cycles-pp.do_task_dead
>> 0.04 ±223% +0.4 0.47 ± 49% perf-profile.children.cycles-pp.__mmdrop
>
> Also how should I interpret the above perf-profiles?
It appears that the changes in handle_mm_fault() and __mmdrop() are
related to the code of the commit? That is, for this specific workload
(which is not so impractical), the operations become slower?
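If so, a kernel-style sketch of where that cost would come from is below (the helper names are invented for illustration, and the shape of the change is paraphrased from the commit subject rather than copied from its diff): each mm would pay a per-CPU allocation per RSS counter type at creation and the matching free at teardown, exactly the kind of cost a fork/exit-heavy workload like shell8 would expose.

#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/mm_types.h>
#include <linux/percpu_counter.h>

/*
 * Sketch: one percpu_counter per RSS counter type, set up when the
 * mm_struct is created. percpu_counter_init() allocates per-CPU storage,
 * so every fork would pay NR_MM_COUNTERS per-CPU allocations.
 */
static int mm_init_rss_counters(struct mm_struct *mm)
{
	int i;

	for (i = 0; i < NR_MM_COUNTERS; i++) {
		if (percpu_counter_init(&mm->rss_stat[i], 0, GFP_KERNEL)) {
			while (i-- > 0)
				percpu_counter_destroy(&mm->rss_stat[i]);
			return -ENOMEM;
		}
	}
	return 0;
}

/*
 * Sketch: the matching teardown, paid on every mm drop -- consistent with
 * __mmdrop() growing in the profile for a workload that creates and
 * destroys many short-lived shells.
 */
static void mm_destroy_rss_counters(struct mm_struct *mm)
{
	int i;

	for (i = 0; i < NR_MM_COUNTERS; i++)
		percpu_counter_destroy(&mm->rss_stat[i]);
}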
Best Regards,
Huang, Ying
Hi, Shakeel Butt,
please note that the regression still exists after this commit was
merged into mainline; we reported it at
https://lore.kernel.org/all/[email protected]/
On Tue, Dec 06, 2022 at 10:41:00AM +0800, Huang, Ying wrote:
> Shakeel Butt <[email protected]> writes:
>
> > On Sun, Dec 4, 2022 at 9:56 PM kernel test robot <[email protected]> wrote:
> >>
> >> Greetings,
> >>
> >> FYI, we noticed a -5.1% regression of unixbench.score due to commit:
> >>
> > [...]
> >> 9cd6ffa60256e931 f1a7941243c102a44e8847e3b94
> >> ---------------- ---------------------------
> >> %stddev %change %stddev
> >> \ | \
> >> 7917 -5.1% 7509 unixbench.score
> >
> > What is unixbench.score?
>
> It should be the benchmark throughput.
>
> >> 10485 -12.1% 9216 unixbench.time.maximum_resident_set_size
>
> This should reflect the accuracy change of percpu_counter.
>
> >> 37236706 -5.1% 35324104 unixbench.time.minor_page_faults
>
> The reduction is the same as that of the benchmark score, so I think this
> reflects the nature of time-bound testing (instead of workload-bound testing).
>
> > For the above two, is a negative change good or bad?
> >
> >> 0.98 ± 20% +0.7 1.64 ± 38% perf-profile.calltrace.cycles-pp.link_path_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
> >> 2.12 ± 19% +0.8 2.96 ± 13% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
> >> 2.35 ± 13% +0.9 3.28 ± 13% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
> >> 0.14 ± 74% +0.4 0.55 ± 32% perf-profile.children.cycles-pp.do_task_dead
> >> 0.04 ±223% +0.4 0.47 ± 49% perf-profile.children.cycles-pp.__mmdrop
> >
> > Also how should I interpret the above perf-profiles?
>
> It appears that the changes in handle_mm_fault() and __mmdrop() are
> related to the code of the commit? That is, for this specific workload
> (which is not so impractical), the operations become slower?
>
> Best Regards,
> Huang, Ying
>
On Sun, Jan 29, 2023 at 6:37 PM Oliver Sang <[email protected]> wrote:
>
> Hi, Shakeel Butt,
>
> please note that the regression still exists after this commit was
> merged into mainline; we reported it at
> https://lore.kernel.org/all/[email protected]/
>
Let's discuss this on the other email chain.