Greetings,
FYI, we noticed a -5.1% regression of unixbench.score due to commit:
commit: f1a7941243c102a44e8847e3b94ff4ff3ec56f25 ("mm: convert mm's rss stats into percpu_counter")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
in testcase: unixbench
on test machine: 128 threads 4 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
with following parameters:
runtime: 300s
nr_task: 1
test: shell8
cpufreq_governor: performance
test-description: UnixBench is the original BYTE UNIX benchmark suite, which aims to test the performance of Unix-like systems.
test-url: https://github.com/kdlucas/byte-unixbench
Details are as below:
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/300s/lkp-icl-2sp2/shell8/unixbench
commit:
9cd6ffa602 ("selftests/damon: add tests for DAMON_LRU_SORT's enabled parameter")
f1a7941243 ("mm: convert mm's rss stats into percpu_counter")
9cd6ffa60256e931 f1a7941243c102a44e8847e3b94
---------------- ---------------------------
%stddev %change %stddev
\ | \
7917 -5.1% 7509 unixbench.score
10485 -12.1% 9216 unixbench.time.maximum_resident_set_size
37236706 -5.1% 35324104 unixbench.time.minor_page_faults
567.67 -1.3% 560.50 unixbench.time.percent_of_cpu_this_job_got
215.51 +3.7% 223.41 unixbench.time.system_time
143.81 -8.0% 132.23 unixbench.time.user_time
1075861 -4.0% 1033025 unixbench.time.voluntary_context_switches
299268 -4.9% 284627 unixbench.workload
13368320 ± 7% +18.1% 15788032 ± 6% meminfo.DirectMap2M
44104 -3.2% 42701 vmstat.system.cs
0.98 ± 20% +0.7 1.64 ± 38% perf-profile.calltrace.cycles-pp.link_path_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
2.12 ± 19% +0.8 2.96 ± 13% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
2.35 ± 13% +0.9 3.28 ± 13% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.14 ± 74% +0.4 0.55 ± 32% perf-profile.children.cycles-pp.do_task_dead
0.04 ±223% +0.4 0.47 ± 49% perf-profile.children.cycles-pp.__mmdrop
25909747 -5.1% 24594282 proc-vmstat.numa_hit
25735224 -5.1% 24419554 proc-vmstat.numa_local
46218815 -5.1% 43858849 proc-vmstat.pgalloc_normal
37577543 -5.1% 35663315 proc-vmstat.pgfault
46035593 -5.1% 43669821 proc-vmstat.pgfree
1592335 -3.9% 1529709 proc-vmstat.pgreuse
38041 -5.1% 36089 proc-vmstat.thp_fault_alloc
532220 -5.1% 504893 proc-vmstat.unevictable_pgs_culled
11.43 +7.9% 12.34 perf-stat.i.MPKI
4.415e+09 -3.2% 4.272e+09 perf-stat.i.branch-instructions
1.71 -0.0 1.67 perf-stat.i.branch-miss-rate%
74672466 -5.3% 70687323 perf-stat.i.branch-misses
14.90 +1.3 16.18 perf-stat.i.cache-miss-rate%
38201955 +14.5% 43725695 perf-stat.i.cache-misses
2.517e+08 +5.0% 2.644e+08 perf-stat.i.cache-references
46317 -3.8% 44553 perf-stat.i.context-switches
0.94 +1.7% 0.96 perf-stat.i.cpi
2.032e+10 -1.5% 2.002e+10 perf-stat.i.cpu-cycles
2760 -6.3% 2585 perf-stat.i.cpu-migrations
0.05 +0.0 0.07 ± 34% perf-stat.i.dTLB-load-miss-rate%
2687620 +37.3% 3690868 ± 35% perf-stat.i.dTLB-load-misses
5.592e+09 -3.0% 5.422e+09 perf-stat.i.dTLB-loads
0.09 +0.0 0.09 ± 8% perf-stat.i.dTLB-store-miss-rate%
3.295e+09 -3.6% 3.175e+09 perf-stat.i.dTLB-stores
2.162e+10 -2.9% 2.1e+10 perf-stat.i.instructions
1.06 -1.5% 1.05 perf-stat.i.ipc
0.16 -1.5% 0.16 perf-stat.i.metric.GHz
188.29 +21.3% 228.40 perf-stat.i.metric.K/sec
105.85 -3.1% 102.57 perf-stat.i.metric.M/sec
580192 -5.4% 548596 perf-stat.i.minor-faults
7332887 +14.4% 8388008 perf-stat.i.node-load-misses
197858 +2.4% 202553 perf-stat.i.node-loads
77.77 +4.8 82.56 perf-stat.i.node-store-miss-rate%
8653164 +45.4% 12581116 perf-stat.i.node-store-misses
2454964 +5.3% 2585143 perf-stat.i.node-stores
580247 -5.4% 548650 perf-stat.i.page-faults
11.64 +8.1% 12.59 perf-stat.overall.MPKI
1.69 -0.0 1.65 perf-stat.overall.branch-miss-rate%
15.18 +1.4 16.54 perf-stat.overall.cache-miss-rate%
0.94 +1.4% 0.95 perf-stat.overall.cpi
531.91 -13.9% 457.79 perf-stat.overall.cycles-between-cache-misses
0.05 +0.0 0.07 ± 34% perf-stat.overall.dTLB-load-miss-rate%
0.09 +0.0 0.10 ± 8% perf-stat.overall.dTLB-store-miss-rate%
1.06 -1.4% 1.05 perf-stat.overall.ipc
77.90 +5.1 82.95 perf-stat.overall.node-store-miss-rate%
4574083 +2.5% 4688385 perf-stat.overall.path-length
4.344e+09 -3.2% 4.203e+09 perf-stat.ps.branch-instructions
73479648 -5.3% 69558968 perf-stat.ps.branch-misses
37586020 +14.5% 43024951 perf-stat.ps.cache-misses
2.477e+08 +5.0% 2.601e+08 perf-stat.ps.cache-references
45573 -3.8% 43840 perf-stat.ps.context-switches
1.999e+10 -1.5% 1.97e+10 perf-stat.ps.cpu-cycles
2715 -6.3% 2543 perf-stat.ps.cpu-migrations
2644325 +37.3% 3631630 ± 35% perf-stat.ps.dTLB-load-misses
5.502e+09 -3.0% 5.336e+09 perf-stat.ps.dTLB-loads
3.242e+09 -3.6% 3.124e+09 perf-stat.ps.dTLB-stores
2.127e+10 -2.9% 2.066e+10 perf-stat.ps.instructions
570836 -5.4% 539805 perf-stat.ps.minor-faults
7214650 +14.4% 8253560 perf-stat.ps.node-load-misses
194668 +2.4% 199305 perf-stat.ps.node-loads
8513681 +45.4% 12379540 perf-stat.ps.node-store-misses
2415376 +5.3% 2543709 perf-stat.ps.node-stores
570889 -5.4% 539858 perf-stat.ps.page-faults
1.369e+12 -2.5% 1.334e+12 perf-stat.total.instructions
If you fix the issue, kindly add the following tags:
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-lkp/[email protected]
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://01.org/lkp
On Sun, Dec 4, 2022 at 9:56 PM kernel test robot <[email protected]> wrote:
>
> Greetings,
>
> FYI, we noticed a -5.1% regression of unixbench.score due to commit:
>
[...]
> 9cd6ffa60256e931 f1a7941243c102a44e8847e3b94
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 7917 -5.1% 7509 unixbench.score
What is unixbench.score?
> 10485 -12.1% 9216 unixbench.time.maximum_resident_set_size
> 37236706 -5.1% 35324104 unixbench.time.minor_page_faults
For the above two, is a negative change good or bad?
> 0.98 ± 20% +0.7 1.64 ± 38% perf-profile.calltrace.cycles-pp.link_path_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
> 2.12 ± 19% +0.8 2.96 ± 13% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
> 2.35 ± 13% +0.9 3.28 ± 13% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
> 0.14 ± 74% +0.4 0.55 ± 32% perf-profile.children.cycles-pp.do_task_dead
> 0.04 ±223% +0.4 0.47 ± 49% perf-profile.children.cycles-pp.__mmdrop
Also how should I interpret the above perf-profiles?
Shakeel Butt <[email protected]> writes:
> On Sun, Dec 4, 2022 at 9:56 PM kernel test robot <[email protected]> wrote:
>>
>> Greetings,
>>
>> FYI, we noticed a -5.1% regression of unixbench.score due to commit:
>>
> [...]
>> 9cd6ffa60256e931 f1a7941243c102a44e8847e3b94
>> ---------------- ---------------------------
>> %stddev %change %stddev
>> \ | \
>> 7917 -5.1% 7509 unixbench.score
>
> What is unixbench.score?
It should be the benchmark throughput.
>> 10485 -12.1% 9216 unixbench.time.maximum_resident_set_size
This should reflect the accuracy change of percpu_counter.
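A percpu_counter keeps a per-CPU delta on each CPU and folds it into a shared total only once the delta exceeds a batch size, so a cheap read of the total can lag the exact value by up to roughly batch * nr_cpus. Below is a minimal userspace model of that idea (an illustrative sketch only: the struct, helper names, and constants are invented here, not the kernel implementation):

#include <stdio.h>
#include <stdlib.h>

#define NR_CPUS 4
#define BATCH   32

struct pcpu_counter {
	long count;           /* shared, folded-in total */
	long local[NR_CPUS];  /* per-CPU deltas not yet folded in */
};

static void pcpu_add(struct pcpu_counter *c, int cpu, long amount)
{
	c->local[cpu] += amount;
	if (labs(c->local[cpu]) >= BATCH) {  /* fold into the shared total */
		c->count += c->local[cpu];
		c->local[cpu] = 0;
	}
}

/* Cheap read: shared total only; may be off by up to BATCH * NR_CPUS. */
static long pcpu_read(const struct pcpu_counter *c)
{
	return c->count;
}

/* Exact read: fold in every CPU's pending delta (expensive in the kernel). */
static long pcpu_sum(const struct pcpu_counter *c)
{
	long sum = c->count;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += c->local[cpu];
	return sum;
}

int main(void)
{
	struct pcpu_counter rss = { 0 };

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		pcpu_add(&rss, cpu, 20);  /* 20 pages faulted in per CPU */

	printf("approximate read: %ld, exact sum: %ld\n",
	       pcpu_read(&rss), pcpu_sum(&rss));  /* prints 0 vs 80 */
	return 0;
}

If maximum_resident_set_size is now derived from such approximate reads, the -12.1% shift above may be an accounting artifact rather than a real change in memory usage.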
>> 37236706 -5.1% 35324104 unixbench.time.minor_page_faults
The reduction is the same as that of the benchmark score, so I think this
reflects the nature of time-bound testing (instead of workload-bound testing).
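Normalizing by the amount of work done supports this: 37236706 / 299268 ≈ 124.4 minor faults per workload unit before the commit vs. 35324104 / 284627 ≈ 124.1 after, so the per-unit fault rate is essentially unchanged and the absolute drop just tracks the lower throughput.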
> For the above two, is a negative change good or bad?
>
>> 0.98 ± 20% +0.7 1.64 ± 38% perf-profile.calltrace.cycles-pp.link_path_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
>> 2.12 ± 19% +0.8 2.96 ± 13% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>> 2.35 ± 13% +0.9 3.28 ± 13% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>> 0.14 ± 74% +0.4 0.55 ± 32% perf-profile.children.cycles-pp.do_task_dead
>> 0.04 ±223% +0.4 0.47 ± 49% perf-profile.children.cycles-pp.__mmdrop
>
> Also how should I interpret the above perf-profiles?
It appears that the changes in handle_mm_fault() and __mmdrop() are
related to the code of the commit? That is, for this specific workload
(which is not so impractical), the operations become slower?
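If so, a kernel-style sketch of where that cost would come from is below (the helper names are invented for illustration, and the shape of the change is paraphrased from the commit subject rather than copied from its diff): each mm would pay a per-CPU allocation per RSS counter type at creation and the matching free at teardown, exactly the kind of cost a fork/exit-heavy workload like shell8 would expose.

#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/mm_types.h>
#include <linux/percpu_counter.h>

/*
 * Sketch: one percpu_counter per RSS counter type, set up when the
 * mm_struct is created. percpu_counter_init() allocates per-CPU storage,
 * so every fork would pay NR_MM_COUNTERS per-CPU allocations.
 */
static int mm_init_rss_counters(struct mm_struct *mm)
{
	int i;

	for (i = 0; i < NR_MM_COUNTERS; i++) {
		if (percpu_counter_init(&mm->rss_stat[i], 0, GFP_KERNEL)) {
			while (i-- > 0)
				percpu_counter_destroy(&mm->rss_stat[i]);
			return -ENOMEM;
		}
	}
	return 0;
}

/*
 * Sketch: the matching teardown, paid on every mm drop -- consistent with
 * __mmdrop() growing in the profile for a workload that creates and
 * destroys many short-lived shells.
 */
static void mm_destroy_rss_counters(struct mm_struct *mm)
{
	int i;

	for (i = 0; i < NR_MM_COUNTERS; i++)
		percpu_counter_destroy(&mm->rss_stat[i]);
}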
Best Regards,
Huang, Ying
Hi, Shakeel Butt,
please note that the regression still exists after this commit was
merged into mainline; we reported it at
https://lore.kernel.org/all/[email protected]/
On Tue, Dec 06, 2022 at 10:41:00AM +0800, Huang, Ying wrote:
> Shakeel Butt <[email protected]> writes:
>
> > On Sun, Dec 4, 2022 at 9:56 PM kernel test robot <[email protected]> wrote:
> >>
> >> Greetings,
> >>
> >> FYI, we noticed a -5.1% regression of unixbench.score due to commit:
> >>
> > [...]
> >> 9cd6ffa60256e931 f1a7941243c102a44e8847e3b94
> >> ---------------- ---------------------------
> >> %stddev %change %stddev
> >> \ | \
> >> 7917 -5.1% 7509 unixbench.score
> >
> > What is unixbench.score?
>
> It should be the benchmark throughput.
>
> >> 10485 -12.1% 9216 unixbench.time.maximum_resident_set_size
>
> This should reflect the accuracy change of percpu_counter.
>
> >> 37236706 -5.1% 35324104 unixbench.time.minor_page_faults
>
> The reduction is the same as that of the benchmark score, so I think this
> reflects the nature of time-bound testing (instead of workload-bound testing).
>
> > For the above two, is a negative change good or bad?
> >
> >> 0.98 ± 20% +0.7 1.64 ± 38% perf-profile.calltrace.cycles-pp.link_path_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
> >> 2.12 ± 19% +0.8 2.96 ± 13% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
> >> 2.35 ± 13% +0.9 3.28 ± 13% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
> >> 0.14 ± 74% +0.4 0.55 ± 32% perf-profile.children.cycles-pp.do_task_dead
> >> 0.04 ±223% +0.4 0.47 ± 49% perf-profile.children.cycles-pp.__mmdrop
> >
> > Also how should I interpret the above perf-profiles?
>
> It appears that the changes in handle_mm_fault() and __mmdrop() are
> related to the code of the commit? That is, for this specific workload
> (which is not so impractical), the operations become slower?
>
> Best Regards,
> Huang, Ying
>
On Sun, Jan 29, 2023 at 6:37 PM Oliver Sang <[email protected]> wrote:
>
> Hi, Shakeel Butt,
>
> please note that the regression still exists after this commit was
> merged into mainline; we reported it at
> https://lore.kernel.org/all/[email protected]/
>
Let's discuss this on the other email chain.