Date: 2023-11-30 05:50:37
From: Oliver Sang

Subject: [linus:master] [x86/acpi] ec9aedb2aa: aim9.exec_test.ops_per_sec 2.4% improvement



Hello,

kernel test robot noticed a 2.4% improvement of aim9.exec_test.ops_per_sec on:


commit: ec9aedb2aa1ab7ac420c00b31f5edc5be15ec167 ("x86/acpi: Ignore invalid x2APIC entries")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: aim9
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
parameters:

testtime: 300s
test: exec_test
cpufreq_governor: performance


Besides the detailed comparison below, we also noticed some differences in dmesg.

for this commit ec9aedb2aa:

[ 1.311075][ T0] smpboot: Allowing 48 CPUs, 0 hotplug CPUs

for parent:

[ 1.311098][ T0] smpboot: Allowing 168 CPUs, 120 hotplug CPUs
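
For context: per-CPU allocations in the kernel are sized for every possible CPU, so dropping the possible-CPU count from 168 to 48 shrinks the per-exec setup work that shows up below in mm_init/pcpu_alloc/memset_orig/__percpu_counter_init_many and in meminfo.Percpu. The userspace C sketch below only models that scaling; the counter count and slot size are illustrative assumptions, not kernel values.

/* Minimal userspace sketch (illustrative assumptions only, not kernel code):
 * per-CPU data is allocated and zeroed for every possible CPU, so the
 * per-exec setup cost scales with the possible-CPU count seen in dmesg. */
#include <stdio.h>

#define COUNTERS_PER_MM 4UL   /* assumption: a few percpu counters set up per exec */
#define BYTES_PER_SLOT  8UL   /* assumption: one long-sized slot per possible CPU  */

static unsigned long percpu_bytes_per_exec(unsigned long nr_possible_cpus)
{
	return nr_possible_cpus * COUNTERS_PER_MM * BYTES_PER_SLOT;
}

int main(void)
{
	/* possible-CPU counts from the dmesg excerpts above */
	printf("parent  (168 possible CPUs): %lu bytes per exec\n",
	       percpu_bytes_per_exec(168));
	printf("patched ( 48 possible CPUs): %lu bytes per exec\n",
	       percpu_bytes_per_exec(48));
	return 0;
}

The actual kernel-side effect is what the perf-profile and meminfo.Percpu rows below show directly.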


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231130/[email protected]

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/exec_test/aim9/300s

commit:
31255e072b ("x86/shstk: Delay signal entry SSP write until after user accesses")
ec9aedb2aa ("x86/acpi: Ignore invalid x2APIC entries")

31255e072b2e91f9 ec9aedb2aa1ab7ac420c00b31f5
---------------- ---------------------------
%stddev %change %stddev
\ | \
8587 ± 3% +5.9% 9091 vmstat.system.cs
6542 ± 9% -18.2% 5352 ± 7% numa-meminfo.node1.KernelStack
57960 ± 4% -12.6% 50656 ± 6% numa-meminfo.node1.SUnreclaim
6541 ± 9% -18.0% 5363 ± 6% numa-vmstat.node1.nr_kernel_stack
14490 ± 4% -12.6% 12663 ± 6% numa-vmstat.node1.nr_slab_unreclaimable
179678 ± 7% -22.6% 139060 ± 10% meminfo.DirectMap4k
13670 -13.6% 11809 meminfo.KernelStack
78243 -72.5% 21498 meminfo.Percpu
1222 +2.4% 1251 aim9.exec_test.ops_per_sec
27978802 +3.1% 28859909 aim9.time.minor_page_faults
175.04 -6.2% 164.11 aim9.time.system_time
115.72 +9.1% 126.24 aim9.time.user_time
731948 +2.4% 749684 aim9.time.voluntary_context_switches
13669 -13.8% 11788 proc-vmstat.nr_kernel_stack
21028 -3.2% 20355 proc-vmstat.nr_slab_reclaimable
29074 -9.0% 26443 proc-vmstat.nr_slab_unreclaimable
50357 -1.3% 49699 proc-vmstat.numa_other
28937047 +3.0% 29790891 proc-vmstat.pgfault
0.55 ± 5% +0.1 0.65 ± 7% perf-profile.calltrace.cycles-pp.next_uptodate_folio.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault
1.38 ± 6% -0.7 0.67 ± 9% perf-profile.children.cycles-pp.mm_init
0.87 ± 7% -0.5 0.38 ± 10% perf-profile.children.cycles-pp.pcpu_alloc
0.76 ± 8% -0.3 0.42 ± 8% perf-profile.children.cycles-pp.alloc_bprm
0.50 ± 6% -0.3 0.17 ± 6% perf-profile.children.cycles-pp.memset_orig
0.40 ± 5% -0.2 0.15 ± 18% perf-profile.children.cycles-pp.__percpu_counter_init_many
0.15 ± 20% -0.1 0.03 ±101% perf-profile.children.cycles-pp.mm_init_cid
0.23 ± 14% -0.1 0.12 ± 19% perf-profile.children.cycles-pp._find_next_bit
0.30 ± 10% -0.1 0.24 ± 16% perf-profile.children.cycles-pp.mas_preallocate
0.14 ± 18% -0.0 0.09 ± 16% perf-profile.children.cycles-pp.pm_qos_read_value
0.09 ± 15% -0.0 0.07 ± 10% perf-profile.children.cycles-pp.remove_vma
0.05 ± 47% +0.1 0.11 ± 26% perf-profile.children.cycles-pp.malloc
0.20 ± 22% +0.1 0.25 ± 7% perf-profile.children.cycles-pp.do_brk_flags
0.44 ± 5% +0.1 0.53 ± 8% perf-profile.children.cycles-pp.mod_objcg_state
0.80 ± 4% +0.2 0.96 ± 6% perf-profile.children.cycles-pp.next_uptodate_folio
0.50 ± 7% -0.3 0.17 ± 6% perf-profile.self.cycles-pp.memset_orig
0.26 ± 16% -0.2 0.04 ±106% perf-profile.self.cycles-pp.mm_init
0.14 ± 25% -0.1 0.03 ±100% perf-profile.self.cycles-pp.mm_init_cid
0.18 ± 22% -0.1 0.08 ± 34% perf-profile.self.cycles-pp.pcpu_alloc
0.13 ± 16% -0.0 0.08 ± 20% perf-profile.self.cycles-pp.pm_qos_read_value
0.37 ± 6% +0.1 0.45 ± 10% perf-profile.self.cycles-pp.mod_objcg_state
0.66 ± 5% +0.1 0.80 ± 6% perf-profile.self.cycles-pp.next_uptodate_folio
34087721 ± 2% +3.6% 35301961 perf-stat.i.branch-misses
8601 ± 3% +6.1% 9122 perf-stat.i.context-switches
72.92 ± 2% +7.4% 78.30 ± 3% perf-stat.i.cpu-migrations
1.55 ± 2% -0.1 1.42 ± 3% perf-stat.i.dTLB-load-miss-rate%
0.51 ± 2% -0.2 0.32 perf-stat.i.dTLB-store-miss-rate%
2867856 ± 3% -36.9% 1810983 perf-stat.i.dTLB-store-misses
5.561e+08 ± 2% +3.0% 5.73e+08 perf-stat.i.dTLB-stores
92019 ± 4% +10.2% 101371 perf-stat.i.iTLB-loads
126.43 ± 15% -33.8% 83.76 perf-stat.i.metric.K/sec
90050 ± 4% +6.8% 96193 perf-stat.i.minor-faults
19.22 ± 4% -1.5 17.77 ± 3% perf-stat.i.node-store-miss-rate%
90050 ± 4% +6.8% 96194 perf-stat.i.page-faults
1.48 ± 2% -0.1 1.38 ± 3% perf-stat.overall.dTLB-load-miss-rate%
0.51 -0.2 0.32 perf-stat.overall.dTLB-store-miss-rate%
33982829 ± 2% +3.5% 35183134 perf-stat.ps.branch-misses
8573 ± 3% +6.0% 9090 perf-stat.ps.context-switches
72.73 ± 2% +7.4% 78.13 ± 3% perf-stat.ps.cpu-migrations
2858954 ± 3% -36.9% 1805251 perf-stat.ps.dTLB-store-misses
5.545e+08 ± 2% +3.0% 5.712e+08 perf-stat.ps.dTLB-stores
91889 ± 4% +10.2% 101265 perf-stat.ps.iTLB-loads
89770 ± 4% +6.8% 95880 perf-stat.ps.minor-faults
89771 ± 4% +6.8% 95880 perf-stat.ps.page-faults



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki