2021-04-09 03:51:28

by Zhangshaokun

Subject: [PATCH] fs: Optimized file struct to improve performance

From: Yuqi Jin <[email protected]>

In the syscall path, @f_count and @f_mode are used frequently. If we
place them next to each other, they share the same cacheline, which
helps performance.

The syscall test of unixbench was run on an Intel 8180.
Before this patch:
80 CPUs in system; running 80 parallel copies of tests

System Call Overhead 3789860.2 lps (10.0 s, 1 samples)

System Benchmarks Partial Index BASELINE RESULT INDEX
System Call Overhead 15000.0 3789860.2 2526.6
========
System Benchmarks Index Score (Partial Only) 2526.6

After this patch:
80 CPUs in system; running 80 parallel copies of tests

System Call Overhead 3951328.1 lps (10.0 s, 1 samples)

System Benchmarks Partial Index BASELINE RESULT INDEX
System Call Overhead 15000.0 3951328.1 2634.2
========
System Benchmarks Index Score (Partial Only) 2634.2

Cc: Alexander Viro <[email protected]>
Signed-off-by: Yuqi Jin <[email protected]>
Signed-off-by: Shaokun Zhang <[email protected]>
---
include/linux/fs.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3fbb98126248..cfc91d2dd6a7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -917,7 +917,6 @@ struct file {
 		struct llist_node	fu_llist;
 		struct rcu_head	fu_rcuhead;
 	} f_u;
-	struct path	f_path;
 	struct inode	*f_inode;	/* cached value */
 	const struct file_operations	*f_op;

@@ -926,13 +925,14 @@ struct file {
 	 * Must not be taken from IRQ context.
 	 */
 	spinlock_t	f_lock;
-	enum rw_hint	f_write_hint;
 	atomic_long_t	f_count;
 	unsigned int	f_flags;
 	fmode_t	f_mode;
 	struct mutex	f_pos_lock;
 	loff_t	f_pos;
 	struct fown_struct	f_owner;
+	enum rw_hint	f_write_hint;
+	struct path	f_path;
 	const struct cred	*f_cred;
 	struct file_ra_state	f_ra;

--
2.7.4


2021-04-20 05:54:13

by kernel test robot

Subject: [fs] aec499039e: unixbench.score 19.2% improvement



Greetings,

FYI, we noticed a 19.2% improvement of unixbench.score due to commit:


commit: aec499039e7b21224ef29e5a2daba328aec14442 ("[PATCH] fs: Optimized file struct to improve performance")
url: https://github.com/0day-ci/linux/commits/Shaokun-Zhang/fs-Optimized-file-struct-to-improve-performance/20210409-114859
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 5e46d1b78a03d52306f21f77a4e4a144b6d31486

in testcase: unixbench
on test machine: 96 threads Intel(R) Xeon(R) CPU @ 2.30GHz with 128G memory
with the following parameters:

runtime: 300s
nr_task: 30%
test: syscall
cpufreq_governor: performance
ucode: 0x4003006

test-description: UnixBench is the original BYTE UNIX benchmark suite; it aims to test the performance of Unix-like systems.
test-url: https://github.com/kdlucas/byte-unixbench





Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/30%/debian-10.4-x86_64-20200603.cgz/300s/lkp-csl-2sp4/syscall/unixbench/0x4003006

commit:
5e46d1b78a ("reiserfs: update reiserfs_xattrs_initialized() condition")
aec499039e ("fs: Optimized file struct to improve performance")

5e46d1b78a03d523 aec499039e7b21224ef29e5a2da
---------------- ---------------------------
%stddev %change %stddev
\ | \
2768 +19.2% 3298 unixbench.score
176.43 +19.8% 211.43 unixbench.time.user_time
1.622e+09 +19.2% 1.933e+09 unixbench.workload
348.17 ± 48% -25.2% 260.57 ± 68% proc-vmstat.nr_mlock
4081405 ±133% -99.2% 33639 ± 15% turbostat.C1
1.348e+10 ± 89% -76.6% 3.151e+09 ±190% cpuidle.C6.time
1360129 ±137% -86.4% 184629 ± 2% cpuidle.POLL.time
1.00 ± 10% -0.2 0.81 ± 3% mpstat.cpu.all.irq%
0.49 +0.1 0.59 mpstat.cpu.all.usr%
0.01 ± 23% -36.4% 0.00 ± 13% perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
0.06 ± 43% -48.4% 0.03 ± 42% perf-sched.sch_delay.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
0.05 ± 49% -55.1% 0.02 ± 47% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
765.20 ± 20% -34.3% 502.83 ± 29% perf-sched.wait_and_delay.avg.ms.schedule_timeout.io_schedule_timeout.wait_for_completion_io.blk_execute_rq
1930 ± 13% -31.8% 1316 ± 30% perf-sched.wait_and_delay.max.ms.schedule_timeout.io_schedule_timeout.wait_for_completion_io.blk_execute_rq
765.19 ± 20% -34.3% 502.82 ± 29% perf-sched.wait_time.avg.ms.schedule_timeout.io_schedule_timeout.wait_for_completion_io.blk_execute_rq
1930 ± 13% -31.8% 1316 ± 30% perf-sched.wait_time.max.ms.schedule_timeout.io_schedule_timeout.wait_for_completion_io.blk_execute_rq
2787 ±215% -100.0% 0.71 ±162% interrupts.124:PCI-MSI.31981657-edge.i40e-eth0-TxRx-88
385.17 ±128% -99.9% 0.29 ±158% interrupts.61:PCI-MSI.31981594-edge.i40e-eth0-TxRx-25
4052 ± 49% -57.3% 1732 ±102% interrupts.CPU27.NMI:Non-maskable_interrupts
4052 ± 49% -57.3% 1732 ±102% interrupts.CPU27.PMI:Performance_monitoring_interrupts
438.67 ±122% +697.3% 3497 ± 37% interrupts.CPU3.NMI:Non-maskable_interrupts
438.67 ±122% +697.3% 3497 ± 37% interrupts.CPU3.PMI:Performance_monitoring_interrupts
289.00 ± 84% +1542.3% 4746 ± 24% interrupts.CPU51.NMI:Non-maskable_interrupts
289.00 ± 84% +1542.3% 4746 ± 24% interrupts.CPU51.PMI:Performance_monitoring_interrupts
135.17 ± 18% -29.9% 94.71 ± 26% interrupts.CPU59.RES:Rescheduling_interrupts
4872 ± 27% -48.9% 2490 ± 90% interrupts.CPU74.NMI:Non-maskable_interrupts
4872 ± 27% -48.9% 2490 ± 90% interrupts.CPU74.PMI:Performance_monitoring_interrupts
2786 ±215% -100.0% 0.43 ±169% interrupts.CPU88.124:PCI-MSI.31981657-edge.i40e-eth0-TxRx-88
13.38 ± 7% -13.4 0.00 perf-profile.calltrace.cycles-pp.dnotify_flush.filp_close.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
30.66 ± 9% -6.4 24.27 ± 10% perf-profile.calltrace.cycles-pp.filp_close.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
30.82 ± 9% -6.4 24.46 ± 10% perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.10 ± 8% -1.3 5.85 ± 11% perf-profile.calltrace.cycles-pp.filp_close.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
7.14 ± 8% -1.2 5.89 ± 11% perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
7.18 ± 8% -1.2 5.93 ± 11% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__close
7.15 ± 8% -1.2 5.91 ± 11% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
7.27 ± 8% -1.2 6.04 ± 11% perf-profile.calltrace.cycles-pp.__close
5.29 ± 8% +5.4 10.68 ± 10% perf-profile.calltrace.cycles-pp.fput_many.filp_close.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
13.39 ± 7% -13.3 0.07 ± 12% perf-profile.children.cycles-pp.dnotify_flush
37.79 ± 8% -7.6 30.16 ± 10% perf-profile.children.cycles-pp.filp_close
37.97 ± 8% -7.6 30.36 ± 10% perf-profile.children.cycles-pp.__x64_sys_close
7.30 ± 8% -1.2 6.07 ± 11% perf-profile.children.cycles-pp.__close
0.70 ± 10% -0.1 0.56 ± 10% perf-profile.children.cycles-pp.hrtimer_interrupt
0.71 ± 11% -0.1 0.57 ± 10% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.39 ± 16% -0.1 0.29 ± 9% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.27 ± 13% -0.1 0.22 ± 10% perf-profile.children.cycles-pp.tick_sched_timer
5.29 ± 8% +5.4 10.69 ± 10% perf-profile.children.cycles-pp.fput_many
13.33 ± 7% -13.3 0.06 ± 11% perf-profile.self.cycles-pp.dnotify_flush
5.27 ± 8% +5.4 10.64 ± 10% perf-profile.self.cycles-pp.fput_many
17.97 ± 46% -58.2% 7.51 ± 16% perf-stat.i.MPKI
1.073e+09 +16.2% 1.247e+09 perf-stat.i.branch-instructions
2.60 ± 34% -1.0 1.62 ± 2% perf-stat.i.branch-miss-rate%
56435130 ± 26% -31.6% 38588360 ± 21% perf-stat.i.cache-references
12.06 ± 3% -15.8% 10.16 perf-stat.i.cpi
0.10 ±100% -0.1 0.02 ±202% perf-stat.i.dTLB-load-miss-rate%
1.682e+09 +16.9% 1.965e+09 perf-stat.i.dTLB-loads
0.03 ± 93% -0.0 0.01 ±142% perf-stat.i.dTLB-store-miss-rate%
1.11e+09 +17.5% 1.304e+09 perf-stat.i.dTLB-stores
5.314e+09 +16.2% 6.176e+09 perf-stat.i.instructions
0.10 ± 11% +18.1% 0.12 ± 2% perf-stat.i.ipc
40.93 +16.1% 47.51 perf-stat.i.metric.M/sec
89.63 ± 2% +6.5 96.16 perf-stat.i.node-load-miss-rate%
3653512 ± 3% -57.3% 1561506 perf-stat.i.node-load-misses
371566 ± 19% -90.8% 34031 ± 8% perf-stat.i.node-loads
10.59 ± 25% -40.8% 6.27 ± 23% perf-stat.overall.MPKI
1.92 ± 8% -0.3 1.61 perf-stat.overall.branch-miss-rate%
13.04 -14.2% 11.19 perf-stat.overall.cpi
0.02 ± 89% -0.0 0.00 ±148% perf-stat.overall.dTLB-load-miss-rate%
0.00 ± 72% -0.0 0.00 ± 69% perf-stat.overall.dTLB-store-miss-rate%
318.50 +13.2% 360.58 perf-stat.overall.instructions-per-iTLB-miss
0.08 +16.6% 0.09 perf-stat.overall.ipc
90.76 ± 2% +7.1 97.87 perf-stat.overall.node-load-miss-rate%
1286 -2.7% 1251 perf-stat.overall.path-length
1.072e+09 +16.2% 1.246e+09 perf-stat.ps.branch-instructions
1.68e+09 +16.9% 1.964e+09 perf-stat.ps.dTLB-loads
1.109e+09 +17.6% 1.303e+09 perf-stat.ps.dTLB-stores
5.307e+09 +16.3% 6.171e+09 perf-stat.ps.instructions
3649615 ± 3% -57.2% 1560409 perf-stat.ps.node-load-misses
371135 ± 19% -90.9% 33946 ± 8% perf-stat.ps.node-loads
2.086e+12 +16.0% 2.419e+12 perf-stat.total.instructions
10629 ± 12% -17.7% 8746 ± 8% softirqs.CPU10.RCU
9891 ± 7% -14.6% 8447 ± 9% softirqs.CPU13.RCU
43153 ± 3% -7.4% 39975 ± 4% softirqs.CPU30.SCHED
9938 ± 6% -12.9% 8660 ± 2% softirqs.CPU33.RCU
9900 ± 9% -14.1% 8500 ± 5% softirqs.CPU38.RCU
9730 ± 6% -10.3% 8731 ± 7% softirqs.CPU40.RCU
10238 ± 8% -15.0% 8703 ± 9% softirqs.CPU44.RCU
10045 ± 10% -15.7% 8471 ± 6% softirqs.CPU45.RCU
10074 ± 7% -15.4% 8524 ± 6% softirqs.CPU46.RCU
9793 ± 6% -12.0% 8617 ± 8% softirqs.CPU49.RCU
10809 ± 18% -19.0% 8750 ± 8% softirqs.CPU50.RCU
10484 ± 7% -13.3% 9088 ± 10% softirqs.CPU53.RCU
10059 ± 7% -13.2% 8732 ± 7% softirqs.CPU54.RCU
10298 ± 4% -13.5% 8912 ± 7% softirqs.CPU55.RCU
9932 ± 8% -12.4% 8699 ± 5% softirqs.CPU60.RCU
10268 ± 9% -17.1% 8514 ± 7% softirqs.CPU61.RCU
9895 ± 5% -9.0% 9008 ± 5% softirqs.CPU67.RCU
10294 ± 8% -12.0% 9060 ± 5% softirqs.CPU68.RCU
11048 ± 14% -17.2% 9152 ± 6% softirqs.CPU69.RCU
9586 ± 7% -9.1% 8715 ± 5% softirqs.CPU74.RCU
9555 ± 7% -10.1% 8587 ± 5% softirqs.CPU76.RCU
9892 ± 10% -14.8% 8425 ± 5% softirqs.CPU80.RCU
9722 ± 6% -13.5% 8407 ± 6% softirqs.CPU82.RCU
9883 ± 6% -12.7% 8624 ± 4% softirqs.CPU83.RCU
9507 ± 5% -9.9% 8567 ± 4% softirqs.CPU84.RCU
9878 ± 8% -14.1% 8485 ± 3% softirqs.CPU85.RCU
37959 ± 4% -12.9% 33055 ± 6% softirqs.CPU85.SCHED
10338 ± 12% -16.6% 8623 ± 4% softirqs.CPU86.RCU
9885 ± 8% -14.8% 8423 ± 4% softirqs.CPU87.RCU
9934 ± 7% -12.9% 8649 ± 5% softirqs.CPU88.RCU
10119 ± 8% -16.0% 8502 ± 5% softirqs.CPU89.RCU
9958 ± 7% -13.5% 8612 ± 4% softirqs.CPU92.RCU
9917 ± 8% -14.3% 8498 ± 5% softirqs.CPU93.RCU
10070 ± 8% -14.3% 8625 ± 6% softirqs.CPU94.RCU
10157 ± 11% -11.7% 8967 ± 7% softirqs.CPU95.RCU
19377 ± 60% -69.7% 5871 ± 82% softirqs.NET_RX
944995 ± 4% -10.5% 845954 ± 6% softirqs.RCU



[ASCII plot: unixbench.score per sample, y-axis 2400 to 3400; the O samples sit at roughly 3200 to 3300, the remaining samples at roughly 2500 to 2800]

[ASCII plot: unixbench.workload per sample, y-axis 1.4e+09 to 2e+09; the O samples sit at roughly 1.85e+09 to 2e+09, the remaining samples at roughly 1.45e+09 to 1.65e+09]


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (15.18 kB)
config-5.12.0-rc5-00018-gaec499039e7b (175.52 kB)
job-script (8.20 kB)
job.yaml (5.51 kB)
reproduce (290.00 B)

2021-05-10 12:52:31

by Zhangshaokun

Subject: Re: [fs] aec499039e: unixbench.score 19.2% improvement

Hi maintainers,

A gentle ping.

Thanks,
Shaokun
