From: Zhang Yi <[email protected]>
jbd2_transaction_committed() is used to check whether a transaction with
the given tid has already committed. It holds j_state_lock in read mode
and checks the tid of the currently running transaction and the
committing transaction, but taking the j_state_lock is expensive.

We already store the sequence number of the most recently committed
transaction in journal->j_commit_sequence, so we can do this check by
comparing it with the given tid instead. If the given tid is not greater
than j_commit_sequence, the transaction is guaranteed to have committed.
That way we can drop the expensive lock and achieve about 10% ~ 20%
performance gains in concurrent DIOs on my virtual machine with a 100G
ramdisk.
fio -filename=/mnt/foo -direct=1 -iodepth=10 -rw=$rw -ioengine=libaio \
-bs=4k -size=10G -numjobs=10 -runtime=60 -overwrite=1 -name=test \
-group_reporting
Before:
overwrite IOPS=88.2k, BW=344MiB/s
read IOPS=95.7k, BW=374MiB/s
rand overwrite IOPS=98.7k, BW=386MiB/s
randread IOPS=102k, BW=397MiB/s
After:
overwrite: IOPS=105k, BW=410MiB/s
read: IOPS=112k, BW=436MiB/s
rand overwrite: IOPS=104k, BW=404MiB/s
randread: IOPS=111k, BW=432MiB/s
CC: Dave Chinner <[email protected]>
Suggested-by: Dave Chinner <[email protected]>
Link: https://lore.kernel.org/linux-ext4/[email protected]/T/#m6a14df5d085527a188c5a151191e87a3252dc4e2
Signed-off-by: Zhang Yi <[email protected]>
---
fs/ext4/inode.c | 4 ++--
fs/jbd2/journal.c | 17 -----------------
include/linux/jbd2.h | 1 -
3 files changed, 2 insertions(+), 20 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 537803250ca9..e8e2865bf9ac 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3199,8 +3199,8 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
if (journal) {
- if (jbd2_transaction_committed(journal,
- EXT4_I(inode)->i_datasync_tid))
+ if (tid_geq(journal->j_commit_sequence,
+ EXT4_I(inode)->i_datasync_tid))
return false;
if (test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT))
return !list_empty(&EXT4_I(inode)->i_fc_list);
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index b6c114c11b97..73737cd1106f 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -786,23 +786,6 @@ int jbd2_fc_end_commit_fallback(journal_t *journal)
}
EXPORT_SYMBOL(jbd2_fc_end_commit_fallback);
-/* Return 1 when transaction with given tid has already committed. */
-int jbd2_transaction_committed(journal_t *journal, tid_t tid)
-{
- int ret = 1;
-
- read_lock(&journal->j_state_lock);
- if (journal->j_running_transaction &&
- journal->j_running_transaction->t_tid == tid)
- ret = 0;
- if (journal->j_committing_transaction &&
- journal->j_committing_transaction->t_tid == tid)
- ret = 0;
- read_unlock(&journal->j_state_lock);
- return ret;
-}
-EXPORT_SYMBOL(jbd2_transaction_committed);
-
/*
* When this function returns the transaction corresponding to tid
* will be completed. If the transaction has currently running, start
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 971f3e826e15..e15ae324169d 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1643,7 +1643,6 @@ extern void jbd2_clear_buffer_revoked_flags(journal_t *journal);
int jbd2_log_start_commit(journal_t *journal, tid_t tid);
int jbd2_journal_start_commit(journal_t *journal, tid_t *tid);
int jbd2_log_wait_commit(journal_t *journal, tid_t tid);
-int jbd2_transaction_committed(journal_t *journal, tid_t tid);
int jbd2_complete_transaction(journal_t *journal, tid_t tid);
int jbd2_log_do_checkpoint(journal_t *journal);
int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t tid);
--
2.39.2
On Mon 13-05-24 15:21:19, Zhang Yi wrote:
> From: Zhang Yi <[email protected]>
>
> jbd2_transaction_committed() is used to check whether a transaction with
> the given tid has already committed. It holds j_state_lock in read mode
> and checks the tid of the currently running transaction and the
> committing transaction, but taking the j_state_lock is expensive.
>
> We already store the sequence number of the most recently committed
> transaction in journal->j_commit_sequence, so we can do this check by
> comparing it with the given tid instead. If the given tid is not greater
> than j_commit_sequence, the transaction is guaranteed to have committed.
> That way we can drop the expensive lock and achieve about 10% ~ 20%
> performance gains in concurrent DIOs on my virtual machine with a 100G
> ramdisk.
>
> fio -filename=/mnt/foo -direct=1 -iodepth=10 -rw=$rw -ioengine=libaio \
> -bs=4k -size=10G -numjobs=10 -runtime=60 -overwrite=1 -name=test \
> -group_reporting
>
> Before:
> overwrite IOPS=88.2k, BW=344MiB/s
> read IOPS=95.7k, BW=374MiB/s
> rand overwrite IOPS=98.7k, BW=386MiB/s
> randread IOPS=102k, BW=397MiB/s
>
> After:
> overwrite: IOPS=105k, BW=410MiB/s
> read: IOPS=112k, BW=436MiB/s
> rand overwrite: IOPS=104k, BW=404MiB/s
> randread: IOPS=111k, BW=432MiB/s
>
> CC: Dave Chinner <[email protected]>
> Suggested-by: Dave Chinner <[email protected]>
> Link: https://lore.kernel.org/linux-ext4/[email protected]/T/#m6a14df5d085527a188c5a151191e87a3252dc4e2
> Signed-off-by: Zhang Yi <[email protected]>
I agree this is a workable solution and the performance benefits are nice.
But I have some comments regarding the implementation:
> @@ -3199,8 +3199,8 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
> journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
>
> if (journal) {
> - if (jbd2_transaction_committed(journal,
> - EXT4_I(inode)->i_datasync_tid))
> + if (tid_geq(journal->j_commit_sequence,
> + EXT4_I(inode)->i_datasync_tid))
Please leave the helper jbd2_transaction_committed(), just make the
implementation more efficient. Also accessing j_commit_sequence without any
lock is theoretically problematic wrt compiler optimization. You should have
READ_ONCE() there and the places modifying j_commit_sequence need to use
WRITE_ONCE().
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On 2024/5/15 8:25, Jan Kara wrote:
> On Mon 13-05-24 15:21:19, Zhang Yi wrote:
>> From: Zhang Yi <[email protected]>
>>
>> jbd2_transaction_committed() is used to check whether a transaction with
>> the given tid has already committed. It holds j_state_lock in read mode
>> and checks the tid of the currently running transaction and the
>> committing transaction, but taking the j_state_lock is expensive.
>>
>> We already store the sequence number of the most recently committed
>> transaction in journal->j_commit_sequence, so we can do this check by
>> comparing it with the given tid instead. If the given tid is not greater
>> than j_commit_sequence, the transaction is guaranteed to have committed.
>> That way we can drop the expensive lock and achieve about 10% ~ 20%
>> performance gains in concurrent DIOs on my virtual machine with a 100G
>> ramdisk.
>>
>> fio -filename=/mnt/foo -direct=1 -iodepth=10 -rw=$rw -ioengine=libaio \
>> -bs=4k -size=10G -numjobs=10 -runtime=60 -overwrite=1 -name=test \
>> -group_reporting
>>
>> Before:
>> overwrite IOPS=88.2k, BW=344MiB/s
>> read IOPS=95.7k, BW=374MiB/s
>> rand overwrite IOPS=98.7k, BW=386MiB/s
>> randread IOPS=102k, BW=397MiB/s
>>
>> After:
>> overwrite: IOPS=105k, BW=410MiB/s
>> read: IOPS=112k, BW=436MiB/s
>> rand overwrite: IOPS=104k, BW=404MiB/s
>> randread: IOPS=111k, BW=432MiB/s
>>
>> CC: Dave Chinner <[email protected]>
>> Suggested-by: Dave Chinner <[email protected]>
>> Link: https://lore.kernel.org/linux-ext4/[email protected]/T/#m6a14df5d085527a188c5a151191e87a3252dc4e2
>> Signed-off-by: Zhang Yi <[email protected]>
>
> I agree this is workable solution and the performance benefits are nice. But
> I have some comments regarding the implementation:
>
>> @@ -3199,8 +3199,8 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
>> journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
>>
>> if (journal) {
>> - if (jbd2_transaction_committed(journal,
>> - EXT4_I(inode)->i_datasync_tid))
>> + if (tid_geq(journal->j_commit_sequence,
>> + EXT4_I(inode)->i_datasync_tid))
>
> Please leave the helper jbd2_transaction_committed(), just make the
> implementation more efficient.
Sure.
> Also accessing j_commit_sequence without any
> lock is theoretically problematic wrt compiler optimization. You should have
> READ_ONCE() there and the places modifying j_commit_sequence need to use
> WRITE_ONCE().
>
Thanks for pointing this out, but I'm not sure we really need READ_ONCE()
here. IIUC, if we add READ_ONCE(), we make sure to get the latest
j_commit_sequence; if not, there is a window (it might become larger) in
which we could get the old value, and jbd2_transaction_committed() could
return false even if the given transaction was just committed. But that
window is always there, so it does not look like a big problem. Is that
right?
Thanks,
Yi.
On Thu 16-05-24 16:27:25, Zhang Yi wrote:
> On 2024/5/15 8:25, Jan Kara wrote:
> > On Mon 13-05-24 15:21:19, Zhang Yi wrote:
> > Also accessing j_commit_sequence without any
> > lock is theoretically problematic wrt compiler optimization. You should have
> > READ_ONCE() there and the places modifying j_commit_sequence need to use
> > WRITE_ONCE().
> >
>
> Thanks for pointing this out, but I'm not sure we really need READ_ONCE()
> here. IIUC, if we add READ_ONCE(), we make sure to get the latest
> j_commit_sequence; if not, there is a window (it might become larger) in
> which we could get the old value, and jbd2_transaction_committed() could
> return false even if the given transaction was just committed. But that
> window is always there, so it does not look like a big problem. Is that
> right?
Well, all accesses to any memory should use READ_ONCE(), be protected by a
lock, or use types that handle atomicity at the assembly level (like
atomic_t, or atomic bit operations and similar). Otherwise the compiler is
free to assume the underlying memory cannot change and generate potentially
invalid code. In this case, I don't think any compiler will realistically
do that, but it is still good practice and it also saves us from KCSAN
warnings. If you want to know more details about the possible problems, see
tools/memory-model/Documentation/explanation.txt
chapter "PLAIN ACCESSES AND DATA RACES".
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
Hello,
kernel test robot noticed a 674.0% improvement of stress-ng.fiemap.ops_per_sec on:
commit: 8a7952dd64b38f3ac2754ca550b52ac3ca921c1a ("[PATCH] ext4/jbd2: drop jbd2_transaction_committed()")
url: https://github.com/intel-lab-lkp/linux/commits/Zhang-Yi/ext4-jbd2-drop-jbd2_transaction_committed/20240513-153311
base: https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git dev
patch link: https://lore.kernel.org/all/[email protected]/
patch subject: [PATCH] ext4/jbd2: drop jbd2_transaction_committed()
testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
nr_threads: 100%
disk: 1HDD
testtime: 60s
fs: ext4
test: fiemap
cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240527/[email protected]
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-13/performance/1HDD/ext4/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/fiemap/stress-ng/60s
commit:
26770a717c ("jbd2: add prefix 'jbd2' for 'shrink_type'")
8a7952dd64 ("ext4/jbd2: drop jbd2_transaction_committed()")
26770a717cac5704 8a7952dd64b38f3ac2754ca550b
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.13 ? 5% +52.7% 1.73 ? 3% iostat.cpu.user
0.01 ? 47% +0.0 0.04 ? 44% mpstat.cpu.all.iowait%
0.02 ? 4% -0.0 0.01 ? 15% mpstat.cpu.all.soft%
1.15 ? 5% +0.6 1.75 ? 3% mpstat.cpu.all.usr%
100427 ? 42% -59.1% 41106 ? 66% numa-vmstat.node0.nr_anon_pages
149979 ? 34% +205.2% 457720 ? 31% numa-vmstat.node1.nr_inactive_anon
149980 ? 34% +205.2% 457720 ? 31% numa-vmstat.node1.nr_zone_inactive_anon
402376 ? 42% -59.1% 164453 ? 66% numa-meminfo.node0.AnonPages
424656 ? 39% -57.6% 180029 ? 58% numa-meminfo.node0.AnonPages.max
620453 ? 33% +198.4% 1851452 ? 30% numa-meminfo.node1.Inactive
601997 ? 34% +204.4% 1832749 ? 31% numa-meminfo.node1.Inactive(anon)
2387 ? 2% +12.8% 2693 ? 4% vmstat.io.bo
10.17 ? 13% +83.8% 18.69 ? 3% vmstat.procs.b
201402 ? 9% +631.9% 1474068 ? 6% vmstat.system.cs
167899 +5.5% 177141 vmstat.system.in
347.50 ? 16% +53.4% 533.00 ? 10% perf-c2c.DRAM.local
4232 ? 48% +166.1% 11262 ? 24% perf-c2c.DRAM.remote
23484 ? 10% +119.8% 51621 ? 7% perf-c2c.HITM.local
3377 ? 56% +179.0% 9424 ? 27% perf-c2c.HITM.remote
26861 ? 7% +127.3% 61046 ? 5% perf-c2c.HITM.total
7612782 ? 7% +676.3% 59099460 ? 6% stress-ng.fiemap.ops
124734 ? 8% +674.0% 965466 ? 6% stress-ng.fiemap.ops_per_sec
12752270 ? 10% +643.8% 94847653 ? 5% stress-ng.time.involuntary_context_switches
19.02 ? 6% +124.6% 42.70 ? 5% stress-ng.time.user_time
62295 +17.8% 73411 ? 4% stress-ng.time.voluntary_context_switches
262799 ? 5% +90.1% 499700 ? 43% meminfo.Active
3750501 +38.7% 5203048 ? 2% meminfo.Cached
3989040 +36.2% 5432649 ? 2% meminfo.Committed_AS
1154859 ? 3% +105.8% 2376633 ? 12% meminfo.Inactive
1118790 ? 3% +109.2% 2340758 ? 12% meminfo.Inactive(anon)
362491 ? 7% +153.0% 917217 ? 19% meminfo.Mapped
5902699 +25.8% 7426870 meminfo.Memused
546333 ? 3% +265.7% 1998111 ? 5% meminfo.Shmem
5963280 +25.0% 7456799 meminfo.max_used_kB
0.92 ? 16% -21.5% 0.72 ? 8% sched_debug.cfs_rq:/.h_nr_running.stddev
809.29 ? 9% -15.5% 683.50 ? 7% sched_debug.cfs_rq:/.runnable_avg.stddev
567.16 ? 10% +123.7% 1268 ? 9% sched_debug.cfs_rq:/.util_est.avg
2015 ? 8% +41.9% 2859 ? 11% sched_debug.cfs_rq:/.util_est.max
0.75 ?223% +23377.8% 176.08 ? 20% sched_debug.cfs_rq:/.util_est.min
447.08 ? 8% +19.1% 532.35 ? 12% sched_debug.cfs_rq:/.util_est.stddev
0.92 ? 16% -22.1% 0.71 ? 8% sched_debug.cpu.nr_running.stddev
101043 ? 9% +621.7% 729210 ? 5% sched_debug.cpu.nr_switches.avg
176826 ? 18% +423.9% 926474 ? 5% sched_debug.cpu.nr_switches.max
17586 ? 35% +762.4% 151656 ? 44% sched_debug.cpu.nr_switches.min
54968 ? 44% +170.9% 148885 ? 5% sched_debug.cpu.nr_switches.stddev
5712 +5.1% 6002 ? 2% proc-vmstat.nr_active_file
939938 +38.6% 1302470 ? 2% proc-vmstat.nr_file_pages
280225 ? 3% +108.7% 584958 ? 12% proc-vmstat.nr_inactive_anon
91383 ? 8% +149.7% 228197 ? 19% proc-vmstat.nr_mapped
136324 ? 3% +265.7% 498577 ? 5% proc-vmstat.nr_shmem
25123 +2.8% 25834 proc-vmstat.nr_slab_reclaimable
5712 +5.1% 6002 ? 2% proc-vmstat.nr_zone_active_file
280225 ? 3% +108.7% 584958 ? 12% proc-vmstat.nr_zone_inactive_anon
15324 ? 30% +359.4% 70400 ? 9% proc-vmstat.numa_hint_faults
10098 ? 24% +283.5% 38729 ? 19% proc-vmstat.numa_hint_faults_local
604637 +86.4% 1127295 ? 4% proc-vmstat.numa_hit
538359 +97.1% 1061202 ? 4% proc-vmstat.numa_local
532388 +13.4% 603511 ? 3% proc-vmstat.numa_pte_updates
723145 +72.1% 1244321 ? 3% proc-vmstat.pgalloc_normal
383685 ? 2% +21.4% 465726 ? 2% proc-vmstat.pgfault
152450 +13.4% 172900 ? 4% proc-vmstat.pgpgout
2.951e+09 ? 6% +571.9% 1.983e+10 ? 2% perf-stat.i.branch-instructions
0.64 ? 4% -0.2 0.40 ? 2% perf-stat.i.branch-miss-rate%
19046341 ? 4% +291.2% 74517781 perf-stat.i.branch-misses
9666270 ? 31% +318.8% 40484483 ? 19% perf-stat.i.cache-misses
70884859 ? 6% +367.8% 3.316e+08 ? 2% perf-stat.i.cache-references
213832 ? 9% +623.2% 1546476 ? 6% perf-stat.i.context-switches
15.20 ? 7% -84.9% 2.30 ? 2% perf-stat.i.cpi
414.63 ? 5% +10.9% 459.73 ? 8% perf-stat.i.cpu-migrations
26710 ? 31% -77.7% 5960 ? 22% perf-stat.i.cycles-between-cache-misses
1.501e+10 ? 6% +556.2% 9.849e+10 ? 2% perf-stat.i.instructions
0.07 ? 6% +515.9% 0.44 ? 2% perf-stat.i.ipc
3.33 ? 9% +626.6% 24.22 ? 6% perf-stat.i.metric.K/sec
4908 ? 3% +26.6% 6216 ? 3% perf-stat.i.minor-faults
4908 ? 3% +26.6% 6216 ? 3% perf-stat.i.page-faults
0.65 ? 5% -0.3 0.38 ? 2% perf-stat.overall.branch-miss-rate%
15.14 ? 6% -84.8% 2.30 ? 2% perf-stat.overall.cpi
25496 ? 29% -77.2% 5824 ? 21% perf-stat.overall.cycles-between-cache-misses
0.07 ? 6% +556.1% 0.44 ? 2% perf-stat.overall.ipc
2.9e+09 ? 6% +572.3% 1.95e+10 ? 2% perf-stat.ps.branch-instructions
18850245 ? 4% +289.0% 73330498 perf-stat.ps.branch-misses
9571442 ? 31% +315.9% 39805993 ? 19% perf-stat.ps.cache-misses
70175556 ? 6% +365.5% 3.267e+08 ? 2% perf-stat.ps.cache-references
205492 ? 9% +636.9% 1514212 ? 6% perf-stat.ps.context-switches
405.62 ? 5% +10.8% 449.30 ? 8% perf-stat.ps.cpu-migrations
1.475e+10 ? 6% +556.5% 9.684e+10 ? 2% perf-stat.ps.instructions
4823 ? 3% +25.8% 6069 ? 4% perf-stat.ps.minor-faults
4823 ? 3% +25.8% 6069 ? 4% perf-stat.ps.page-faults
9.213e+11 ? 6% +555.4% 6.038e+12 ? 2% perf-stat.total.instructions
85.73 -85.7 0.00 perf-profile.calltrace.cycles-pp.jbd2_transaction_committed.ext4_set_iomap.ext4_iomap_begin_report.iomap_iter.iomap_fiemap
86.11 -84.1 1.99 ? 2% perf-profile.calltrace.cycles-pp.ext4_set_iomap.ext4_iomap_begin_report.iomap_iter.iomap_fiemap.do_vfs_ioctl
50.51 ? 3% -50.5 0.00 perf-profile.calltrace.cycles-pp._raw_read_lock.jbd2_transaction_committed.ext4_set_iomap.ext4_iomap_begin_report.iomap_iter
96.50 -12.4 84.06 perf-profile.calltrace.cycles-pp.ext4_iomap_begin_report.iomap_iter.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl
97.49 -5.6 91.94 perf-profile.calltrace.cycles-pp.iomap_iter.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64
98.08 -1.5 96.62 perf-profile.calltrace.cycles-pp.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe
98.13 -1.1 96.99 perf-profile.calltrace.cycles-pp.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl
98.14 -1.1 97.05 perf-profile.calltrace.cycles-pp.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl
98.18 -1.1 97.12 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl
98.19 -1.0 97.14 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ioctl
98.22 -0.9 97.31 perf-profile.calltrace.cycles-pp.ioctl
0.62 ? 9% +0.3 0.94 perf-profile.calltrace.cycles-pp.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.62 ? 9% +0.3 0.96 perf-profile.calltrace.cycles-pp.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
0.70 ? 9% +0.5 1.18 perf-profile.calltrace.cycles-pp.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
0.87 ? 8% +0.7 1.58 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
0.87 ? 8% +0.7 1.59 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__sched_yield
0.00 +1.0 1.00 ? 4% perf-profile.calltrace.cycles-pp.ext4_sb_block_valid.__check_block_validity.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter
0.94 ? 8% +1.0 1.98 ? 2% perf-profile.calltrace.cycles-pp.__sched_yield
0.00 +1.1 1.10 ? 2% perf-profile.calltrace.cycles-pp._copy_to_user.fiemap_fill_next_extent.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl
0.00 +1.3 1.35 ? 8% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.percpu_counter_add_batch.ext4_es_lookup_extent.ext4_map_blocks
0.00 +1.5 1.48 ? 2% perf-profile.calltrace.cycles-pp.__check_block_validity.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter.iomap_fiemap
0.00 +1.6 1.58 ? 7% perf-profile.calltrace.cycles-pp._raw_spin_lock.percpu_counter_add_batch.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report
0.00 +3.2 3.22 ? 2% perf-profile.calltrace.cycles-pp.fiemap_fill_next_extent.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64
4.96 ? 6% +4.8 9.72 ? 2% perf-profile.calltrace.cycles-pp._raw_read_lock.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter
0.78 ? 7% +5.4 6.22 ? 2% perf-profile.calltrace.cycles-pp.iomap_iter_advance.iomap_iter.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl
0.62 ? 6% +50.6 51.27 perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter
9.55 ? 7% +66.3 75.88 perf-profile.calltrace.cycles-pp.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter.iomap_fiemap
10.12 ? 7% +69.9 79.98 perf-profile.calltrace.cycles-pp.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter.iomap_fiemap.do_vfs_ioctl
85.84 -85.8 0.00 perf-profile.children.cycles-pp.jbd2_transaction_committed
86.14 -84.0 2.10 ? 2% perf-profile.children.cycles-pp.ext4_set_iomap
55.59 ? 3% -45.8 9.83 ? 2% perf-profile.children.cycles-pp._raw_read_lock
96.56 -12.1 84.50 perf-profile.children.cycles-pp.ext4_iomap_begin_report
97.54 -5.3 92.26 perf-profile.children.cycles-pp.iomap_iter
98.11 -1.3 96.84 perf-profile.children.cycles-pp.iomap_fiemap
98.14 -1.1 97.00 perf-profile.children.cycles-pp.do_vfs_ioctl
98.14 -1.1 97.06 perf-profile.children.cycles-pp.__x64_sys_ioctl
98.22 -0.9 97.34 perf-profile.children.cycles-pp.ioctl
99.42 -0.7 98.76 perf-profile.children.cycles-pp.do_syscall_64
99.43 -0.6 98.81 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.39 ? 21% -0.1 0.25 ? 6% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.14 ? 24% -0.1 0.03 ?105% perf-profile.children.cycles-pp.build_id__mark_dso_hit
0.19 ? 5% +0.0 0.21 ? 6% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.08 ? 13% +0.0 0.11 perf-profile.children.cycles-pp.switch_fpu_return
0.09 ? 7% +0.0 0.12 ? 5% perf-profile.children.cycles-pp.update_process_times
0.11 ? 5% +0.0 0.14 ? 6% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.10 ? 5% +0.0 0.14 ? 5% perf-profile.children.cycles-pp.tick_nohz_handler
0.00 +0.1 0.05 perf-profile.children.cycles-pp.__fdget
0.00 +0.1 0.05 ? 7% perf-profile.children.cycles-pp.__update_load_avg_se
0.00 +0.1 0.06 ? 9% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.00 +0.1 0.06 ? 6% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.00 +0.1 0.06 perf-profile.children.cycles-pp.___perf_sw_event
0.00 +0.1 0.06 perf-profile.children.cycles-pp.__dequeue_entity
0.00 +0.1 0.06 ? 6% perf-profile.children.cycles-pp.__switch_to_asm
0.00 +0.1 0.06 ? 7% perf-profile.children.cycles-pp.pick_eevdf
0.00 +0.1 0.07 perf-profile.children.cycles-pp.__switch_to
0.00 +0.1 0.08 ? 6% perf-profile.children.cycles-pp.entry_SYSCALL_64
0.00 +0.1 0.08 ? 4% perf-profile.children.cycles-pp.rseq_ip_fixup
0.00 +0.1 0.09 ? 4% perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
0.27 ? 22% +0.1 0.37 ? 9% perf-profile.children.cycles-pp.perf_session__process_events
0.27 ? 22% +0.1 0.37 ? 9% perf-profile.children.cycles-pp.record__finish_output
0.26 ? 25% +0.1 0.36 ? 9% perf-profile.children.cycles-pp.reader__read_event
0.00 +0.1 0.11 ? 3% perf-profile.children.cycles-pp.set_next_entity
0.00 +0.1 0.12 ? 4% perf-profile.children.cycles-pp.update_load_avg
0.08 ? 6% +0.1 0.20 perf-profile.children.cycles-pp.do_sched_yield
0.00 +0.1 0.12 ? 4% perf-profile.children.cycles-pp.put_prev_entity
0.02 ?141% +0.1 0.15 ? 2% perf-profile.children.cycles-pp.update_curr
0.09 ? 36% +0.1 0.23 ? 18% perf-profile.children.cycles-pp.process_simple
0.00 +0.1 0.15 ? 3% perf-profile.children.cycles-pp.__rseq_handle_notify_resume
0.07 ? 57% +0.1 0.22 ? 19% perf-profile.children.cycles-pp.queue_event
0.00 +0.1 0.15 ? 4% perf-profile.children.cycles-pp.clear_bhb_loop
0.00 +0.1 0.15 ? 4% perf-profile.children.cycles-pp.stress_fiemap
0.07 ? 57% +0.2 0.22 ? 19% perf-profile.children.cycles-pp.ordered_events__queue
0.00 +0.2 0.16 ? 2% perf-profile.children.cycles-pp.yield_task_fair
0.00 +0.2 0.16 ? 3% perf-profile.children.cycles-pp.ext4_inode_block_valid
0.20 ? 5% +0.2 0.41 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.09 ? 10% +0.3 0.38 ? 2% perf-profile.children.cycles-pp.pick_next_task_fair
0.62 ? 9% +0.3 0.95 perf-profile.children.cycles-pp.__schedule
0.63 ? 10% +0.3 0.97 perf-profile.children.cycles-pp.schedule
0.06 ? 11% +0.4 0.48 ? 3% perf-profile.children.cycles-pp.iomap_to_fiemap
0.70 ? 9% +0.5 1.18 perf-profile.children.cycles-pp.__x64_sys_sched_yield
0.13 ? 8% +0.9 1.05 ? 3% perf-profile.children.cycles-pp.ext4_sb_block_valid
0.94 ? 8% +1.1 2.03 ? 2% perf-profile.children.cycles-pp.__sched_yield
0.16 ? 7% +1.1 1.27 ? 2% perf-profile.children.cycles-pp._copy_to_user
0.20 ? 8% +1.4 1.59 ? 2% perf-profile.children.cycles-pp.__check_block_validity
0.20 ? 7% +1.4 1.64 ? 7% perf-profile.children.cycles-pp._raw_spin_lock
0.41 ? 7% +2.9 3.30 ? 2% perf-profile.children.cycles-pp.fiemap_fill_next_extent
0.80 ? 7% +5.6 6.35 ? 2% perf-profile.children.cycles-pp.iomap_iter_advance
0.64 ? 7% +50.8 51.40 perf-profile.children.cycles-pp.percpu_counter_add_batch
9.60 ? 7% +66.7 76.28 perf-profile.children.cycles-pp.ext4_es_lookup_extent
10.16 ? 7% +70.2 80.39 perf-profile.children.cycles-pp.ext4_map_blocks
55.43 ? 3% -45.7 9.70 ? 2% perf-profile.self.cycles-pp._raw_read_lock
34.41 ? 6% -34.4 0.00 perf-profile.self.cycles-pp.jbd2_transaction_committed
0.10 ? 10% -0.0 0.07 ? 5% perf-profile.self.cycles-pp.prepare_task_switch
0.08 ? 11% +0.0 0.12 ? 4% perf-profile.self.cycles-pp.__schedule
0.00 +0.1 0.05 perf-profile.self.cycles-pp.___perf_sw_event
0.00 +0.1 0.05 perf-profile.self.cycles-pp.__update_load_avg_se
0.00 +0.1 0.05 perf-profile.self.cycles-pp.update_curr
0.00 +0.1 0.05 ? 7% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.00 +0.1 0.06 ? 9% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.00 +0.1 0.06 ? 13% perf-profile.self.cycles-pp.__sched_yield
0.00 +0.1 0.06 ? 6% perf-profile.self.cycles-pp.__switch_to_asm
0.00 +0.1 0.07 perf-profile.self.cycles-pp.__switch_to
0.20 ? 7% +0.1 0.28 ? 4% perf-profile.self.cycles-pp._raw_spin_lock
0.00 +0.1 0.09 ? 4% perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
0.00 +0.1 0.11 ? 4% perf-profile.self.cycles-pp.ext4_inode_block_valid
0.00 +0.1 0.13 ? 5% perf-profile.self.cycles-pp.stress_fiemap
0.07 ? 57% +0.1 0.22 ? 19% perf-profile.self.cycles-pp.queue_event
0.00 +0.1 0.15 ? 4% perf-profile.self.cycles-pp.clear_bhb_loop
0.05 ? 45% +0.3 0.38 ? 2% perf-profile.self.cycles-pp.__check_block_validity
0.03 ? 70% +0.3 0.38 ? 3% perf-profile.self.cycles-pp.iomap_to_fiemap
0.13 ? 7% +0.9 1.00 ? 4% perf-profile.self.cycles-pp.ext4_sb_block_valid
0.13 ? 7% +0.9 1.05 ? 2% perf-profile.self.cycles-pp.iomap_fiemap
0.16 ? 7% +1.1 1.24 ? 2% perf-profile.self.cycles-pp._copy_to_user
0.19 ? 7% +1.3 1.52 ? 2% perf-profile.self.cycles-pp.iomap_iter
0.29 ? 11% +1.6 1.88 ? 2% perf-profile.self.cycles-pp.ext4_set_iomap
0.25 ? 8% +1.8 2.04 ? 2% perf-profile.self.cycles-pp.fiemap_fill_next_extent
0.27 ? 8% +1.9 2.21 ? 2% perf-profile.self.cycles-pp.ext4_iomap_begin_report
0.38 ? 13% +2.3 2.70 ? 4% perf-profile.self.cycles-pp.ext4_map_blocks
0.78 ? 7% +5.4 6.23 ? 2% perf-profile.self.cycles-pp.iomap_iter_advance
3.98 ? 8% +11.1 15.12 ? 2% perf-profile.self.cycles-pp.ext4_es_lookup_extent
0.56 ? 7% +48.9 49.48 perf-profile.self.cycles-pp.percpu_counter_add_batch
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki