FYI, we found changes in `fsmark.files_per_sec' introduced by commit 78373b7319abdf15050af5b1632c4c8b8b398f33:
> commit 78373b7319abdf15050af5b1632c4c8b8b398f33
> Author: Jaegeuk Kim <[email protected]>
> AuthorDate: Fri Mar 13 21:44:36 2015 -0700
> Commit: Jaegeuk Kim <[email protected]>
> CommitDate: Fri Apr 10 15:08:45 2015 -0700
>
> f2fs: enhance multi-threads performance
3402e87cfb5e762f9c95071bf4a2ad65fd9392a2 78373b7319abdf15050af5b1632c4c8b8b398f33
---------------------------------------- ----------------------------------------
run time(m) metric_value ±stddev run time(m) metric_value ±stddev change testbox/benchmark/testcase-params
--- ------ ---------------------------- --- ------ ---------------------------- -------- ------------------------------
3 0.3 |490.800| ±5.7 3 0.5 |262.067| ±0.4 -46.6% ivb44/fsmark/1x-64t-4BRD_12G-RAID0-f2fs-4M-30G-fsyncBeforeClose
3 0.3 |468.367| ±3.5 3 0.5 |264.467| ±0.2 -43.5% ivb44/fsmark/1x-64t-9BRD_6G-RAID0-f2fs-4M-30G-fsyncBeforeClose
3 0.6 |211.867| ±0.7 3 0.7 |191.067| ±0.5 -9.8% ivb44/fsmark/1x-64t-4BRD_12G-RAID5-f2fs-4M-30G-fsyncBeforeClose
NOTE: here is some more information about the test parameters, to help you
understand what the test case does:
1x: 'x' means iterations (loops), corresponding to the '-L' option of fsmark
1t, 64t: 't' means threads
4M: the size of a single file, corresponding to the '-s' option of fsmark
40G, 30G, 120G: the total test size
4BRD_12G: BRD is the ramdisk driver; '4' means 4 ramdisks, and '12G' is the
          size of each ramdisk, so 48G in total. We then build a RAID array
          on those ramdisks.
The change is a bit interesting, since the commit message clearly states that
this patch is meant to improve performance. The patch itself is also simple:
it removes a mutex lock. So, without digging deeper, the only plausible cause
I can think of is that removing this lock reduces sleep time and lets more
processes run, but at the same time increases context switches and CPU usage
somewhere. I guess that is what the following changes are telling us:
29708 ± 2% +5720.0% 1729051 ± 1% fsmark.time.voluntary_context_switches
302 ± 0% +113.8% 647 ± 0% fsmark.time.percent_of_cpu_this_job_got
61.05 ± 0% +214.0% 191.70 ± 0% fsmark.time.system_time
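
For reference, a minimal sketch of the kind of change involved, assuming the
patch simply drops the per-superblock mutex that used to serialize writeback
in f2fs_write_data_pages(); the field name sbi->writepages and the context
below are abbreviated from memory of fs/f2fs/data.c, so they may not match
the patch exactly:

	/* before: writeback from all threads serialized on one mutex */
	mutex_lock(&sbi->writepages);
	ret = write_cache_pages(mapping, wbc, __f2fs_writepage, mapping);
	mutex_unlock(&sbi->writepages);

	/* after: all 64 fs_mark threads enter write_cache_pages() at once,
	 * contending on finer-grained page and bio locks instead of
	 * queueing quietly on the mutex, which would fit the jump in
	 * voluntary_context_switches and system_time above. */
	ret = write_cache_pages(mapping, wbc, __f2fs_writepage, mapping);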
FYI, here I have listed all the changes for the most outstanding case:
3 0.3 |490.800| ±5.7 3 0.5 |262.067| ±0.4 -46.6% ivb44/fsmark/1x-64t-4BRD_12G-RAID0-f2fs-4M-30G-fsyncBeforeClose
3402e87cfb5e762f 78373b7319abdf15050af5b163
---------------- --------------------------
%stddev %change %stddev
\ | \
29708 ± 2% +5720.0% 1729051 ± 1% fsmark.time.voluntary_context_switches
61.05 ± 0% +214.0% 191.70 ± 0% fsmark.time.system_time
302 ± 0% +113.8% 647 ± 0% fsmark.time.percent_of_cpu_this_job_got
10476 ± 0% +95.4% 20467 ± 5% fsmark.time.minor_page_faults
490 ± 5% -46.6% 262 ± 0% fsmark.files_per_sec
20.21 ± 0% +46.7% 29.65 ± 0% fsmark.time.elapsed_time
20.21 ± 0% +46.7% 29.65 ± 0% fsmark.time.elapsed_time.max
226379 ± 0% +32.5% 299882 ± 0% fsmark.app_overhead
0 ± 0% +Inf% 1045 ± 2% proc-vmstat.numa_pages_migrated
209 ± 26% +3272.3% 7059 ± 3% cpuidle.C1E-IVT.usage
228 ± 42% +686.7% 1799 ± 14% numa-meminfo.node0.Writeback
14633 ± 5% +7573.2% 1122849 ± 1% cpuidle.C1-IVT.usage
0 ± 0% +Inf% 1045 ± 2% proc-vmstat.pgmigrate_success
29708 ± 2% +5720.0% 1729051 ± 1% time.voluntary_context_switches
55663 ± 0% +776.9% 488081 ± 0% cpuidle.C6-IVT.usage
56 ± 42% +718.8% 464 ± 11% numa-vmstat.node0.nr_writeback
535 ± 29% +334.4% 2325 ± 10% meminfo.Writeback
129 ± 30% +295.6% 511 ± 4% proc-vmstat.nr_writeback
59.25 ± 5% -74.2% 15.26 ± 3% turbostat.CPU%c6
2.58 ± 8% -74.5% 0.66 ± 11% turbostat.Pkg%pc2
1.551e+08 ± 14% +233.4% 5.171e+08 ± 4% cpuidle.C1-IVT.time
32564 ± 24% +208.1% 100330 ± 5% softirqs.RCU
61.05 ± 0% +214.0% 191.70 ± 0% time.system_time
60 ± 32% +165.7% 160 ± 16% numa-vmstat.node1.nr_writeback
2 ± 0% +200.0% 6 ± 0% vmstat.procs.r
3057 ± 2% +166.1% 8136 ± 22% numa-vmstat.node0.nr_mapped
12240 ± 2% +165.9% 32547 ± 22% numa-meminfo.node0.Mapped
6324 ± 3% +148.4% 15709 ± 0% proc-vmstat.nr_mapped
25318 ± 3% +148.6% 62931 ± 0% meminfo.Mapped
302 ± 0% +113.8% 647 ± 0% time.percent_of_cpu_this_job_got
34.39 ± 8% +102.9% 69.79 ± 0% turbostat.CPU%c1
3247 ± 32% +124.9% 7303 ± 14% numa-vmstat.node0.nr_active_anon
12994 ± 32% +124.8% 29215 ± 14% numa-meminfo.node0.Active(anon)
3256 ± 4% +132.5% 7569 ± 23% numa-vmstat.node1.nr_mapped
3252 ± 32% +124.6% 7304 ± 14% numa-vmstat.node0.nr_anon_pages
13036 ± 4% +132.3% 30279 ± 23% numa-meminfo.node1.Mapped
13033 ± 31% +124.1% 29214 ± 14% numa-meminfo.node0.AnonPages
10476 ± 0% +95.4% 20467 ± 5% time.minor_page_faults
121752 ± 7% -35.8% 78122 ± 2% numa-meminfo.node0.Dirty
30381 ± 7% -36.0% 19456 ± 2% numa-vmstat.node0.nr_dirty
113996 ± 4% +64.4% 187416 ± 0% proc-vmstat.pgfault
59301 ± 1% +60.9% 95414 ± 1% softirqs.SCHED
1.92 ± 1% -31.9% 1.31 ± 4% turbostat.Pkg%pc6
16 ± 35% +54.2% 24 ± 10% cpuidle.POLL.usage
20.21 ± 0% +46.7% 29.65 ± 0% time.elapsed_time.max
20.21 ± 0% +46.7% 29.65 ± 0% time.elapsed_time
7114 ± 2% +46.3% 10405 ± 0% proc-vmstat.nr_active_anon
28499 ± 2% +46.2% 41659 ± 0% meminfo.Active(anon)
28617 ± 2% +45.7% 41692 ± 0% meminfo.AnonPages
7138 ± 2% +45.7% 10403 ± 0% proc-vmstat.nr_anon_pages
95342 ± 2% +44.9% 138177 ± 6% numa-meminfo.node0.Active
82347 ± 2% +32.3% 108962 ± 9% numa-meminfo.node0.Active(file)
20586 ± 2% +32.3% 27240 ± 9% numa-vmstat.node0.nr_active_file
233023 ± 3% -30.7% 161486 ± 0% meminfo.Dirty
0.19 ± 2% +36.2% 0.26 ± 3% time.user_time
57900 ± 3% -30.5% 40219 ± 0% proc-vmstat.nr_dirty
108006 ± 8% -24.6% 81460 ± 2% numa-meminfo.node1.Dirty
26896 ± 8% -24.4% 20331 ± 2% numa-vmstat.node1.nr_dirty
219977 ± 0% +32.9% 292456 ± 1% softirqs.TIMER
197382 ± 1% +32.6% 261733 ± 0% meminfo.Active
42210 ± 1% +30.2% 54947 ± 0% proc-vmstat.nr_active_file
168883 ± 1% +30.3% 220073 ± 0% meminfo.Active(file)
353 ± 4% +21.7% 430 ± 3% numa-vmstat.node0.nr_page_table_pages
1425 ± 4% +20.2% 1714 ± 4% numa-meminfo.node0.PageTables
86494 ± 2% +28.0% 110745 ± 10% numa-meminfo.node1.Active(file)
21621 ± 2% +28.0% 27683 ± 10% numa-vmstat.node1.nr_active_file
46.00 ± 4% +20.8% 55.56 ± 3% uptime.boot
101939 ± 4% +20.8% 123117 ± 8% numa-meminfo.node1.Active
1412 ± 9% +10.8% 1565 ± 8% numa-vmstat.node0.nr_alloc_batch
90644 ± 6% +22.4% 110984 ± 0% proc-vmstat.pgfree
2084 ± 4% +15.7% 2412 ± 3% uptime.idle
18666 ± 1% +9.2% 20390 ± 4% slabinfo.anon_vma.num_objs
18663 ± 1% +9.3% 20390 ± 4% slabinfo.anon_vma.active_objs
435.51 ± 0% -89.9% 43.95 ± 0% iostat.md0.avgrq-sz
6415 ± 0% +875.4% 62579 ± 1% iostat.md0.w/s
5595 ± 1% +1883.4% 110979 ± 1% vmstat.system.cs
280678 ± 0% +315.2% 1165516 ± 0% iostat.md0.avgqu-sz
6.31 ± 0% +123.9% 14.14 ± 0% turbostat.%Busy
195 ± 0% +116.4% 423 ± 0% turbostat.Avg_MHz
26035 ± 0% +76.8% 46037 ± 0% vmstat.system.in
1370424 ± 0% -28.1% 985009 ± 0% vmstat.io.bo
1459306 ± 0% -28.1% 1048622 ± 0% iostat.md0.wkB/s
72.28 ± 0% +29.6% 93.70 ± 0% turbostat.CorWatt
102 ± 0% +21.4% 124 ± 0% turbostat.PkgWatt
6.66 ± 2% +8.1% 7.19 ± 0% turbostat.RAMWatt
Thank you for the report.
This is interesting to me as well.
My suspicion is that f2fs_submit_merged_bio(sbi, DATA, WRITE); in
f2fs_write_data_pages was called too many times from multiple threads
at the same time, which results in bio splits.
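
To illustrate the suspicion, roughly how that path looks (a simplified
sketch, not the exact code):

	/* f2fs keeps one in-flight merged bio per page type and appends
	 * pages to it until it is full or explicitly flushed. */
	ret = write_cache_pages(mapping, wbc, __f2fs_writepage, mapping);

	/* With many threads in here concurrently, each one flushes the
	 * shared merged bio as soon as its own pages are written, so the
	 * bio goes out long before it is full. That would explain
	 * iostat.md0.avgrq-sz dropping from ~435 to ~44 while
	 * iostat.md0.w/s grows about 10x in the numbers above. */
	f2fs_submit_merged_bio(sbi, DATA, WRITE);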
I'm not sure yet, so for the moment I'll revert this patch. Once I finish
analyzing this issue, I will repost the patch.
Thank you,
On Tue, Apr 21, 2015 at 12:34:08PM +0800, Yuanhan Liu wrote:
> FYI, we found changes on `fsmark.files_per_sec' by 78373b7319abdf15050af5b1632c4c8b8b398f33: