2020-04-30 11:02:48

by Sayali Lokhande

[permalink] [raw]
Subject: [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint

There could be a scenario where f2fs_sync_node_pages gets
called during checkpoint, which in turn tries to flush
inline data and calls iput(). This results in deadlock as
iput() tries to hold cp_rwsem, which is already held at the
beginning by checkpoint->block_operations().

Call stack :

Thread A Thread B
f2fs_write_checkpoint()
- block_operations(sbi)
- f2fs_lock_all(sbi);
- down_write(&sbi->cp_rwsem);

- open()
- igrab()
- write() write inline data
- unlink()
- f2fs_sync_node_pages()
- if (is_inline_node(page))
- flush_inline_data()
- ilookup()
page = f2fs_pagecache_get_page()
if (!page)
goto iput_out;
iput_out:
-close()
-iput()
iput(inode);
- f2fs_evict_inode()
- f2fs_truncate_blocks()
- f2fs_lock_op()
- down_read(&sbi->cp_rwsem);

Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
Signed-off-by: Sayali Lokhande <[email protected]>
---
fs/f2fs/checkpoint.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 5ba649e..97b6378 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1219,21 +1219,19 @@ static int block_operations(struct f2fs_sb_info *sbi)
goto retry_flush_quotas;
}

-retry_flush_nodes:
down_write(&sbi->node_write);

if (get_pages(sbi, F2FS_DIRTY_NODES)) {
up_write(&sbi->node_write);
+ up_write(&sbi->node_change);
+ f2fs_unlock_all(sbi);
atomic_inc(&sbi->wb_sync_req[NODE]);
err = f2fs_sync_node_pages(sbi, &wbc, false, FS_CP_NODE_IO);
atomic_dec(&sbi->wb_sync_req[NODE]);
- if (err) {
- up_write(&sbi->node_change);
- f2fs_unlock_all(sbi);
+ if (err)
goto out;
- }
cond_resched();
- goto retry_flush_nodes;
+ goto retry_flush_quotas;
}

/*
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


2020-05-06 06:24:07

by Chao Yu

[permalink] [raw]
Subject: Re: [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint

On 2020/4/30 18:58, Sayali Lokhande wrote:
> There could be a scenario where f2fs_sync_node_pages gets
> called during checkpoint, which in turn tries to flush
> inline data and calls iput(). This results in deadlock as
> iput() tries to hold cp_rwsem, which is already held at the
> beginning by checkpoint->block_operations().
>
> Call stack :
>
> Thread A Thread B
> f2fs_write_checkpoint()
> - block_operations(sbi)
> - f2fs_lock_all(sbi);
> - down_write(&sbi->cp_rwsem);
>
> - open()
> - igrab()
> - write() write inline data
> - unlink()
> - f2fs_sync_node_pages()
> - if (is_inline_node(page))
> - flush_inline_data()
> - ilookup()
> page = f2fs_pagecache_get_page()
> if (!page)
> goto iput_out;
> iput_out:
> -close()
> -iput()
> iput(inode);
> - f2fs_evict_inode()
> - f2fs_truncate_blocks()
> - f2fs_lock_op()
> - down_read(&sbi->cp_rwsem);
>
> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> Signed-off-by: Sayali Lokhande <[email protected]>

Reviewed-by: Chao Yu <[email protected]>

Thanks,

2020-05-08 10:21:58

by Chen, Rong A

[permalink] [raw]
Subject: [f2fs] da55a3ad36: filebench.sum_bytes_mb/s -44.4% regression

Greeting,

FYI, we noticed a -44.4% regression of filebench.sum_bytes_mb/s due to commit:


commit: da55a3ad3669f17a05e00ec819901293c9cc7bf3 ("[PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint")
url: https://github.com/0day-ci/linux/commits/Sayali-Lokhande/f2fs-Avoid-double-lock-for-cp_rwsem-during-checkpoint/20200501-024326
base: https://git.kernel.org/cgit/linux/kernel/git/jaegeuk/f2fs.git dev-test

in testcase: filebench
on test machine: 8 threads Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 8G memory
with following parameters:

disk: 1HDD
fs: f2fs
test: createfiles.f
cpufreq_governor: performance
ucode: 0x27




If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/rootfs/tbox_group/test/testcase/ucode:
gcc-7/performance/1HDD/f2fs/x86_64-rhel-7.6/debian-x86_64-20191114.cgz/lkp-hsw-d01/createfiles.f/filebench/0x27

commit:
dcd94277cf ("f2fs: add compressed/gc data read IO stat")
da55a3ad36 ("f2fs: Avoid double lock for cp_rwsem during checkpoint")

dcd94277cf7b4778 da55a3ad3669f17a05e00ec8199
---------------- ---------------------------
%stddev %change %stddev
\ | \
390.50 -44.4% 216.93 ± 28% filebench.sum_bytes_mb/s
74982 -44.5% 41652 ± 28% filebench.sum_operations/s
0.19 ± 3% +55.6% 0.29 ± 29% filebench.sum_time_ms/op
24992 -44.4% 13884 ± 28% filebench.sum_writes/s
11.21 ± 2% +24.5% 13.96 ± 3% filebench.time.elapsed_time
11.21 ± 2% +24.5% 13.96 ± 3% filebench.time.elapsed_time.max
2599052 +8.1% 2808701 filebench.time.file_system_outputs
34.50 -16.9% 28.67 ± 3% filebench.time.percent_of_cpu_this_job_got
19623 ± 3% +14.0% 22364 filebench.time.voluntary_context_switches
12.67 +2.8% 13.02 boot-time.dhcp
8.25 +3.3 11.54 mpstat.cpu.all.iowait%
4.25 ± 4% -0.5 3.73 ± 4% mpstat.cpu.all.sys%
3.84 ± 2% -0.6 3.25 ± 3% mpstat.cpu.all.usr%
10853 ± 3% +21.6% 13198 ± 6% softirqs.CPU0.TIMER
9855 +21.3% 11958 ± 8% softirqs.CPU4.TIMER
9749 ± 3% +36.2% 13275 ± 21% softirqs.CPU6.TIMER
30472 +11.1% 33865 softirqs.SCHED
957129 ± 18% -33.2% 639243 ± 28% cpuidle.C1.time
39341259 ± 22% -66.3% 13274567 ± 93% cpuidle.C6.time
45789 ± 18% -60.4% 18150 ± 87% cpuidle.C6.usage
32265 ± 19% -40.2% 19301 ± 12% cpuidle.POLL.time
8071 ± 12% -15.2% 6845 ± 11% cpuidle.POLL.usage
84.00 -2.4% 82.00 vmstat.cpu.id
7.00 +42.9% 10.00 vmstat.cpu.wa
14528 -15.9% 12222 ± 2% vmstat.io.bi
20968 +73.5% 36378 ± 6% vmstat.io.bo
1387456 +17.2% 1626013 vmstat.memory.cache
4842 +15.5% 5594 ± 2% interrupts.19:IR-IO-APIC.19-fasteoi.ata_piix
166.00 ± 74% +183.1% 470.00 ± 42% interrupts.28:IR-PCI-MSI.409600-edge.eth0
27387 ± 5% +178.5% 76272 ± 17% interrupts.CPU2.LOC:Local_timer_interrupts
30881 ± 20% +63.4% 50460 ± 25% interrupts.CPU3.LOC:Local_timer_interrupts
166.00 ± 74% +183.1% 470.00 ± 42% interrupts.CPU4.28:IR-PCI-MSI.409600-edge.eth0
75.50 ± 17% +32.0% 99.67 ± 3% interrupts.CPU4.CAL:Function_call_interrupts
175.50 -19.5% 141.33 ± 17% interrupts.CPU4.RES:Rescheduling_interrupts
116.00 ± 17% +53.7% 178.33 ± 4% interrupts.CPU6.RES:Rescheduling_interrupts
82.00 ± 6% +33.7% 109.67 ± 4% interrupts.CPU7.RES:Rescheduling_interrupts
373139 ± 2% +20.3% 448922 ± 4% interrupts.LOC:Local_timer_interrupts
84.66 -2.7% 82.38 iostat.cpu.idle
7.19 +43.5% 10.33 ± 3% iostat.cpu.iowait
4.33 -8.0% 3.98 ± 5% iostat.cpu.system
3.81 -13.2% 3.31 ± 3% iostat.cpu.user
1.25 +14.0% 1.42 ± 3% iostat.sda.avgqu-sz
17.43 +1308.1% 245.35 ±131% iostat.sda.await.max
313.22 -14.1% 269.02 ± 4% iostat.sda.r/s
14565 -15.9% 12254 ± 2% iostat.sda.rkB/s
3322 -16.0% 2789 ± 2% iostat.sda.rrqm/s
3.09 ± 3% +467.6% 17.52 ±109% iostat.sda.svctm
8.71 +2685.7% 242.77 ±133% iostat.sda.svctm.max
69.97 ± 2% +7.7% 75.33 ± 3% iostat.sda.util
35.44 ± 2% +94.3% 68.86 ± 7% iostat.sda.w/s
17.43 +1308.1% 245.35 ±131% iostat.sda.w_await.max
22983 +66.0% 38143 ± 6% iostat.sda.wkB/s
319440 +25.3% 400307 ± 2% meminfo.Active
280118 ± 2% +12.6% 315548 meminfo.Active(anon)
39321 ± 4% +115.6% 84758 ± 6% meminfo.Active(file)
66008 +64.7% 108726 ± 3% meminfo.AnonHugePages
279917 ± 2% +12.7% 315339 meminfo.AnonPages
1339189 +16.9% 1565959 meminfo.Cached
161579 ± 2% +98.1% 320046 ± 8% meminfo.Dirty
201499 ± 3% +90.1% 383044 ± 6% meminfo.Inactive
155133 ± 4% +115.6% 334509 ± 6% meminfo.Inactive(file)
63886 +36.5% 87191 ± 2% meminfo.KReclaimable
1819283 +15.7% 2104828 meminfo.Memused
63886 +36.5% 87191 ± 2% meminfo.SReclaimable
98391 +23.8% 121806 ± 2% meminfo.Slab
502.00 ± 13% +84.8% 927.67 ± 10% meminfo.Writeback
219588 -16.8% 182704 ± 3% meminfo.max_used_kB
70801 +11.4% 78906 proc-vmstat.nr_active_anon
9773 ± 3% +116.4% 21146 ± 7% proc-vmstat.nr_active_file
70754 +11.4% 78854 proc-vmstat.nr_anon_pages
324216 ± 3% +14.1% 369827 ± 4% proc-vmstat.nr_dirtied
40081 +99.1% 79787 ± 9% proc-vmstat.nr_dirty
334411 +17.0% 391272 proc-vmstat.nr_file_pages
1558377 -4.5% 1487609 proc-vmstat.nr_free_pages
11461 +5.7% 12118 ± 2% proc-vmstat.nr_inactive_anon
38579 ± 3% +116.3% 83463 ± 7% proc-vmstat.nr_inactive_file
3993 +1.6% 4055 proc-vmstat.nr_kernel_stack
14932 +4.4% 15587 ± 2% proc-vmstat.nr_mapped
1043 +4.0% 1085 proc-vmstat.nr_page_table_pages
11558 +5.8% 12228 ± 2% proc-vmstat.nr_shmem
15979 +36.3% 21783 ± 3% proc-vmstat.nr_slab_reclaimable
134.50 ± 5% +78.7% 240.33 ± 3% proc-vmstat.nr_writeback
71015 ± 2% +106.0% 146295 ± 3% proc-vmstat.nr_written
70801 +11.5% 78912 proc-vmstat.nr_zone_active_anon
9771 ± 3% +116.4% 21145 ± 7% proc-vmstat.nr_zone_active_file
11460 +5.7% 12117 ± 2% proc-vmstat.nr_zone_inactive_anon
38572 ± 3% +116.4% 83466 ± 7% proc-vmstat.nr_zone_inactive_file
40223 +99.0% 80023 ± 9% proc-vmstat.nr_zone_write_pending
285084 ± 2% +105.4% 585492 ± 3% proc-vmstat.pgpgout
6835 ± 2% -11.2% 6072 ± 9% slabinfo.anon_vma_chain.num_objs
51857 +21.8% 63172 ± 2% slabinfo.dentry.active_objs
2509 +20.9% 3035 ± 2% slabinfo.dentry.active_slabs
52713 +20.9% 63747 ± 2% slabinfo.dentry.num_objs
2509 +20.9% 3035 ± 2% slabinfo.dentry.num_slabs
1385 ± 9% +123.0% 3090 ± 12% slabinfo.f2fs_extent_node.active_objs
1562 ± 6% +102.7% 3167 ± 12% slabinfo.f2fs_extent_node.num_objs
11274 ± 8% +91.6% 21599 ± 6% slabinfo.f2fs_extent_tree.active_objs
224.00 ± 8% +90.2% 426.00 ± 6% slabinfo.f2fs_extent_tree.active_slabs
11471 ± 8% +89.6% 21747 ± 6% slabinfo.f2fs_extent_tree.num_objs
224.00 ± 8% +90.2% 426.00 ± 6% slabinfo.f2fs_extent_tree.num_slabs
10359 ± 3% +110.1% 21760 ± 6% slabinfo.f2fs_inode_cache.active_objs
372.50 ± 3% +109.0% 778.67 ± 6% slabinfo.f2fs_inode_cache.active_slabs
10437 ± 3% +109.0% 21817 ± 6% slabinfo.f2fs_inode_cache.num_objs
372.50 ± 3% +109.0% 778.67 ± 6% slabinfo.f2fs_inode_cache.num_slabs
43654 ± 4% +40.1% 61180 slabinfo.f2fs_nat_entry.active_objs
341.50 ± 4% +40.5% 479.67 slabinfo.f2fs_nat_entry.active_slabs
43751 ± 4% +40.5% 61453 slabinfo.f2fs_nat_entry.num_objs
341.50 ± 4% +40.5% 479.67 slabinfo.f2fs_nat_entry.num_slabs
7680 ± 6% -8.9% 6997 ± 6% slabinfo.kmalloc-8.active_objs
7680 ± 6% -8.9% 6997 ± 6% slabinfo.kmalloc-8.num_objs
22811 +54.6% 35261 ± 3% slabinfo.radix_tree_node.active_objs
855.00 +49.5% 1278 ± 3% slabinfo.radix_tree_node.active_slabs
23949 +49.5% 35802 ± 3% slabinfo.radix_tree_node.num_objs
855.00 +49.5% 1278 ± 3% slabinfo.radix_tree_node.num_slabs
2.53 ±100% +137.9% 6.03 ± 11% sched_debug.cfs_rq:/.exec_clock.avg
131763 ± 2% -30.7% 91313 ± 11% sched_debug.cfs_rq:/.load.avg
43314 ± 20% -100.0% 0.00 sched_debug.cfs_rq:/.load.min
58283 ± 18% +45.6% 84867 ± 7% sched_debug.cfs_rq:/.load.stddev
533.31 -20.8% 422.50 ± 7% sched_debug.cfs_rq:/.load_avg.avg
6147 -20.9% 4861 ± 11% sched_debug.cfs_rq:/.min_vruntime.min
1.00 -29.2% 0.71 ± 8% sched_debug.cfs_rq:/.nr_running.avg
1.00 -100.0% 0.00 sched_debug.cfs_rq:/.nr_running.min
128.00 -100.0% 0.00 sched_debug.cfs_rq:/.removed.load_avg.avg
1024 -100.0% 0.00 sched_debug.cfs_rq:/.removed.load_avg.max
338.66 -100.0% 0.00 sched_debug.cfs_rq:/.removed.load_avg.stddev
128.00 -100.0% 0.00 sched_debug.cfs_rq:/.removed.runnable_avg.avg
1024 -100.0% 0.00 sched_debug.cfs_rq:/.removed.runnable_avg.max
338.66 -100.0% 0.00 sched_debug.cfs_rq:/.removed.runnable_avg.stddev
20.81 ± 92% -100.0% 0.00 sched_debug.cfs_rq:/.removed.util_avg.avg
166.50 ± 92% -100.0% 0.00 sched_debug.cfs_rq:/.removed.util_avg.max
55.06 ± 92% -100.0% 0.00 sched_debug.cfs_rq:/.removed.util_avg.stddev
113.25 ± 35% -61.3% 43.88 ± 70% sched_debug.cfs_rq:/.util_est_enqueued.avg
470.50 ± 8% -42.3% 271.67 ± 68% sched_debug.cfs_rq:/.util_est_enqueued.max
1.50 ± 33% -100.0% 0.00 sched_debug.cfs_rq:/.util_est_enqueued.min
170.87 ± 16% -46.2% 91.94 ± 68% sched_debug.cfs_rq:/.util_est_enqueued.stddev
821833 ± 21% -18.7% 668072 ± 11% sched_debug.cpu.avg_idle.max
271414 ± 14% -12.8% 236616 ± 10% sched_debug.cpu.avg_idle.stddev
0.96 ± 42% -62.0% 0.37 ± 55% sched_debug.cpu.clock.stddev
0.96 ± 42% -62.0% 0.37 ± 55% sched_debug.cpu.clock_task.stddev
1106 ± 13% -22.8% 853.96 ± 7% sched_debug.cpu.curr->pid.avg
240.88 ± 56% +127.4% 547.88 ± 5% sched_debug.cpu.curr->pid.stddev
0.00 ± 2% -28.8% 0.00 ± 15% sched_debug.cpu.next_balance.stddev
1.56 ± 28% -54.7% 0.71 ± 8% sched_debug.cpu.nr_running.avg
3.00 ± 33% -66.7% 1.00 sched_debug.cpu.nr_running.max
3552 ± 13% +41.3% 5019 ± 18% sched_debug.cpu.nr_switches.min



filebench.sum_operations_s

80000 +-------------------------------------------------------------------+
|+.++++.+++.+++ +++ ++.++ ++ +++ ++ +.++ ++++.+++.+ + +|
70000 |-+ : : : : : :: : : :: : : : : : :|
60000 |-+ : : : : : :: : : :: : : : : : :|
| : : : : : :: : : :: : : : : : :|
50000 |O+ OO OO O :O: OO: O: :O: : : : : : : : : : : : |
| : : : : : : : : : : : : : : : : : |
40000 |-+ : : : : : : : : : : : : : : : : : |
| :: : : :: : : :: : : :: : :: |
30000 |-+ :: : : :: : : :: : : :: : :: |
20000 |-+ :: : : O :: O: : :: : : :: : :: |
| O : :: : : : : : : : : : |
10000 |-+ : :: : : : : : : : : : |
| : :: : : : : : : : : : |
0 +-------------------------------------------------------------------+


filebench.sum_writes_s

25000 +-------------------------------------------------------------------+
| : : : : : :: : : :: : : : : : :|
| : : : : : :: : : :: : : : : : :|
20000 |-+ : : : : : :: : : :: : : : : : :|
| : : : : : : : : : : : : : : : : : |
|O OO OO O :O: OO: O: :O: : : : : : : : : : : : |
15000 |-+ : : : : : : : : : : : : : : : : : |
| :: : : :: : : : : : : :: :::: |
10000 |-+ :: : : :: : : :: : : :: : :: |
| :: : : O :: O: : :: : : :: : :: |
| :: : : :: : : :: : : :: : :: |
5000 |-+ O : :: : : : : : : : : : |
| : :: : : : : : : : : : |
| : :: : : : : : : : : : |
0 +-------------------------------------------------------------------+


filebench.sum_bytes_mb_s

400 +---------------------------------------------------------------------+
| : : : : : : : : : : : : : : : : :|
350 |-+ : : : : : : : : : : : : : : : : :|
300 |-+ : : : : : : : : : : : : : : : : :|
| : : : : : : : : : : : : : : : : : |
250 |O+ OO O OO :O:O : O:O :O: : : : : : : : : : : : |
| : : : : : : : : : : : : : : : : : |
200 |-+ :: : : :: : : :: : : :: : ::: |
| : : : : : : : : : : :: :: |
150 |-+ : : : O : O: : : : : : :: :: |
100 |-+ : : : : : : : : : : :: :: |
| O : :: : : : : : : : : : |
50 |-+ : :: : : : : : : : : : |
| : :: : : : : : : : : : |
0 +---------------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


Attachments:
(No filename) (17.72 kB)
config-5.6.0-11881-gda55a3ad3669f1 (209.83 kB)
job-script (7.80 kB)
job.yaml (5.45 kB)
reproduce (590.00 B)
Download all attachments

2020-05-08 16:13:21

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint

Hi Sayali,

In order to address the perf regression, how about this?

From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <[email protected]>
Date: Fri, 8 May 2020 09:08:37 -0700
Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint

There could be a scenario where f2fs_sync_node_pages gets
called during checkpoint, which in turn tries to flush
inline data and calls iput(). This results in deadlock as
iput() tries to hold cp_rwsem, which is already held at the
beginning by checkpoint->block_operations().

Call stack :

Thread A Thread B
f2fs_write_checkpoint()
- block_operations(sbi)
- f2fs_lock_all(sbi);
- down_write(&sbi->cp_rwsem);

- open()
- igrab()
- write() write inline data
- unlink()
- f2fs_sync_node_pages()
- if (is_inline_node(page))
- flush_inline_data()
- ilookup()
page = f2fs_pagecache_get_page()
if (!page)
goto iput_out;
iput_out:
-close()
-iput()
iput(inode);
- f2fs_evict_inode()
- f2fs_truncate_blocks()
- f2fs_lock_op()
- down_read(&sbi->cp_rwsem);

Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
Signed-off-by: Sayali Lokhande <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/node.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 1db8cabf727ef..626d7daca09de 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
goto continue_unlock;
}

- /* flush inline_data */
- if (is_inline_node(page)) {
+ /* flush inline_data, if it's not sync path. */
+ if (do_balance && is_inline_node(page)) {
clear_inline_node(page);
unlock_page(page);
flush_inline_data(sbi, ino_of_node(page));
--
2.26.2.645.ge9eca65c58-goog

2020-05-09 03:06:50

by Chao Yu

[permalink] [raw]
Subject: Re: [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint

On 2020/5/9 0:10, Jaegeuk Kim wrote:
> Hi Sayali,
>
> In order to address the perf regression, how about this?
>
>>From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim <[email protected]>
> Date: Fri, 8 May 2020 09:08:37 -0700
> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint
>
> There could be a scenario where f2fs_sync_node_pages gets
> called during checkpoint, which in turn tries to flush
> inline data and calls iput(). This results in deadlock as
> iput() tries to hold cp_rwsem, which is already held at the
> beginning by checkpoint->block_operations().
>
> Call stack :
>
> Thread A Thread B
> f2fs_write_checkpoint()
> - block_operations(sbi)
> - f2fs_lock_all(sbi);
> - down_write(&sbi->cp_rwsem);
>
> - open()
> - igrab()
> - write() write inline data
> - unlink()
> - f2fs_sync_node_pages()
> - if (is_inline_node(page))
> - flush_inline_data()
> - ilookup()
> page = f2fs_pagecache_get_page()
> if (!page)
> goto iput_out;
> iput_out:
> -close()
> -iput()
> iput(inode);
> - f2fs_evict_inode()
> - f2fs_truncate_blocks()
> - f2fs_lock_op()
> - down_read(&sbi->cp_rwsem);
>
> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> Signed-off-by: Sayali Lokhande <[email protected]>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> fs/f2fs/node.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index 1db8cabf727ef..626d7daca09de 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> goto continue_unlock;
> }
>
> - /* flush inline_data */
> - if (is_inline_node(page)) {
> + /* flush inline_data, if it's not sync path. */
> + if (do_balance && is_inline_node(page)) {

IIRC, this flow was designed to avoid running out of free space issue
during checkpoint:

2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")

The sceanrio is:
1. create fully node blocks
2. flush node blocks
3. write inline_data for all the node blocks again
4. flush node blocks redundantly

I guess this may cause failing one case of fstest.


Since block_operations->f2fs_sync_inode_meta has synced inode cache to
inode page, so in block_operations->f2fs_sync_node_pages, could we
check nlink before flush_inline_data():

if (is_inline_node(page)) {
if (IS_INODE(page) && raw_inode_page->i_links) {
flush_inline_data()
}
}


> clear_inline_node(page);
> unlock_page(page);
> flush_inline_data(sbi, ino_of_node(page));
>

2020-05-09 19:07:40

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint

On 05/09, Chao Yu wrote:
> On 2020/5/9 0:10, Jaegeuk Kim wrote:
> > Hi Sayali,
> >
> > In order to address the perf regression, how about this?
> >
> >>From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001
> > From: Jaegeuk Kim <[email protected]>
> > Date: Fri, 8 May 2020 09:08:37 -0700
> > Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint
> >
> > There could be a scenario where f2fs_sync_node_pages gets
> > called during checkpoint, which in turn tries to flush
> > inline data and calls iput(). This results in deadlock as
> > iput() tries to hold cp_rwsem, which is already held at the
> > beginning by checkpoint->block_operations().
> >
> > Call stack :
> >
> > Thread A Thread B
> > f2fs_write_checkpoint()
> > - block_operations(sbi)
> > - f2fs_lock_all(sbi);
> > - down_write(&sbi->cp_rwsem);
> >
> > - open()
> > - igrab()
> > - write() write inline data
> > - unlink()
> > - f2fs_sync_node_pages()
> > - if (is_inline_node(page))
> > - flush_inline_data()
> > - ilookup()
> > page = f2fs_pagecache_get_page()
> > if (!page)
> > goto iput_out;
> > iput_out:
> > -close()
> > -iput()
> > iput(inode);
> > - f2fs_evict_inode()
> > - f2fs_truncate_blocks()
> > - f2fs_lock_op()
> > - down_read(&sbi->cp_rwsem);
> >
> > Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> > Signed-off-by: Sayali Lokhande <[email protected]>
> > Signed-off-by: Jaegeuk Kim <[email protected]>
> > ---
> > fs/f2fs/node.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > index 1db8cabf727ef..626d7daca09de 100644
> > --- a/fs/f2fs/node.c
> > +++ b/fs/f2fs/node.c
> > @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> > goto continue_unlock;
> > }
> >
> > - /* flush inline_data */
> > - if (is_inline_node(page)) {
> > + /* flush inline_data, if it's not sync path. */
> > + if (do_balance && is_inline_node(page)) {
>
> IIRC, this flow was designed to avoid running out of free space issue
> during checkpoint:
>
> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>
> The sceanrio is:
> 1. create fully node blocks
> 2. flush node blocks
> 3. write inline_data for all the node blocks again
> 4. flush node blocks redundantly
>
> I guess this may cause failing one case of fstest.

Yeah, actually I was hitting 204 failure, and thus, revised like this.
Now, I don't see any regression in fstest.

From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001
From: Sayali Lokhande <[email protected]>
Date: Thu, 30 Apr 2020 16:28:29 +0530
Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint

There could be a scenario where f2fs_sync_node_pages gets
called during checkpoint, which in turn tries to flush
inline data and calls iput(). This results in deadlock as
iput() tries to hold cp_rwsem, which is already held at the
beginning by checkpoint->block_operations().

Call stack :

Thread A Thread B
f2fs_write_checkpoint()
- block_operations(sbi)
- f2fs_lock_all(sbi);
- down_write(&sbi->cp_rwsem);

- open()
- igrab()
- write() write inline data
- unlink()
- f2fs_sync_node_pages()
- if (is_inline_node(page))
- flush_inline_data()
- ilookup()
page = f2fs_pagecache_get_page()
if (!page)
goto iput_out;
iput_out:
-close()
-iput()
iput(inode);
- f2fs_evict_inode()
- f2fs_truncate_blocks()
- f2fs_lock_op()
- down_read(&sbi->cp_rwsem);

Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
Signed-off-by: Sayali Lokhande <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/checkpoint.c | 9 ++++++++-
fs/f2fs/f2fs.h | 4 ++--
fs/f2fs/node.c | 10 +++++-----
3 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index d49f7a01d8a26..928aea4ff663d 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi)
};
int err = 0, cnt = 0;

+ /*
+ * Let's flush node pages first to flush inline_data.
+ * We'll actually guarantee everything below under f2fs_lock_all.
+ */
+ f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO);
+
retry_flush_quotas:
f2fs_lock_all(sbi);
if (__need_flush_quota(sbi)) {
@@ -1222,7 +1228,8 @@ static int block_operations(struct f2fs_sb_info *sbi)
if (get_pages(sbi, F2FS_DIRTY_NODES)) {
up_write(&sbi->node_write);
atomic_inc(&sbi->wb_sync_req[NODE]);
- err = f2fs_sync_node_pages(sbi, &wbc, false, FS_CP_NODE_IO);
+ err = f2fs_sync_node_pages(sbi, &wbc, false,
+ true, FS_CP_NODE_IO);
atomic_dec(&sbi->wb_sync_req[NODE]);
if (err) {
up_write(&sbi->node_change);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index d916540f12813..ac6ae42b9dd4e 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3286,8 +3286,8 @@ int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
struct writeback_control *wbc, bool atomic,
unsigned int *seq_id);
int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
- struct writeback_control *wbc,
- bool do_balance, enum iostat_type io_type);
+ struct writeback_control *wbc,
+ bool do_balance, bool sync, enum iostat_type io_type);
int f2fs_build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount);
bool f2fs_alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid);
void f2fs_alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 1db8cabf727ef..fd00a8c119088 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1808,8 +1808,8 @@ static bool flush_dirty_inode(struct page *page)
}

int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
- struct writeback_control *wbc,
- bool do_balance, enum iostat_type io_type)
+ struct writeback_control *wbc,
+ bool do_balance, bool sync, enum iostat_type io_type)
{
pgoff_t index;
struct pagevec pvec;
@@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
goto continue_unlock;
}

- /* flush inline_data */
- if (is_inline_node(page)) {
+ /* flush inline_data, if it's async context. */
+ if (!sync && is_inline_node(page)) {
clear_inline_node(page);
unlock_page(page);
flush_inline_data(sbi, ino_of_node(page));
@@ -1999,7 +1999,7 @@ static int f2fs_write_node_pages(struct address_space *mapping,

diff = nr_pages_to_write(sbi, NODE, wbc);
blk_start_plug(&plug);
- f2fs_sync_node_pages(sbi, wbc, true, FS_NODE_IO);
+ f2fs_sync_node_pages(sbi, wbc, true, false, FS_NODE_IO);
blk_finish_plug(&plug);
wbc->nr_to_write = max((long)0, wbc->nr_to_write - diff);

--
2.26.2.645.ge9eca65c58-goog

2020-05-11 01:31:20

by Chao Yu

[permalink] [raw]
Subject: Re: [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint

On 2020/5/10 3:03, Jaegeuk Kim wrote:
> On 05/09, Chao Yu wrote:
>> On 2020/5/9 0:10, Jaegeuk Kim wrote:
>>> Hi Sayali,
>>>
>>> In order to address the perf regression, how about this?
>>>
>>> >From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001
>>> From: Jaegeuk Kim <[email protected]>
>>> Date: Fri, 8 May 2020 09:08:37 -0700
>>> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint
>>>
>>> There could be a scenario where f2fs_sync_node_pages gets
>>> called during checkpoint, which in turn tries to flush
>>> inline data and calls iput(). This results in deadlock as
>>> iput() tries to hold cp_rwsem, which is already held at the
>>> beginning by checkpoint->block_operations().
>>>
>>> Call stack :
>>>
>>> Thread A Thread B
>>> f2fs_write_checkpoint()
>>> - block_operations(sbi)
>>> - f2fs_lock_all(sbi);
>>> - down_write(&sbi->cp_rwsem);
>>>
>>> - open()
>>> - igrab()
>>> - write() write inline data
>>> - unlink()
>>> - f2fs_sync_node_pages()
>>> - if (is_inline_node(page))
>>> - flush_inline_data()
>>> - ilookup()
>>> page = f2fs_pagecache_get_page()
>>> if (!page)
>>> goto iput_out;
>>> iput_out:
>>> -close()
>>> -iput()
>>> iput(inode);
>>> - f2fs_evict_inode()
>>> - f2fs_truncate_blocks()
>>> - f2fs_lock_op()
>>> - down_read(&sbi->cp_rwsem);
>>>
>>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>>> Signed-off-by: Sayali Lokhande <[email protected]>
>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>> ---
>>> fs/f2fs/node.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>> index 1db8cabf727ef..626d7daca09de 100644
>>> --- a/fs/f2fs/node.c
>>> +++ b/fs/f2fs/node.c
>>> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>>> goto continue_unlock;
>>> }
>>>
>>> - /* flush inline_data */
>>> - if (is_inline_node(page)) {
>>> + /* flush inline_data, if it's not sync path. */
>>> + if (do_balance && is_inline_node(page)) {
>>
>> IIRC, this flow was designed to avoid running out of free space issue
>> during checkpoint:
>>
>> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>>
>> The sceanrio is:
>> 1. create fully node blocks
>> 2. flush node blocks
>> 3. write inline_data for all the node blocks again
>> 4. flush node blocks redundantly
>>
>> I guess this may cause failing one case of fstest.
>
> Yeah, actually I was hitting 204 failure, and thus, revised like this.
> Now, I don't see any regression in fstest.
>
>>From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001
> From: Sayali Lokhande <[email protected]>
> Date: Thu, 30 Apr 2020 16:28:29 +0530
> Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint
>
> There could be a scenario where f2fs_sync_node_pages gets
> called during checkpoint, which in turn tries to flush
> inline data and calls iput(). This results in deadlock as
> iput() tries to hold cp_rwsem, which is already held at the
> beginning by checkpoint->block_operations().
>
> Call stack :
>
> Thread A Thread B
> f2fs_write_checkpoint()
> - block_operations(sbi)
> - f2fs_lock_all(sbi);
> - down_write(&sbi->cp_rwsem);
>
> - open()
> - igrab()
> - write() write inline data
> - unlink()
> - f2fs_sync_node_pages()
> - if (is_inline_node(page))
> - flush_inline_data()
> - ilookup()
> page = f2fs_pagecache_get_page()
> if (!page)
> goto iput_out;
> iput_out:
> -close()
> -iput()
> iput(inode);
> - f2fs_evict_inode()
> - f2fs_truncate_blocks()
> - f2fs_lock_op()
> - down_read(&sbi->cp_rwsem);
>
> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> Signed-off-by: Sayali Lokhande <[email protected]>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> fs/f2fs/checkpoint.c | 9 ++++++++-
> fs/f2fs/f2fs.h | 4 ++--
> fs/f2fs/node.c | 10 +++++-----
> 3 files changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index d49f7a01d8a26..928aea4ff663d 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi)
> };
> int err = 0, cnt = 0;
>
> + /*
> + * Let's flush node pages first to flush inline_data.
> + * We'll actually guarantee everything below under f2fs_lock_all.
> + */
> + f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO);

It is possible that user write a large number of inline data in between
f2fs_sync_node_pages() and f2fs_lock_all(), it will cause the no-space issue in
race condition.

Also, if there is huge number of F2FS_DIRTY_IMETA, after this change, we will
flush inode page twice which is unneeded.

f2fs_sync_node_pages() --- flush dirty inode page
f2fs_lock_all()
...
f2fs_sync_inode_meta() --- update dirty inode page
f2fs_sync_node_pages() --- flush dirty inode page again.

Thanks,

> +
> retry_flush_quotas:
> f2fs_lock_all(sbi);
> if (__need_flush_quota(sbi)) {
> @@ -1222,7 +1228,8 @@ static int block_operations(struct f2fs_sb_info *sbi)
> if (get_pages(sbi, F2FS_DIRTY_NODES)) {
> up_write(&sbi->node_write);
> atomic_inc(&sbi->wb_sync_req[NODE]);
> - err = f2fs_sync_node_pages(sbi, &wbc, false, FS_CP_NODE_IO);
> + err = f2fs_sync_node_pages(sbi, &wbc, false,
> + true, FS_CP_NODE_IO);
> atomic_dec(&sbi->wb_sync_req[NODE]);
> if (err) {
> up_write(&sbi->node_change);
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index d916540f12813..ac6ae42b9dd4e 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -3286,8 +3286,8 @@ int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
> struct writeback_control *wbc, bool atomic,
> unsigned int *seq_id);
> int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> - struct writeback_control *wbc,
> - bool do_balance, enum iostat_type io_type);
> + struct writeback_control *wbc,
> + bool do_balance, bool sync, enum iostat_type io_type);
> int f2fs_build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount);
> bool f2fs_alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid);
> void f2fs_alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid);
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index 1db8cabf727ef..fd00a8c119088 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1808,8 +1808,8 @@ static bool flush_dirty_inode(struct page *page)
> }
>
> int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> - struct writeback_control *wbc,
> - bool do_balance, enum iostat_type io_type)
> + struct writeback_control *wbc,
> + bool do_balance, bool sync, enum iostat_type io_type)
> {
> pgoff_t index;
> struct pagevec pvec;
> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> goto continue_unlock;
> }
>
> - /* flush inline_data */
> - if (is_inline_node(page)) {
> + /* flush inline_data, if it's async context. */
> + if (!sync && is_inline_node(page)) {
> clear_inline_node(page);
> unlock_page(page);
> flush_inline_data(sbi, ino_of_node(page));
> @@ -1999,7 +1999,7 @@ static int f2fs_write_node_pages(struct address_space *mapping,
>
> diff = nr_pages_to_write(sbi, NODE, wbc);
> blk_start_plug(&plug);
> - f2fs_sync_node_pages(sbi, wbc, true, FS_NODE_IO);
> + f2fs_sync_node_pages(sbi, wbc, true, false, FS_NODE_IO);
> blk_finish_plug(&plug);
> wbc->nr_to_write = max((long)0, wbc->nr_to_write - diff);
>
>

2020-05-11 22:15:13

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint

On 05/11, Chao Yu wrote:
> On 2020/5/10 3:03, Jaegeuk Kim wrote:
> > On 05/09, Chao Yu wrote:
> >> On 2020/5/9 0:10, Jaegeuk Kim wrote:
> >>> Hi Sayali,
> >>>
> >>> In order to address the perf regression, how about this?
> >>>
> >>> >From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001
> >>> From: Jaegeuk Kim <[email protected]>
> >>> Date: Fri, 8 May 2020 09:08:37 -0700
> >>> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint
> >>>
> >>> There could be a scenario where f2fs_sync_node_pages gets
> >>> called during checkpoint, which in turn tries to flush
> >>> inline data and calls iput(). This results in deadlock as
> >>> iput() tries to hold cp_rwsem, which is already held at the
> >>> beginning by checkpoint->block_operations().
> >>>
> >>> Call stack :
> >>>
> >>> Thread A Thread B
> >>> f2fs_write_checkpoint()
> >>> - block_operations(sbi)
> >>> - f2fs_lock_all(sbi);
> >>> - down_write(&sbi->cp_rwsem);
> >>>
> >>> - open()
> >>> - igrab()
> >>> - write() write inline data
> >>> - unlink()
> >>> - f2fs_sync_node_pages()
> >>> - if (is_inline_node(page))
> >>> - flush_inline_data()
> >>> - ilookup()
> >>> page = f2fs_pagecache_get_page()
> >>> if (!page)
> >>> goto iput_out;
> >>> iput_out:
> >>> -close()
> >>> -iput()
> >>> iput(inode);
> >>> - f2fs_evict_inode()
> >>> - f2fs_truncate_blocks()
> >>> - f2fs_lock_op()
> >>> - down_read(&sbi->cp_rwsem);
> >>>
> >>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> >>> Signed-off-by: Sayali Lokhande <[email protected]>
> >>> Signed-off-by: Jaegeuk Kim <[email protected]>
> >>> ---
> >>> fs/f2fs/node.c | 4 ++--
> >>> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> >>> index 1db8cabf727ef..626d7daca09de 100644
> >>> --- a/fs/f2fs/node.c
> >>> +++ b/fs/f2fs/node.c
> >>> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> >>> goto continue_unlock;
> >>> }
> >>>
> >>> - /* flush inline_data */
> >>> - if (is_inline_node(page)) {
> >>> + /* flush inline_data, if it's not sync path. */
> >>> + if (do_balance && is_inline_node(page)) {
> >>
> >> IIRC, this flow was designed to avoid running out of free space issue
> >> during checkpoint:
> >>
> >> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> >>
> >> The sceanrio is:
> >> 1. create fully node blocks
> >> 2. flush node blocks
> >> 3. write inline_data for all the node blocks again
> >> 4. flush node blocks redundantly
> >>
> >> I guess this may cause failing one case of fstest.
> >
> > Yeah, actually I was hitting 204 failure, and thus, revised like this.
> > Now, I don't see any regression in fstest.
> >
> >>From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001
> > From: Sayali Lokhande <[email protected]>
> > Date: Thu, 30 Apr 2020 16:28:29 +0530
> > Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint
> >
> > There could be a scenario where f2fs_sync_node_pages gets
> > called during checkpoint, which in turn tries to flush
> > inline data and calls iput(). This results in deadlock as
> > iput() tries to hold cp_rwsem, which is already held at the
> > beginning by checkpoint->block_operations().
> >
> > Call stack :
> >
> > Thread A Thread B
> > f2fs_write_checkpoint()
> > - block_operations(sbi)
> > - f2fs_lock_all(sbi);
> > - down_write(&sbi->cp_rwsem);
> >
> > - open()
> > - igrab()
> > - write() write inline data
> > - unlink()
> > - f2fs_sync_node_pages()
> > - if (is_inline_node(page))
> > - flush_inline_data()
> > - ilookup()
> > page = f2fs_pagecache_get_page()
> > if (!page)
> > goto iput_out;
> > iput_out:
> > -close()
> > -iput()
> > iput(inode);
> > - f2fs_evict_inode()
> > - f2fs_truncate_blocks()
> > - f2fs_lock_op()
> > - down_read(&sbi->cp_rwsem);
> >
> > Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> > Signed-off-by: Sayali Lokhande <[email protected]>
> > Signed-off-by: Jaegeuk Kim <[email protected]>
> > ---
> > fs/f2fs/checkpoint.c | 9 ++++++++-
> > fs/f2fs/f2fs.h | 4 ++--
> > fs/f2fs/node.c | 10 +++++-----
> > 3 files changed, 15 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > index d49f7a01d8a26..928aea4ff663d 100644
> > --- a/fs/f2fs/checkpoint.c
> > +++ b/fs/f2fs/checkpoint.c
> > @@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi)
> > };
> > int err = 0, cnt = 0;
> >
> > + /*
> > + * Let's flush node pages first to flush inline_data.
> > + * We'll actually guarantee everything below under f2fs_lock_all.
> > + */
> > + f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO);
>
> It is possible that user write a large number of inline data in between
> f2fs_sync_node_pages() and f2fs_lock_all(), it will cause the no-space issue in
> race condition.
>
> Also, if there is huge number of F2FS_DIRTY_IMETA, after this change, we will
> flush inode page twice which is unneeded.
>
> f2fs_sync_node_pages() --- flush dirty inode page
> f2fs_lock_all()
> ...
> f2fs_sync_inode_meta() --- update dirty inode page
> f2fs_sync_node_pages() --- flush dirty inode page again.
>

Another version:

From 6b430b72af57c65c20dea7b87f7ba7e9df36be98 Mon Sep 17 00:00:00 2001
From: Sayali Lokhande <[email protected]>
Date: Thu, 30 Apr 2020 16:28:29 +0530
Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint

There could be a scenario where f2fs_sync_node_pages gets
called during checkpoint, which in turn tries to flush
inline data and calls iput(). This results in deadlock as
iput() tries to hold cp_rwsem, which is already held at the
beginning by checkpoint->block_operations().

Call stack :

Thread A Thread B
f2fs_write_checkpoint()
- block_operations(sbi)
- f2fs_lock_all(sbi);
- down_write(&sbi->cp_rwsem);

- open()
- igrab()
- write() write inline data
- unlink()
- f2fs_sync_node_pages()
- if (is_inline_node(page))
- flush_inline_data()
- ilookup()
page = f2fs_pagecache_get_page()
if (!page)
goto iput_out;
iput_out:
-close()
-iput()
iput(inode);
- f2fs_evict_inode()
- f2fs_truncate_blocks()
- f2fs_lock_op()
- down_read(&sbi->cp_rwsem);

Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
Signed-off-by: Sayali Lokhande <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/checkpoint.c | 5 +++++
fs/f2fs/f2fs.h | 1 +
fs/f2fs/node.c | 51 ++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index d49f7a01d8a26..79e605f38f4fa 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1168,6 +1168,11 @@ static int block_operations(struct f2fs_sb_info *sbi)
};
int err = 0, cnt = 0;

+ /*
+ * Let's flush inline_data in dirty node pages.
+ */
+ f2fs_flush_inline_data(sbi);
+
retry_flush_quotas:
f2fs_lock_all(sbi);
if (__need_flush_quota(sbi)) {
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 2a8ea81c52a15..7f3d259e7e376 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3282,6 +3282,7 @@ void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid);
struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid);
struct page *f2fs_get_node_page_ra(struct page *parent, int start);
int f2fs_move_node_page(struct page *node_page, int gc_type);
+int f2fs_flush_inline_data(struct f2fs_sb_info *sbi);
int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
struct writeback_control *wbc, bool atomic,
unsigned int *seq_id);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 1db8cabf727ef..e632de10aedab 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1807,6 +1807,53 @@ static bool flush_dirty_inode(struct page *page)
return true;
}

+int f2fs_flush_inline_data(struct f2fs_sb_info *sbi)
+{
+ pgoff_t index = 0;
+ struct pagevec pvec;
+ int nr_pages;
+ int ret = 0;
+
+ pagevec_init(&pvec);
+
+ while ((nr_pages = pagevec_lookup_tag(&pvec,
+ NODE_MAPPING(sbi), &index, PAGECACHE_TAG_DIRTY))) {
+ int i;
+
+ for (i = 0; i < nr_pages; i++) {
+ struct page *page = pvec.pages[i];
+
+ if (!IS_DNODE(page))
+ continue;
+
+ lock_page(page);
+
+ if (unlikely(page->mapping != NODE_MAPPING(sbi))) {
+continue_unlock:
+ unlock_page(page);
+ continue;
+ }
+
+ if (!PageDirty(page)) {
+ /* someone wrote it for us */
+ goto continue_unlock;
+ }
+
+ /* flush inline_data, if it's async context. */
+ if (is_inline_node(page)) {
+ clear_inline_node(page);
+ unlock_page(page);
+ flush_inline_data(sbi, ino_of_node(page));
+ continue;
+ }
+ unlock_page(page);
+ }
+ pagevec_release(&pvec);
+ cond_resched();
+ }
+ return ret;
+}
+
int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
struct writeback_control *wbc,
bool do_balance, enum iostat_type io_type)
@@ -1870,8 +1917,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
goto continue_unlock;
}

- /* flush inline_data */
- if (is_inline_node(page)) {
+ /* flush inline_data, if it's async context. */
+ if (do_balance && is_inline_node(page)) {
clear_inline_node(page);
unlock_page(page);
flush_inline_data(sbi, ino_of_node(page));
--
2.26.2.645.ge9eca65c58-goog

2020-05-12 01:59:55

by Chao Yu

[permalink] [raw]
Subject: Re: [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint

On 2020/5/12 6:11, Jaegeuk Kim wrote:
> On 05/11, Chao Yu wrote:
>> On 2020/5/10 3:03, Jaegeuk Kim wrote:
>>> On 05/09, Chao Yu wrote:
>>>> On 2020/5/9 0:10, Jaegeuk Kim wrote:
>>>>> Hi Sayali,
>>>>>
>>>>> In order to address the perf regression, how about this?
>>>>>
>>>>> >From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001
>>>>> From: Jaegeuk Kim <[email protected]>
>>>>> Date: Fri, 8 May 2020 09:08:37 -0700
>>>>> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint
>>>>>
>>>>> There could be a scenario where f2fs_sync_node_pages gets
>>>>> called during checkpoint, which in turn tries to flush
>>>>> inline data and calls iput(). This results in deadlock as
>>>>> iput() tries to hold cp_rwsem, which is already held at the
>>>>> beginning by checkpoint->block_operations().
>>>>>
>>>>> Call stack :
>>>>>
>>>>> Thread A Thread B
>>>>> f2fs_write_checkpoint()
>>>>> - block_operations(sbi)
>>>>> - f2fs_lock_all(sbi);
>>>>> - down_write(&sbi->cp_rwsem);
>>>>>
>>>>> - open()
>>>>> - igrab()
>>>>> - write() write inline data
>>>>> - unlink()
>>>>> - f2fs_sync_node_pages()
>>>>> - if (is_inline_node(page))
>>>>> - flush_inline_data()
>>>>> - ilookup()
>>>>> page = f2fs_pagecache_get_page()
>>>>> if (!page)
>>>>> goto iput_out;
>>>>> iput_out:
>>>>> -close()
>>>>> -iput()
>>>>> iput(inode);
>>>>> - f2fs_evict_inode()
>>>>> - f2fs_truncate_blocks()
>>>>> - f2fs_lock_op()
>>>>> - down_read(&sbi->cp_rwsem);
>>>>>
>>>>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>>>>> Signed-off-by: Sayali Lokhande <[email protected]>
>>>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>>>> ---
>>>>> fs/f2fs/node.c | 4 ++--
>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>>>> index 1db8cabf727ef..626d7daca09de 100644
>>>>> --- a/fs/f2fs/node.c
>>>>> +++ b/fs/f2fs/node.c
>>>>> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>>>>> goto continue_unlock;
>>>>> }
>>>>>
>>>>> - /* flush inline_data */
>>>>> - if (is_inline_node(page)) {
>>>>> + /* flush inline_data, if it's not sync path. */
>>>>> + if (do_balance && is_inline_node(page)) {
>>>>
>>>> IIRC, this flow was designed to avoid running out of free space issue
>>>> during checkpoint:
>>>>
>>>> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>>>>
>>>> The sceanrio is:
>>>> 1. create fully node blocks
>>>> 2. flush node blocks
>>>> 3. write inline_data for all the node blocks again
>>>> 4. flush node blocks redundantly
>>>>
>>>> I guess this may cause failing one case of fstest.
>>>
>>> Yeah, actually I was hitting 204 failure, and thus, revised like this.
>>> Now, I don't see any regression in fstest.
>>>
>>> >From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001
>>> From: Sayali Lokhande <[email protected]>
>>> Date: Thu, 30 Apr 2020 16:28:29 +0530
>>> Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint
>>>
>>> There could be a scenario where f2fs_sync_node_pages gets
>>> called during checkpoint, which in turn tries to flush
>>> inline data and calls iput(). This results in deadlock as
>>> iput() tries to hold cp_rwsem, which is already held at the
>>> beginning by checkpoint->block_operations().
>>>
>>> Call stack :
>>>
>>> Thread A Thread B
>>> f2fs_write_checkpoint()
>>> - block_operations(sbi)
>>> - f2fs_lock_all(sbi);
>>> - down_write(&sbi->cp_rwsem);
>>>
>>> - open()
>>> - igrab()
>>> - write() write inline data
>>> - unlink()
>>> - f2fs_sync_node_pages()
>>> - if (is_inline_node(page))
>>> - flush_inline_data()
>>> - ilookup()
>>> page = f2fs_pagecache_get_page()
>>> if (!page)
>>> goto iput_out;
>>> iput_out:
>>> -close()
>>> -iput()
>>> iput(inode);
>>> - f2fs_evict_inode()
>>> - f2fs_truncate_blocks()
>>> - f2fs_lock_op()
>>> - down_read(&sbi->cp_rwsem);
>>>
>>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>>> Signed-off-by: Sayali Lokhande <[email protected]>
>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>> ---
>>> fs/f2fs/checkpoint.c | 9 ++++++++-
>>> fs/f2fs/f2fs.h | 4 ++--
>>> fs/f2fs/node.c | 10 +++++-----
>>> 3 files changed, 15 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>>> index d49f7a01d8a26..928aea4ff663d 100644
>>> --- a/fs/f2fs/checkpoint.c
>>> +++ b/fs/f2fs/checkpoint.c
>>> @@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi)
>>> };
>>> int err = 0, cnt = 0;
>>>
>>> + /*
>>> + * Let's flush node pages first to flush inline_data.
>>> + * We'll actually guarantee everything below under f2fs_lock_all.
>>> + */
>>> + f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO);
>>
>> It is possible that user write a large number of inline data in between
>> f2fs_sync_node_pages() and f2fs_lock_all(), it will cause the no-space issue in
>> race condition.
>>
>> Also, if there is huge number of F2FS_DIRTY_IMETA, after this change, we will
>> flush inode page twice which is unneeded.
>>
>> f2fs_sync_node_pages() --- flush dirty inode page
>> f2fs_lock_all()
>> ...
>> f2fs_sync_inode_meta() --- update dirty inode page
>> f2fs_sync_node_pages() --- flush dirty inode page again.
>>
>
> Another version:
>
>>From 6b430b72af57c65c20dea7b87f7ba7e9df36be98 Mon Sep 17 00:00:00 2001
> From: Sayali Lokhande <[email protected]>
> Date: Thu, 30 Apr 2020 16:28:29 +0530
> Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint
>
> There could be a scenario where f2fs_sync_node_pages gets
> called during checkpoint, which in turn tries to flush
> inline data and calls iput(). This results in deadlock as
> iput() tries to hold cp_rwsem, which is already held at the
> beginning by checkpoint->block_operations().
>
> Call stack :
>
> Thread A Thread B
> f2fs_write_checkpoint()
> - block_operations(sbi)
> - f2fs_lock_all(sbi);
> - down_write(&sbi->cp_rwsem);
>
> - open()
> - igrab()
> - write() write inline data
> - unlink()
> - f2fs_sync_node_pages()
> - if (is_inline_node(page))
> - flush_inline_data()
> - ilookup()
> page = f2fs_pagecache_get_page()
> if (!page)
> goto iput_out;
> iput_out:
> -close()
> -iput()
> iput(inode);
> - f2fs_evict_inode()
> - f2fs_truncate_blocks()
> - f2fs_lock_op()
> - down_read(&sbi->cp_rwsem);
>
> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> Signed-off-by: Sayali Lokhande <[email protected]>
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> fs/f2fs/checkpoint.c | 5 +++++
> fs/f2fs/f2fs.h | 1 +
> fs/f2fs/node.c | 51 ++++++++++++++++++++++++++++++++++++++++++--
> 3 files changed, 55 insertions(+), 2 deletions(-)
>
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index d49f7a01d8a26..79e605f38f4fa 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -1168,6 +1168,11 @@ static int block_operations(struct f2fs_sb_info *sbi)
> };
> int err = 0, cnt = 0;
>
> + /*
> + * Let's flush inline_data in dirty node pages.
> + */
> + f2fs_flush_inline_data(sbi);

Still there is a gap, user can write a large number of inline data here...

Would that be enough? I doubt we can suffer this issue in below pathes
as well:

- block_operations
- f2fs_sync_dirty_inodes
- iput
- f2fs_sync_inode_meta
- iput

Thanks,

> +
> retry_flush_quotas:
> f2fs_lock_all(sbi);
> if (__need_flush_quota(sbi)) {
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 2a8ea81c52a15..7f3d259e7e376 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -3282,6 +3282,7 @@ void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid);
> struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid);
> struct page *f2fs_get_node_page_ra(struct page *parent, int start);
> int f2fs_move_node_page(struct page *node_page, int gc_type);
> +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi);
> int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
> struct writeback_control *wbc, bool atomic,
> unsigned int *seq_id);
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index 1db8cabf727ef..e632de10aedab 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1807,6 +1807,53 @@ static bool flush_dirty_inode(struct page *page)
> return true;
> }
>
> +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi)
> +{
> + pgoff_t index = 0;
> + struct pagevec pvec;
> + int nr_pages;
> + int ret = 0;
> +
> + pagevec_init(&pvec);
> +
> + while ((nr_pages = pagevec_lookup_tag(&pvec,
> + NODE_MAPPING(sbi), &index, PAGECACHE_TAG_DIRTY))) {
> + int i;
> +
> + for (i = 0; i < nr_pages; i++) {
> + struct page *page = pvec.pages[i];
> +
> + if (!IS_DNODE(page))
> + continue;
> +
> + lock_page(page);
> +
> + if (unlikely(page->mapping != NODE_MAPPING(sbi))) {
> +continue_unlock:
> + unlock_page(page);
> + continue;
> + }
> +
> + if (!PageDirty(page)) {
> + /* someone wrote it for us */
> + goto continue_unlock;
> + }
> +
> + /* flush inline_data, if it's async context. */
> + if (is_inline_node(page)) {
> + clear_inline_node(page);
> + unlock_page(page);
> + flush_inline_data(sbi, ino_of_node(page));
> + continue;
> + }
> + unlock_page(page);
> + }
> + pagevec_release(&pvec);
> + cond_resched();
> + }
> + return ret;
> +}
> +
> int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> struct writeback_control *wbc,
> bool do_balance, enum iostat_type io_type)
> @@ -1870,8 +1917,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> goto continue_unlock;
> }
>
> - /* flush inline_data */
> - if (is_inline_node(page)) {
> + /* flush inline_data, if it's async context. */
> + if (do_balance && is_inline_node(page)) {
> clear_inline_node(page);
> unlock_page(page);
> flush_inline_data(sbi, ino_of_node(page));
>

2020-05-12 03:28:37

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint

On 05/12, Chao Yu wrote:
> On 2020/5/12 6:11, Jaegeuk Kim wrote:
> > On 05/11, Chao Yu wrote:
> >> On 2020/5/10 3:03, Jaegeuk Kim wrote:
> >>> On 05/09, Chao Yu wrote:
> >>>> On 2020/5/9 0:10, Jaegeuk Kim wrote:
> >>>>> Hi Sayali,
> >>>>>
> >>>>> In order to address the perf regression, how about this?
> >>>>>
> >>>>> >From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001
> >>>>> From: Jaegeuk Kim <[email protected]>
> >>>>> Date: Fri, 8 May 2020 09:08:37 -0700
> >>>>> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint
> >>>>>
> >>>>> There could be a scenario where f2fs_sync_node_pages gets
> >>>>> called during checkpoint, which in turn tries to flush
> >>>>> inline data and calls iput(). This results in deadlock as
> >>>>> iput() tries to hold cp_rwsem, which is already held at the
> >>>>> beginning by checkpoint->block_operations().
> >>>>>
> >>>>> Call stack :
> >>>>>
> >>>>> Thread A Thread B
> >>>>> f2fs_write_checkpoint()
> >>>>> - block_operations(sbi)
> >>>>> - f2fs_lock_all(sbi);
> >>>>> - down_write(&sbi->cp_rwsem);
> >>>>>
> >>>>> - open()
> >>>>> - igrab()
> >>>>> - write() write inline data
> >>>>> - unlink()
> >>>>> - f2fs_sync_node_pages()
> >>>>> - if (is_inline_node(page))
> >>>>> - flush_inline_data()
> >>>>> - ilookup()
> >>>>> page = f2fs_pagecache_get_page()
> >>>>> if (!page)
> >>>>> goto iput_out;
> >>>>> iput_out:
> >>>>> -close()
> >>>>> -iput()
> >>>>> iput(inode);
> >>>>> - f2fs_evict_inode()
> >>>>> - f2fs_truncate_blocks()
> >>>>> - f2fs_lock_op()
> >>>>> - down_read(&sbi->cp_rwsem);
> >>>>>
> >>>>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> >>>>> Signed-off-by: Sayali Lokhande <[email protected]>
> >>>>> Signed-off-by: Jaegeuk Kim <[email protected]>
> >>>>> ---
> >>>>> fs/f2fs/node.c | 4 ++--
> >>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> >>>>> index 1db8cabf727ef..626d7daca09de 100644
> >>>>> --- a/fs/f2fs/node.c
> >>>>> +++ b/fs/f2fs/node.c
> >>>>> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> >>>>> goto continue_unlock;
> >>>>> }
> >>>>>
> >>>>> - /* flush inline_data */
> >>>>> - if (is_inline_node(page)) {
> >>>>> + /* flush inline_data, if it's not sync path. */
> >>>>> + if (do_balance && is_inline_node(page)) {
> >>>>
> >>>> IIRC, this flow was designed to avoid running out of free space issue
> >>>> during checkpoint:
> >>>>
> >>>> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> >>>>
> >>>> The sceanrio is:
> >>>> 1. create fully node blocks
> >>>> 2. flush node blocks
> >>>> 3. write inline_data for all the node blocks again
> >>>> 4. flush node blocks redundantly
> >>>>
> >>>> I guess this may cause failing one case of fstest.
> >>>
> >>> Yeah, actually I was hitting 204 failure, and thus, revised like this.
> >>> Now, I don't see any regression in fstest.
> >>>
> >>> >From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001
> >>> From: Sayali Lokhande <[email protected]>
> >>> Date: Thu, 30 Apr 2020 16:28:29 +0530
> >>> Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint
> >>>
> >>> There could be a scenario where f2fs_sync_node_pages gets
> >>> called during checkpoint, which in turn tries to flush
> >>> inline data and calls iput(). This results in deadlock as
> >>> iput() tries to hold cp_rwsem, which is already held at the
> >>> beginning by checkpoint->block_operations().
> >>>
> >>> Call stack :
> >>>
> >>> Thread A Thread B
> >>> f2fs_write_checkpoint()
> >>> - block_operations(sbi)
> >>> - f2fs_lock_all(sbi);
> >>> - down_write(&sbi->cp_rwsem);
> >>>
> >>> - open()
> >>> - igrab()
> >>> - write() write inline data
> >>> - unlink()
> >>> - f2fs_sync_node_pages()
> >>> - if (is_inline_node(page))
> >>> - flush_inline_data()
> >>> - ilookup()
> >>> page = f2fs_pagecache_get_page()
> >>> if (!page)
> >>> goto iput_out;
> >>> iput_out:
> >>> -close()
> >>> -iput()
> >>> iput(inode);
> >>> - f2fs_evict_inode()
> >>> - f2fs_truncate_blocks()
> >>> - f2fs_lock_op()
> >>> - down_read(&sbi->cp_rwsem);
> >>>
> >>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> >>> Signed-off-by: Sayali Lokhande <[email protected]>
> >>> Signed-off-by: Jaegeuk Kim <[email protected]>
> >>> ---
> >>> fs/f2fs/checkpoint.c | 9 ++++++++-
> >>> fs/f2fs/f2fs.h | 4 ++--
> >>> fs/f2fs/node.c | 10 +++++-----
> >>> 3 files changed, 15 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> >>> index d49f7a01d8a26..928aea4ff663d 100644
> >>> --- a/fs/f2fs/checkpoint.c
> >>> +++ b/fs/f2fs/checkpoint.c
> >>> @@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi)
> >>> };
> >>> int err = 0, cnt = 0;
> >>>
> >>> + /*
> >>> + * Let's flush node pages first to flush inline_data.
> >>> + * We'll actually guarantee everything below under f2fs_lock_all.
> >>> + */
> >>> + f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO);
> >>
> >> It is possible that user write a large number of inline data in between
> >> f2fs_sync_node_pages() and f2fs_lock_all(), it will cause the no-space issue in
> >> race condition.
> >>
> >> Also, if there is huge number of F2FS_DIRTY_IMETA, after this change, we will
> >> flush inode page twice which is unneeded.
> >>
> >> f2fs_sync_node_pages() --- flush dirty inode page
> >> f2fs_lock_all()
> >> ...
> >> f2fs_sync_inode_meta() --- update dirty inode page
> >> f2fs_sync_node_pages() --- flush dirty inode page again.
> >>
> >
> > Another version:
> >
> >>From 6b430b72af57c65c20dea7b87f7ba7e9df36be98 Mon Sep 17 00:00:00 2001
> > From: Sayali Lokhande <[email protected]>
> > Date: Thu, 30 Apr 2020 16:28:29 +0530
> > Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint
> >
> > There could be a scenario where f2fs_sync_node_pages gets
> > called during checkpoint, which in turn tries to flush
> > inline data and calls iput(). This results in deadlock as
> > iput() tries to hold cp_rwsem, which is already held at the
> > beginning by checkpoint->block_operations().
> >
> > Call stack :
> >
> > Thread A Thread B
> > f2fs_write_checkpoint()
> > - block_operations(sbi)
> > - f2fs_lock_all(sbi);
> > - down_write(&sbi->cp_rwsem);
> >
> > - open()
> > - igrab()
> > - write() write inline data
> > - unlink()
> > - f2fs_sync_node_pages()
> > - if (is_inline_node(page))
> > - flush_inline_data()
> > - ilookup()
> > page = f2fs_pagecache_get_page()
> > if (!page)
> > goto iput_out;
> > iput_out:
> > -close()
> > -iput()
> > iput(inode);
> > - f2fs_evict_inode()
> > - f2fs_truncate_blocks()
> > - f2fs_lock_op()
> > - down_read(&sbi->cp_rwsem);
> >
> > Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
> > Signed-off-by: Sayali Lokhande <[email protected]>
> > Signed-off-by: Jaegeuk Kim <[email protected]>
> > ---
> > fs/f2fs/checkpoint.c | 5 +++++
> > fs/f2fs/f2fs.h | 1 +
> > fs/f2fs/node.c | 51 ++++++++++++++++++++++++++++++++++++++++++--
> > 3 files changed, 55 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > index d49f7a01d8a26..79e605f38f4fa 100644
> > --- a/fs/f2fs/checkpoint.c
> > +++ b/fs/f2fs/checkpoint.c
> > @@ -1168,6 +1168,11 @@ static int block_operations(struct f2fs_sb_info *sbi)
> > };
> > int err = 0, cnt = 0;
> >
> > + /*
> > + * Let's flush inline_data in dirty node pages.
> > + */
> > + f2fs_flush_inline_data(sbi);
>
> Still there is a gap, user can write a large number of inline data here...

I think generic/204 is the case, and I don't hit a panic with this patch.

>
> Would that be enough? I doubt we can suffer this issue in below pathes
> as well:

I don't think so, since iput is called after f2fs_unlock_all().

>
> - block_operations
> - f2fs_sync_dirty_inodes
> - iput
> - f2fs_sync_inode_meta
> - iput
>
> Thanks,
>
> > +
> > retry_flush_quotas:
> > f2fs_lock_all(sbi);
> > if (__need_flush_quota(sbi)) {
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 2a8ea81c52a15..7f3d259e7e376 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -3282,6 +3282,7 @@ void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid);
> > struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid);
> > struct page *f2fs_get_node_page_ra(struct page *parent, int start);
> > int f2fs_move_node_page(struct page *node_page, int gc_type);
> > +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi);
> > int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
> > struct writeback_control *wbc, bool atomic,
> > unsigned int *seq_id);
> > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > index 1db8cabf727ef..e632de10aedab 100644
> > --- a/fs/f2fs/node.c
> > +++ b/fs/f2fs/node.c
> > @@ -1807,6 +1807,53 @@ static bool flush_dirty_inode(struct page *page)
> > return true;
> > }
> >
> > +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi)
> > +{
> > + pgoff_t index = 0;
> > + struct pagevec pvec;
> > + int nr_pages;
> > + int ret = 0;
> > +
> > + pagevec_init(&pvec);
> > +
> > + while ((nr_pages = pagevec_lookup_tag(&pvec,
> > + NODE_MAPPING(sbi), &index, PAGECACHE_TAG_DIRTY))) {
> > + int i;
> > +
> > + for (i = 0; i < nr_pages; i++) {
> > + struct page *page = pvec.pages[i];
> > +
> > + if (!IS_DNODE(page))
> > + continue;
> > +
> > + lock_page(page);
> > +
> > + if (unlikely(page->mapping != NODE_MAPPING(sbi))) {
> > +continue_unlock:
> > + unlock_page(page);
> > + continue;
> > + }
> > +
> > + if (!PageDirty(page)) {
> > + /* someone wrote it for us */
> > + goto continue_unlock;
> > + }
> > +
> > + /* flush inline_data, if it's async context. */
> > + if (is_inline_node(page)) {
> > + clear_inline_node(page);
> > + unlock_page(page);
> > + flush_inline_data(sbi, ino_of_node(page));
> > + continue;
> > + }
> > + unlock_page(page);
> > + }
> > + pagevec_release(&pvec);
> > + cond_resched();
> > + }
> > + return ret;
> > +}
> > +
> > int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> > struct writeback_control *wbc,
> > bool do_balance, enum iostat_type io_type)
> > @@ -1870,8 +1917,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> > goto continue_unlock;
> > }
> >
> > - /* flush inline_data */
> > - if (is_inline_node(page)) {
> > + /* flush inline_data, if it's async context. */
> > + if (do_balance && is_inline_node(page)) {
> > clear_inline_node(page);
> > unlock_page(page);
> > flush_inline_data(sbi, ino_of_node(page));
> >

2020-05-12 06:54:01

by Chao Yu

[permalink] [raw]
Subject: Re: [PATCH V4] f2fs: Avoid double lock for cp_rwsem during checkpoint

On 2020/5/12 11:24, Jaegeuk Kim wrote:
> On 05/12, Chao Yu wrote:
>> On 2020/5/12 6:11, Jaegeuk Kim wrote:
>>> On 05/11, Chao Yu wrote:
>>>> On 2020/5/10 3:03, Jaegeuk Kim wrote:
>>>>> On 05/09, Chao Yu wrote:
>>>>>> On 2020/5/9 0:10, Jaegeuk Kim wrote:
>>>>>>> Hi Sayali,
>>>>>>>
>>>>>>> In order to address the perf regression, how about this?
>>>>>>>
>>>>>>> >From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001
>>>>>>> From: Jaegeuk Kim <[email protected]>
>>>>>>> Date: Fri, 8 May 2020 09:08:37 -0700
>>>>>>> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint
>>>>>>>
>>>>>>> There could be a scenario where f2fs_sync_node_pages gets
>>>>>>> called during checkpoint, which in turn tries to flush
>>>>>>> inline data and calls iput(). This results in deadlock as
>>>>>>> iput() tries to hold cp_rwsem, which is already held at the
>>>>>>> beginning by checkpoint->block_operations().
>>>>>>>
>>>>>>> Call stack :
>>>>>>>
>>>>>>> Thread A Thread B
>>>>>>> f2fs_write_checkpoint()
>>>>>>> - block_operations(sbi)
>>>>>>> - f2fs_lock_all(sbi);
>>>>>>> - down_write(&sbi->cp_rwsem);
>>>>>>>
>>>>>>> - open()
>>>>>>> - igrab()
>>>>>>> - write() write inline data
>>>>>>> - unlink()
>>>>>>> - f2fs_sync_node_pages()
>>>>>>> - if (is_inline_node(page))
>>>>>>> - flush_inline_data()
>>>>>>> - ilookup()
>>>>>>> page = f2fs_pagecache_get_page()
>>>>>>> if (!page)
>>>>>>> goto iput_out;
>>>>>>> iput_out:
>>>>>>> -close()
>>>>>>> -iput()
>>>>>>> iput(inode);
>>>>>>> - f2fs_evict_inode()
>>>>>>> - f2fs_truncate_blocks()
>>>>>>> - f2fs_lock_op()
>>>>>>> - down_read(&sbi->cp_rwsem);
>>>>>>>
>>>>>>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>>>>>>> Signed-off-by: Sayali Lokhande <[email protected]>
>>>>>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>>>>>> ---
>>>>>>> fs/f2fs/node.c | 4 ++--
>>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>>>>>> index 1db8cabf727ef..626d7daca09de 100644
>>>>>>> --- a/fs/f2fs/node.c
>>>>>>> +++ b/fs/f2fs/node.c
>>>>>>> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>>>>>>> goto continue_unlock;
>>>>>>> }
>>>>>>>
>>>>>>> - /* flush inline_data */
>>>>>>> - if (is_inline_node(page)) {
>>>>>>> + /* flush inline_data, if it's not sync path. */
>>>>>>> + if (do_balance && is_inline_node(page)) {
>>>>>>
>>>>>> IIRC, this flow was designed to avoid running out of free space issue
>>>>>> during checkpoint:
>>>>>>
>>>>>> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>>>>>>
>>>>>> The sceanrio is:
>>>>>> 1. create fully node blocks
>>>>>> 2. flush node blocks
>>>>>> 3. write inline_data for all the node blocks again
>>>>>> 4. flush node blocks redundantly
>>>>>>
>>>>>> I guess this may cause failing one case of fstest.
>>>>>
>>>>> Yeah, actually I was hitting 204 failure, and thus, revised like this.
>>>>> Now, I don't see any regression in fstest.
>>>>>
>>>>> >From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001
>>>>> From: Sayali Lokhande <[email protected]>
>>>>> Date: Thu, 30 Apr 2020 16:28:29 +0530
>>>>> Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint
>>>>>
>>>>> There could be a scenario where f2fs_sync_node_pages gets
>>>>> called during checkpoint, which in turn tries to flush
>>>>> inline data and calls iput(). This results in deadlock as
>>>>> iput() tries to hold cp_rwsem, which is already held at the
>>>>> beginning by checkpoint->block_operations().
>>>>>
>>>>> Call stack :
>>>>>
>>>>> Thread A Thread B
>>>>> f2fs_write_checkpoint()
>>>>> - block_operations(sbi)
>>>>> - f2fs_lock_all(sbi);
>>>>> - down_write(&sbi->cp_rwsem);
>>>>>
>>>>> - open()
>>>>> - igrab()
>>>>> - write() write inline data
>>>>> - unlink()
>>>>> - f2fs_sync_node_pages()
>>>>> - if (is_inline_node(page))
>>>>> - flush_inline_data()
>>>>> - ilookup()
>>>>> page = f2fs_pagecache_get_page()
>>>>> if (!page)
>>>>> goto iput_out;
>>>>> iput_out:
>>>>> -close()
>>>>> -iput()
>>>>> iput(inode);
>>>>> - f2fs_evict_inode()
>>>>> - f2fs_truncate_blocks()
>>>>> - f2fs_lock_op()
>>>>> - down_read(&sbi->cp_rwsem);
>>>>>
>>>>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>>>>> Signed-off-by: Sayali Lokhande <[email protected]>
>>>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>>>> ---
>>>>> fs/f2fs/checkpoint.c | 9 ++++++++-
>>>>> fs/f2fs/f2fs.h | 4 ++--
>>>>> fs/f2fs/node.c | 10 +++++-----
>>>>> 3 files changed, 15 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>>>>> index d49f7a01d8a26..928aea4ff663d 100644
>>>>> --- a/fs/f2fs/checkpoint.c
>>>>> +++ b/fs/f2fs/checkpoint.c
>>>>> @@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi)
>>>>> };
>>>>> int err = 0, cnt = 0;
>>>>>
>>>>> + /*
>>>>> + * Let's flush node pages first to flush inline_data.
>>>>> + * We'll actually guarantee everything below under f2fs_lock_all.
>>>>> + */
>>>>> + f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO);
>>>>
>>>> It is possible that user write a large number of inline data in between
>>>> f2fs_sync_node_pages() and f2fs_lock_all(), it will cause the no-space issue in
>>>> race condition.
>>>>
>>>> Also, if there is huge number of F2FS_DIRTY_IMETA, after this change, we will
>>>> flush inode page twice which is unneeded.
>>>>
>>>> f2fs_sync_node_pages() --- flush dirty inode page
>>>> f2fs_lock_all()
>>>> ...
>>>> f2fs_sync_inode_meta() --- update dirty inode page
>>>> f2fs_sync_node_pages() --- flush dirty inode page again.
>>>>
>>>
>>> Another version:
>>>
>>> >From 6b430b72af57c65c20dea7b87f7ba7e9df36be98 Mon Sep 17 00:00:00 2001
>>> From: Sayali Lokhande <[email protected]>
>>> Date: Thu, 30 Apr 2020 16:28:29 +0530
>>> Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint
>>>
>>> There could be a scenario where f2fs_sync_node_pages gets
>>> called during checkpoint, which in turn tries to flush
>>> inline data and calls iput(). This results in deadlock as
>>> iput() tries to hold cp_rwsem, which is already held at the
>>> beginning by checkpoint->block_operations().
>>>
>>> Call stack :
>>>
>>> Thread A Thread B
>>> f2fs_write_checkpoint()
>>> - block_operations(sbi)
>>> - f2fs_lock_all(sbi);
>>> - down_write(&sbi->cp_rwsem);
>>>
>>> - open()
>>> - igrab()
>>> - write() write inline data
>>> - unlink()
>>> - f2fs_sync_node_pages()
>>> - if (is_inline_node(page))
>>> - flush_inline_data()
>>> - ilookup()
>>> page = f2fs_pagecache_get_page()
>>> if (!page)
>>> goto iput_out;
>>> iput_out:
>>> -close()
>>> -iput()
>>> iput(inode);
>>> - f2fs_evict_inode()
>>> - f2fs_truncate_blocks()
>>> - f2fs_lock_op()
>>> - down_read(&sbi->cp_rwsem);
>>>
>>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>>> Signed-off-by: Sayali Lokhande <[email protected]>
>>> Signed-off-by: Jaegeuk Kim <[email protected]>
>>> ---
>>> fs/f2fs/checkpoint.c | 5 +++++
>>> fs/f2fs/f2fs.h | 1 +
>>> fs/f2fs/node.c | 51 ++++++++++++++++++++++++++++++++++++++++++--
>>> 3 files changed, 55 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>>> index d49f7a01d8a26..79e605f38f4fa 100644
>>> --- a/fs/f2fs/checkpoint.c
>>> +++ b/fs/f2fs/checkpoint.c
>>> @@ -1168,6 +1168,11 @@ static int block_operations(struct f2fs_sb_info *sbi)
>>> };
>>> int err = 0, cnt = 0;
>>>
>>> + /*
>>> + * Let's flush inline_data in dirty node pages.
>>> + */
>>> + f2fs_flush_inline_data(sbi);
>>
>> Still there is a gap, user can write a large number of inline data here...
>
> I think generic/204 is the case, and I don't hit a panic with this patch.

Yes, may be triggered by a variant of generic/204 which is not exist yet, and
it's a rare corner case.

>
>>
>> Would that be enough? I doubt we can suffer this issue in below pathes
>> as well:
>
> I don't think so, since iput is called after f2fs_unlock_all().

That's correct.

Thanks,

>
>>
>> - block_operations
>> - f2fs_sync_dirty_inodes
>> - iput
>> - f2fs_sync_inode_meta
>> - iput
>>
>> Thanks,
>>
>>> +
>>> retry_flush_quotas:
>>> f2fs_lock_all(sbi);
>>> if (__need_flush_quota(sbi)) {
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index 2a8ea81c52a15..7f3d259e7e376 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -3282,6 +3282,7 @@ void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid);
>>> struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid);
>>> struct page *f2fs_get_node_page_ra(struct page *parent, int start);
>>> int f2fs_move_node_page(struct page *node_page, int gc_type);
>>> +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi);
>>> int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
>>> struct writeback_control *wbc, bool atomic,
>>> unsigned int *seq_id);
>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>> index 1db8cabf727ef..e632de10aedab 100644
>>> --- a/fs/f2fs/node.c
>>> +++ b/fs/f2fs/node.c
>>> @@ -1807,6 +1807,53 @@ static bool flush_dirty_inode(struct page *page)
>>> return true;
>>> }
>>>
>>> +int f2fs_flush_inline_data(struct f2fs_sb_info *sbi)
>>> +{
>>> + pgoff_t index = 0;
>>> + struct pagevec pvec;
>>> + int nr_pages;
>>> + int ret = 0;
>>> +
>>> + pagevec_init(&pvec);
>>> +
>>> + while ((nr_pages = pagevec_lookup_tag(&pvec,
>>> + NODE_MAPPING(sbi), &index, PAGECACHE_TAG_DIRTY))) {
>>> + int i;
>>> +
>>> + for (i = 0; i < nr_pages; i++) {
>>> + struct page *page = pvec.pages[i];
>>> +
>>> + if (!IS_DNODE(page))
>>> + continue;
>>> +
>>> + lock_page(page);
>>> +
>>> + if (unlikely(page->mapping != NODE_MAPPING(sbi))) {
>>> +continue_unlock:
>>> + unlock_page(page);
>>> + continue;
>>> + }
>>> +
>>> + if (!PageDirty(page)) {
>>> + /* someone wrote it for us */
>>> + goto continue_unlock;
>>> + }
>>> +
>>> + /* flush inline_data, if it's async context. */
>>> + if (is_inline_node(page)) {
>>> + clear_inline_node(page);
>>> + unlock_page(page);
>>> + flush_inline_data(sbi, ino_of_node(page));
>>> + continue;
>>> + }
>>> + unlock_page(page);
>>> + }
>>> + pagevec_release(&pvec);
>>> + cond_resched();
>>> + }
>>> + return ret;
>>> +}
>>> +
>>> int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>>> struct writeback_control *wbc,
>>> bool do_balance, enum iostat_type io_type)
>>> @@ -1870,8 +1917,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>>> goto continue_unlock;
>>> }
>>>
>>> - /* flush inline_data */
>>> - if (is_inline_node(page)) {
>>> + /* flush inline_data, if it's async context. */
>>> + if (do_balance && is_inline_node(page)) {
>>> clear_inline_node(page);
>>> unlock_page(page);
>>> flush_inline_data(sbi, ino_of_node(page));
>>>
> .
>