Let's guarantee flushing dirty meta pages to avoid an infinite loop.
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/checkpoint.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 620a386d82c1a..9a7f695d5adb3 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1266,6 +1266,9 @@ void f2fs_wait_on_all_pages(struct f2fs_sb_info *sbi, int type)
 		if (unlikely(f2fs_cp_error(sbi)))
 			break;
 
+		if (type == F2FS_DIRTY_META)
+			f2fs_sync_meta_pages(sbi, META, LONG_MAX,
+							FS_CP_META_IO);
 		io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 	}
 	finish_wait(&sbi->cp_wait, &wait);
@@ -1493,8 +1496,6 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 	sbi->last_valid_block_count = sbi->total_valid_block_count;
 	percpu_counter_set(&sbi->alloc_valid_block_count, 0);
 
-	/* Here, we have one bio having CP pack except cp pack 2 page */
-	f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO);
 	/* Wait for all dirty meta pages to be submitted for IO */
 	f2fs_wait_on_all_pages(sbi, F2FS_DIRTY_META);
 
--
2.26.2.761.g0e0b3e54be-goog
On 2020/5/15 10:15, Jaegeuk Kim wrote:
> Let's guarantee flushing dirty meta pages to avoid an infinite loop.
What's the root cause? Race case or meta page flush failure?
Thanks,
On 05/15, Chao Yu wrote:
> On 2020/5/15 10:15, Jaegeuk Kim wrote:
> > Let's guarantee flushing dirty meta pages to avoid an infinite loop.
>
> What's the root cause? Race case or meta page flush failure?
Investigating, but at least this can avoid the infinite loop there.
V2:
From c60ce8e7178004fd6cba96e592247e43b5fd98d8 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <[email protected]>
Date: Wed, 13 May 2020 21:12:53 -0700
Subject: [PATCH] f2fs: flush dirty meta pages when flushing them
Let's guarantee flushing dirty meta pages to avoid an infinite loop.
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/checkpoint.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 620a386d82c1a..3dc3ac6fe1432 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1266,6 +1266,9 @@ void f2fs_wait_on_all_pages(struct f2fs_sb_info *sbi, int type)
 		if (unlikely(f2fs_cp_error(sbi)))
 			break;
 
+		if (type == F2FS_DIRTY_META)
+			f2fs_sync_meta_pages(sbi, META, LONG_MAX,
+							FS_CP_META_IO);
 		io_schedule_timeout(DEFAULT_IO_TIMEOUT);
 	}
 	finish_wait(&sbi->cp_wait, &wait);
--
2.26.2.761.g0e0b3e54be-goog
On 2020/5/15 22:45, Jaegeuk Kim wrote:
> On 05/15, Chao Yu wrote:
>> On 2020/5/15 10:15, Jaegeuk Kim wrote:
>>> Let's guarantee flushing dirty meta pages to avoid an infinite loop.
>>
>> What's the root cause? Race case or meta page flush failure?
>
> Investigating, but at least this can avoid the infinite loop there.
Hmm... this fix may mask the root cause.
Thanks,
On 05/18, Chao Yu wrote:
> On 2020/5/15 22:45, Jaegeuk Kim wrote:
> > On 05/15, Chao Yu wrote:
> >> On 2020/5/15 10:15, Jaegeuk Kim wrote:
> >>> Let's guarantee flushing dirty meta pages to avoid an infinite loop.
> >>
> >> What's the root cause? Race case or meta page flush failure?
> >
> > Investigating, but at least this can avoid the infinite loop there.
>
> Hmm... this fix may mask the root cause.
We've been contacted about one case related to this issue, where a single SSA
page is dirtied at that moment. Anyway, I think it'd be fine to take this,
since any fs consistency issue can be detected by fsck. So far, I haven't seen
any problems in any of my local stress tests.
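To make the failure mode concrete, here is a minimal userspace model. All names are hypothetical stand-ins, not f2fs code. It assumes, as the "Wait for all dirty meta pages to be submitted for IO" comment in do_checkpoint() suggests, that the F2FS_DIRTY_META count only drops when some caller submits the pages. If one page (such as the SSA page mentioned above) is dirtied after the single up-front f2fs_sync_meta_pages() call, the original loop only sleeps and re-checks, while the patched loop submits it on the next iteration. The iteration cap stands in for "forever".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; not f2fs's real data structures. */
struct meta_model {
	int dirty;	/* stands in for get_pages(sbi, F2FS_DIRTY_META) */
};

/* Stands in for f2fs_sync_meta_pages(): submitting clears the dirty count. */
static void model_sync_meta_pages(struct meta_model *m)
{
	m->dirty = 0;
}

/*
 * Stands in for the loop in f2fs_wait_on_all_pages(). With flush_in_loop
 * false it only re-checks the counter, so a page dirtied after the
 * up-front sync is never submitted. max_iters bounds what would otherwise
 * be an endless loop. Returns the iteration at which the counter was seen
 * at zero, or -1 if the cap was hit.
 */
static int model_wait_on_all_pages(struct meta_model *m, bool flush_in_loop,
				   int max_iters)
{
	assert(max_iters > 0);

	for (int i = 0; i < max_iters; i++) {
		if (m->dirty == 0)
			return i;
		if (flush_in_loop)
			model_sync_meta_pages(m);
		/* io_schedule_timeout(DEFAULT_IO_TIMEOUT) would sleep here */
	}
	return -1;
}
```

For example, starting from `{ .dirty = 1 }` (one page dirtied after the up-front sync), `model_wait_on_all_pages(&m, false, 1000)` returns -1, while `model_wait_on_all_pages(&m, true, 1000)` returns 1 after a single in-loop flush. The model deliberately says nothing about why the page got dirtied, which is the root-cause question still open in this thread.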
Greetings,
FYI, we noticed the following commit (built with gcc-7):
commit: b78356484793c82e07fe6f7ca3b62b1f18651267 ("[PATCH] f2fs: flush dirty meta pages when flushing them")
url: https://github.com/0day-ci/linux/commits/Jaegeuk-Kim/f2fs-flush-dirty-meta-pages-when-flushing-them/20200515-101937
base: https://git.kernel.org/cgit/linux/kernel/git/jaegeuk/f2fs.git dev-test
in testcase: xfstests
with the following parameters:
disk: 4HDD
fs: f2fs
test: generic-group13
test-description: xfstests is a regression test suite for xfs and other file systems.
test-url: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
caused the following changes (please refer to the attached dmesg/kmsg for the entire log/backtrace):
If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <[email protected]>
[ 20.432705] WARNING: CPU: 0 PID: 819 at kernel/sched/core.c:6763 __might_sleep+0x71/0x80
[ 20.433992] Modules linked in: dm_mod f2fs sr_mod cdrom intel_rapl_msr sg bochs_drm drm_vram_helper drm_ttm_helper ttm ppdev intel_rapl_common drm_kms_helper crct10dif_pclmul ata_generic pata_acpi crc32_pclmul crc32c_intel snd_pcm ghash_clmulni_intel snd_timer syscopyarea sysfillrect snd sysimgblt fb_sys_fops aesni_intel crypto_simd soundcore cryptd glue_helper ata_piix joydev drm pcspkr libata serio_raw i2c_piix4 parport_pc parport floppy ip_tables
[ 20.438562] CPU: 0 PID: 819 Comm: umount Not tainted 5.6.0-11895-gb78356484793c #1
[ 20.439391] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 20.440584] RIP: 0010:__might_sleep+0x71/0x80
[ 20.440981] Code: 5c 41 5d 5d e9 90 fe ff ff 48 8b 90 08 22 00 00 48 8b 70 10 48 c7 c7 f8 84 b3 85 c6 05 f9 0e 74 01 01 48 89 d1 e8 4f fd fc ff <0f> 0b eb c7 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 8b 05
[ 20.443456] RSP: 0018:ffff98170098ba08 EFLAGS: 00010282
[ 20.444212] RAX: 0000000000000000 RBX: ffffffffc05d35b8 RCX: 0000000000000000
[ 20.445253] RDX: 0000000000000001 RSI: ffff89343fc19b88 RDI: ffff89343fc19b88
[ 20.446408] RBP: ffff98170098ba20 R08: 0000000000000000 R09: 0000000000aaaaaa
[ 20.447608] R10: 0000000000000040 R11: ffff89330c48f360 R12: 00000000000001df
[ 20.448834] R13: 0000000000000000 R14: ffff8933bad80178 R15: ffffc9ec06e7c000
[ 20.449985] FS: 00007f63193b7e40(0000) GS:ffff89343fc00000(0000) knlGS:0000000000000000
[ 20.451319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 20.452099] CR2: 000055ace9a90350 CR3: 00000001bc424000 CR4: 00000000000406f0
[ 20.452801] Call Trace:
[ 20.453125] f2fs_sync_meta_pages+0x1a1/0x2c0 [f2fs]
[ 20.453593] ? __mod_lruvec_state+0x3f/0x100
[ 20.454182] ? xas_load+0x8/0x80
[ 20.454554] ? __xa_set_mark+0x5b/0x80
[ 20.454910] ? f2fs_wait_on_all_pages+0xc0/0xf0 [f2fs]
[ 20.455382] f2fs_wait_on_all_pages+0xc0/0xf0 [f2fs]
[ 20.455963] ? finish_wait+0x80/0x80
[ 20.456524] do_checkpoint+0x5ac/0xe10 [f2fs]
[ 20.457178] ? _cond_resched+0x19/0x30
[ 20.457739] ? down_write+0x21/0x50
[ 20.458273] ? __submit_merged_write_cond+0x15e/0x190 [f2fs]
[ 20.459117] f2fs_write_checkpoint+0x70e/0x780 [f2fs]
[ 20.459878] f2fs_sync_fs+0x9f/0x130 [f2fs]
[ 20.461543] sync_filesystem+0x71/0x90
[ 20.463113] generic_shutdown_super+0x22/0x120
[ 20.464775] kill_block_super+0x21/0x50
[ 20.466362] kill_f2fs_super+0x79/0xd0 [f2fs]
[ 20.467985] ? unregister_shrinker+0x69/0x80
[ 20.469591] deactivate_locked_super+0x3f/0x70
[ 20.471187] cleanup_mnt+0xb8/0x150
[ 20.472670] task_work_run+0x83/0xc0
[ 20.474149] exit_to_usermode_loop+0xeb/0xf0
[ 20.475697] do_syscall_64+0x1c8/0x1f0
[ 20.479190] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 20.480880] RIP: 0033:0x7f6318c9bd77
[ 20.482341] Code: 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f1 00 2b 00 f7 d8 64 89 01 48
[ 20.487088] RSP: 002b:00007ffeb69f34b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 20.489270] RAX: 0000000000000000 RBX: 000055e86e241060 RCX: 00007f6318c9bd77
[ 20.491359] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055e86e241240
[ 20.493444] RBP: 000055e86e241240 R08: 000055e86e2425e0 R09: 0000000000000014
[ 20.495482] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f631919de64
[ 20.497496] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffeb69f3740
[ 20.499472] ---[ end trace fedf2c5128b96899 ]---
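The WARNING comes from __might_sleep(), and the call trace shows it is reached from f2fs_sync_meta_pages() inside f2fs_wait_on_all_pages(). In kernels of this era, the WARN_ONCE() in __might_sleep() fires when a possibly-blocking call is made while the task state is not TASK_RUNNING. Assuming the loop still starts with prepare_to_wait(..., TASK_UNINTERRUPTIBLE), the new f2fs_sync_meta_pages() call in V1/V2 runs with the task already marked as sleeping. The userspace model below (hypothetical names, not kernel code) contrasts that ordering with one alternative that flushes before prepare_to_wait(); it is a sketch under these assumptions, not a patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel task states. */
enum model_task_state { MODEL_TASK_RUNNING, MODEL_TASK_UNINTERRUPTIBLE };

struct task_model {
	enum model_task_state state;
	int might_sleep_warnings;	/* times the modeled check fired */
	int dirty;			/* stands in for get_pages(sbi, F2FS_DIRTY_META) */
};

/* Models prepare_to_wait(): marks the task as about to sleep. */
static void model_prepare_to_wait(struct task_model *t)
{
	t->state = MODEL_TASK_UNINTERRUPTIBLE;
}

/* Models the __might_sleep() check: blocking while not running is flagged. */
static void model_might_sleep(struct task_model *t)
{
	if (t->state != MODEL_TASK_RUNNING)
		t->might_sleep_warnings++;
}

/* Models f2fs_sync_meta_pages(): may block, then submits all dirty pages. */
static void model_sync_meta_pages(struct task_model *t)
{
	model_might_sleep(t);
	t->dirty = 0;
}

/* Models io_schedule_timeout(): the task comes back in the running state. */
static void model_io_schedule_timeout(struct task_model *t)
{
	t->state = MODEL_TASK_RUNNING;
}

/* Ordering in the V1/V2 patches: the flush happens after prepare_to_wait(). */
static void wait_sync_after_prepare(struct task_model *t)
{
	for (;;) {
		model_prepare_to_wait(t);
		if (t->dirty == 0)
			break;
		model_sync_meta_pages(t);
		model_io_schedule_timeout(t);
	}
	t->state = MODEL_TASK_RUNNING;	/* finish_wait() */
	assert(t->dirty == 0);
}

/* Alternative ordering: flush first, then prepare_to_wait() and sleep. */
static void wait_sync_before_prepare(struct task_model *t)
{
	for (;;) {
		if (t->dirty == 0)
			break;
		model_sync_meta_pages(t);
		model_prepare_to_wait(t);
		model_io_schedule_timeout(t);
	}
	t->state = MODEL_TASK_RUNNING;	/* finish_wait() */
	assert(t->dirty == 0);
}
```

Starting from one dirty page, wait_sync_after_prepare() records one might-sleep warning, while wait_sync_before_prepare() records none and still drains the page. Both end in the running state, as finish_wait() guarantees in the kernel.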
To reproduce:
# build kernel
cd linux
cp config-5.6.0-11895-gb78356484793c .config
make HOSTCC=gcc-7 CC=gcc-7 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-7 CC=gcc-7 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
Thanks,
Rong Chen