LinuxLists.cc - [PATCH] Revert "f2fs: fix quota_sync failure due to f2fs_lock

2020-05-29 09:32:20

Subject: [PATCH] Revert "f2fs: fix quota_sync failure due to f2fs_lock_op"

Under heavy fsstress, we may triggle panic while issuing discard,
because __check_sit_bitmap() detects that discard command may earse
valid data blocks, the root cause is as below race stack described,
since we removed lock when flushing quota data, quota data writeback
may race with write_checkpoint(), so that it causes inconsistency in
between cached discard entry and segment bitmap.

- f2fs_write_checkpoint
- block_operations
- set_sbi_flag(sbi, SBI_QUOTA_SKIP_FLUSH)
- f2fs_flush_sit_entries
- add_discard_addrs
- __set_bit_le(i, (void *)de->discard_map);
- f2fs_write_data_pages
- f2fs_write_single_data_page
: inode is quota one, cp_rwsem won't be locked
- f2fs_do_write_data_page
- f2fs_allocate_data_block
- f2fs_wait_discard_bio
: discard entry has not been added yet.
- update_sit_entry
- f2fs_clear_prefree_segments
- f2fs_issue_discard
: add discard entry

This patch fixes this issue by reverting 435cbab95e39 ("f2fs: fix quota_sync
failure due to f2fs_lock_op").

Fixes: 435cbab95e39 ("f2fs: fix quota_sync failure due to f2fs_lock_op")
Signed-off-by: Chao Yu <[email protected]>
---
fs/f2fs/compress.c | 8 +++-----
fs/f2fs/data.c | 4 ++--
2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index a53578a89211..62c4857ae46d 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -1097,7 +1097,7 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
loff_t psize;
int i, err;

- if (!IS_NOQUOTA(inode) && !f2fs_trylock_op(sbi))
+ if (!f2fs_trylock_op(sbi))
return -EAGAIN;

set_new_dnode(&dn, cc->inode, NULL, NULL, 0);
@@ -1204,8 +1204,7 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
set_inode_flag(inode, FI_FIRST_BLOCK_WRITTEN);

f2fs_put_dnode(&dn);
- if (!IS_NOQUOTA(inode))
- f2fs_unlock_op(sbi);
+ f2fs_unlock_op(sbi);

spin_lock(&fi->i_size_lock);
if (fi->last_disk_size < psize)
@@ -1231,8 +1230,7 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
out_put_dnode:
f2fs_put_dnode(&dn);
out_unlock_op:
- if (!IS_NOQUOTA(inode))
- f2fs_unlock_op(sbi);
+ f2fs_unlock_op(sbi);
return -EAGAIN;
}

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 999a36ff05f1..d983ad54f318 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2724,8 +2724,8 @@ int f2fs_write_single_data_page(struct page *page, int *submitted,
f2fs_available_free_memory(sbi, BASE_CHECK))))
goto redirty_out;

- /* Dentry/quota blocks are controlled by checkpoint */
- if (S_ISDIR(inode->i_mode) || IS_NOQUOTA(inode)) {
+ /* Dentry blocks are controlled by checkpoint */
+ if (S_ISDIR(inode->i_mode)) {
fio.need_lock = LOCK_DONE;
err = f2fs_do_write_data_page(&fio);
goto done;
--
2.18.0.rc1

2020-05-29 22:36:49

by Jaegeuk Kim

[permalink] [raw]

Subject: Re: [PATCH] Revert "f2fs: fix quota_sync failure due to f2fs_lock_op"

On 05/29, Chao Yu wrote:
> Under heavy fsstress, we may triggle panic while issuing discard,
> because __check_sit_bitmap() detects that discard command may earse
> valid data blocks, the root cause is as below race stack described,
> since we removed lock when flushing quota data, quota data writeback
> may race with write_checkpoint(), so that it causes inconsistency in
> between cached discard entry and segment bitmap.
>
> - f2fs_write_checkpoint
> - block_operations
> - set_sbi_flag(sbi, SBI_QUOTA_SKIP_FLUSH)
> - f2fs_flush_sit_entries
> - add_discard_addrs
> - __set_bit_le(i, (void *)de->discard_map);
> - f2fs_write_data_pages
> - f2fs_write_single_data_page
> : inode is quota one, cp_rwsem won't be locked
> - f2fs_do_write_data_page
> - f2fs_allocate_data_block
> - f2fs_wait_discard_bio
> : discard entry has not been added yet.
> - update_sit_entry
> - f2fs_clear_prefree_segments
> - f2fs_issue_discard
> : add discard entry
>
> This patch fixes this issue by reverting 435cbab95e39 ("f2fs: fix quota_sync
> failure due to f2fs_lock_op").
>
> Fixes: 435cbab95e39 ("f2fs: fix quota_sync failure due to f2fs_lock_op")

The previous patch fixes quota_sync gets EAGAIN all the time.
How about this? It seems this works for fsstress test.

---
fs/f2fs/segment.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index ebbadde6cbced..f67cffc38975e 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3095,6 +3095,14 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
struct curseg_info *curseg = CURSEG_I(sbi, type);
bool put_pin_sem = false;

+ /*
+ * We need to wait for node_write to avoid block allocation during
+ * checkpoint. This can only happen to quota writes which can cause
+ * the below discard race condition.
+ */
+ if (IS_DATASEG(type))
+ down_write(&sbi->node_write);
+
if (type == CURSEG_COLD_DATA) {
/* GC during CURSEG_COLD_DATA_PINNED allocation */
if (down_read_trylock(&sbi->pin_sem)) {
@@ -3174,6 +3182,9 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,

if (put_pin_sem)
up_read(&sbi->pin_sem);
+
+ if (IS_DATASEG(type))
+ up_write(&sbi->node_write);
}

static void update_device_state(struct f2fs_io_info *fio)
--
2.27.0.rc0.183.gde8f92d652-goog

2020-05-30 01:32:26

by Chao Yu

[permalink] [raw]

Subject: Re: [PATCH] Revert "f2fs: fix quota_sync failure due to f2fs_lock_op"

On 2020/5/30 6:34, Jaegeuk Kim wrote:
> On 05/29, Chao Yu wrote:
>> Under heavy fsstress, we may triggle panic while issuing discard,
>> because __check_sit_bitmap() detects that discard command may earse
>> valid data blocks, the root cause is as below race stack described,
>> since we removed lock when flushing quota data, quota data writeback
>> may race with write_checkpoint(), so that it causes inconsistency in
>> between cached discard entry and segment bitmap.
>>
>> - f2fs_write_checkpoint
>> - block_operations
>> - set_sbi_flag(sbi, SBI_QUOTA_SKIP_FLUSH)
>> - f2fs_flush_sit_entries
>> - add_discard_addrs
>> - __set_bit_le(i, (void *)de->discard_map);
>> - f2fs_write_data_pages
>> - f2fs_write_single_data_page
>> : inode is quota one, cp_rwsem won't be locked
>> - f2fs_do_write_data_page
>> - f2fs_allocate_data_block
>> - f2fs_wait_discard_bio
>> : discard entry has not been added yet.
>> - update_sit_entry
>> - f2fs_clear_prefree_segments
>> - f2fs_issue_discard
>> : add discard entry
>>
>> This patch fixes this issue by reverting 435cbab95e39 ("f2fs: fix quota_sync
>> failure due to f2fs_lock_op").
>>
>> Fixes: 435cbab95e39 ("f2fs: fix quota_sync failure due to f2fs_lock_op")
>
> The previous patch fixes quota_sync gets EAGAIN all the time.
> How about this? It seems this works for fsstress test.
>
> ---
> fs/f2fs/segment.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index ebbadde6cbced..f67cffc38975e 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -3095,6 +3095,14 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
> struct curseg_info *curseg = CURSEG_I(sbi, type);
> bool put_pin_sem = false;
>
> + /*
> + * We need to wait for node_write to avoid block allocation during
> + * checkpoint. This can only happen to quota writes which can cause
> + * the below discard race condition.
> + */
> + if (IS_DATASEG(type))

type is CURSEG_COLD_DATA_PINNED, IS_DATASEG(CURSEG_COLD_DATA_PINNED) should be false,
then node_write lock will not be held, later type will be updated to CURSEG_COLD_DATA,
then we will try to release node_write.

Thanks,

> + down_write(&sbi->node_write);
> +
> if (type == CURSEG_COLD_DATA) {
> /* GC during CURSEG_COLD_DATA_PINNED allocation */
> if (down_read_trylock(&sbi->pin_sem)) {
> @@ -3174,6 +3182,9 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
>
> if (put_pin_sem)
> up_read(&sbi->pin_sem);
> +
> + if (IS_DATASEG(type))
> + up_write(&sbi->node_write);
> }
>
> static void update_device_state(struct f2fs_io_info *fio)
>

2020-05-30 01:57:35

by Jaegeuk Kim

[permalink] [raw]

Subject: Re: [PATCH] Revert "f2fs: fix quota_sync failure due to f2fs_lock_op"

On 05/30, Chao Yu wrote:
> On 2020/5/30 6:34, Jaegeuk Kim wrote:
> > On 05/29, Chao Yu wrote:
> >> Under heavy fsstress, we may triggle panic while issuing discard,
> >> because __check_sit_bitmap() detects that discard command may earse
> >> valid data blocks, the root cause is as below race stack described,
> >> since we removed lock when flushing quota data, quota data writeback
> >> may race with write_checkpoint(), so that it causes inconsistency in
> >> between cached discard entry and segment bitmap.
> >>
> >> - f2fs_write_checkpoint
> >> - block_operations
> >> - set_sbi_flag(sbi, SBI_QUOTA_SKIP_FLUSH)
> >> - f2fs_flush_sit_entries
> >> - add_discard_addrs
> >> - __set_bit_le(i, (void *)de->discard_map);
> >> - f2fs_write_data_pages
> >> - f2fs_write_single_data_page
> >> : inode is quota one, cp_rwsem won't be locked
> >> - f2fs_do_write_data_page
> >> - f2fs_allocate_data_block
> >> - f2fs_wait_discard_bio
> >> : discard entry has not been added yet.
> >> - update_sit_entry
> >> - f2fs_clear_prefree_segments
> >> - f2fs_issue_discard
> >> : add discard entry
> >>
> >> This patch fixes this issue by reverting 435cbab95e39 ("f2fs: fix quota_sync
> >> failure due to f2fs_lock_op").
> >>
> >> Fixes: 435cbab95e39 ("f2fs: fix quota_sync failure due to f2fs_lock_op")
> >
> > The previous patch fixes quota_sync gets EAGAIN all the time.
> > How about this? It seems this works for fsstress test.
> >

Then this?

---
fs/f2fs/segment.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index ebbadde6cbced..ed11dcf2d69ed 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3107,6 +3107,14 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
type = CURSEG_COLD_DATA;
}

+ /*
+ * We need to wait for node_write to avoid block allocation during
+ * checkpoint. This can only happen to quota writes which can cause
+ * the below discard race condition.
+ */
+ if (IS_DATASEG(type))
+ down_write(&sbi->node_write);
+
down_read(&SM_I(sbi)->curseg_lock);

mutex_lock(&curseg->curseg_mutex);
@@ -3174,6 +3182,9 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,

if (put_pin_sem)
up_read(&sbi->pin_sem);
+
+ if (IS_DATASEG(type))
+ up_write(&sbi->node_write);
}

static void update_device_state(struct f2fs_io_info *fio)
--
2.27.0.rc0.183.gde8f92d652-goog

2020-05-30 02:05:09

by Chao Yu

[permalink] [raw]

Subject: Re: [PATCH] Revert "f2fs: fix quota_sync failure due to f2fs_lock_op"

On 2020/5/30 9:49, Jaegeuk Kim wrote:
> On 05/30, Chao Yu wrote:
>> On 2020/5/30 6:34, Jaegeuk Kim wrote:
>>> On 05/29, Chao Yu wrote:
>>>> Under heavy fsstress, we may triggle panic while issuing discard,
>>>> because __check_sit_bitmap() detects that discard command may earse
>>>> valid data blocks, the root cause is as below race stack described,
>>>> since we removed lock when flushing quota data, quota data writeback
>>>> may race with write_checkpoint(), so that it causes inconsistency in
>>>> between cached discard entry and segment bitmap.
>>>>
>>>> - f2fs_write_checkpoint
>>>> - block_operations
>>>> - set_sbi_flag(sbi, SBI_QUOTA_SKIP_FLUSH)
>>>> - f2fs_flush_sit_entries
>>>> - add_discard_addrs
>>>> - __set_bit_le(i, (void *)de->discard_map);
>>>> - f2fs_write_data_pages
>>>> - f2fs_write_single_data_page
>>>> : inode is quota one, cp_rwsem won't be locked
>>>> - f2fs_do_write_data_page
>>>> - f2fs_allocate_data_block
>>>> - f2fs_wait_discard_bio
>>>> : discard entry has not been added yet.
>>>> - update_sit_entry
>>>> - f2fs_clear_prefree_segments
>>>> - f2fs_issue_discard
>>>> : add discard entry
>>>>
>>>> This patch fixes this issue by reverting 435cbab95e39 ("f2fs: fix quota_sync
>>>> failure due to f2fs_lock_op").
>>>>
>>>> Fixes: 435cbab95e39 ("f2fs: fix quota_sync failure due to f2fs_lock_op")
>>>
>>> The previous patch fixes quota_sync gets EAGAIN all the time.
>>> How about this? It seems this works for fsstress test.
>>>
>
> Then this?
>
> ---
> fs/f2fs/segment.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index ebbadde6cbced..ed11dcf2d69ed 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -3107,6 +3107,14 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
> type = CURSEG_COLD_DATA;
> }
>
> + /*
> + * We need to wait for node_write to avoid block allocation during
> + * checkpoint. This can only happen to quota writes which can cause
> + * the below discard race condition.
> + */
> + if (IS_DATASEG(type))
> + down_write(&sbi->node_write);
> +
> down_read(&SM_I(sbi)->curseg_lock);
>
> mutex_lock(&curseg->curseg_mutex);
> @@ -3174,6 +3182,9 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,

Minor thing: unlock order.

if (IS_DATASEG(type))
up_write(&sbi->node_write);

Could you merge the diff into this patch?

>
> if (put_pin_sem)
> up_read(&sbi->pin_sem);
> +
> + if (IS_DATASEG(type))
> + up_write(&sbi->node_write);
> }
>
> static void update_device_state(struct f2fs_io_info *fio)
>

2020-05-30 15:22:54

by Jaegeuk Kim

[permalink] [raw]

Subject: Re: [PATCH] Revert "f2fs: fix quota_sync failure due to f2fs_lock_op"

On 05/30, Chao Yu wrote:
> On 2020/5/30 9:49, Jaegeuk Kim wrote:
> > On 05/30, Chao Yu wrote:
> >> On 2020/5/30 6:34, Jaegeuk Kim wrote:
> >>> On 05/29, Chao Yu wrote:
> >>>> Under heavy fsstress, we may triggle panic while issuing discard,
> >>>> because __check_sit_bitmap() detects that discard command may earse
> >>>> valid data blocks, the root cause is as below race stack described,
> >>>> since we removed lock when flushing quota data, quota data writeback
> >>>> may race with write_checkpoint(), so that it causes inconsistency in
> >>>> between cached discard entry and segment bitmap.
> >>>>
> >>>> - f2fs_write_checkpoint
> >>>> - block_operations
> >>>> - set_sbi_flag(sbi, SBI_QUOTA_SKIP_FLUSH)
> >>>> - f2fs_flush_sit_entries
> >>>> - add_discard_addrs
> >>>> - __set_bit_le(i, (void *)de->discard_map);
> >>>> - f2fs_write_data_pages
> >>>> - f2fs_write_single_data_page
> >>>> : inode is quota one, cp_rwsem won't be locked
> >>>> - f2fs_do_write_data_page
> >>>> - f2fs_allocate_data_block
> >>>> - f2fs_wait_discard_bio
> >>>> : discard entry has not been added yet.
> >>>> - update_sit_entry
> >>>> - f2fs_clear_prefree_segments
> >>>> - f2fs_issue_discard
> >>>> : add discard entry
> >>>>
> >>>> This patch fixes this issue by reverting 435cbab95e39 ("f2fs: fix quota_sync
> >>>> failure due to f2fs_lock_op").
> >>>>
> >>>> Fixes: 435cbab95e39 ("f2fs: fix quota_sync failure due to f2fs_lock_op")
> >>>
> >>> The previous patch fixes quota_sync gets EAGAIN all the time.
> >>> How about this? It seems this works for fsstress test.
> >>>
> >
> > Then this?
> >
> > ---
> > fs/f2fs/segment.c | 11 +++++++++++
> > 1 file changed, 11 insertions(+)
> >
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > index ebbadde6cbced..ed11dcf2d69ed 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -3107,6 +3107,14 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
> > type = CURSEG_COLD_DATA;
> > }
> >
> > + /*
> > + * We need to wait for node_write to avoid block allocation during
> > + * checkpoint. This can only happen to quota writes which can cause
> > + * the below discard race condition.
> > + */
> > + if (IS_DATASEG(type))
> > + down_write(&sbi->node_write);
> > +
> > down_read(&SM_I(sbi)->curseg_lock);
> >
> > mutex_lock(&curseg->curseg_mutex);
> > @@ -3174,6 +3182,9 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
>
> Minor thing: unlock order.
>
> if (IS_DATASEG(type))
> up_write(&sbi->node_write);
>
> Could you merge the diff into this patch?

Pushed in dev branch. Thanks.

>
> >
> > if (put_pin_sem)
> > up_read(&sbi->pin_sem);
> > +
> > + if (IS_DATASEG(type))
> > + up_write(&sbi->node_write);
> > }
> >
> > static void update_device_state(struct f2fs_io_info *fio)
> >