2013-08-05 12:02:45

by Jin Xu

[permalink] [raw]
Subject: [PATCH] f2fs: fix a deadlock in fsync

From: Jin Xu <[email protected]>

This patch fixes a deadlock bug that occurs quite often when there are
concurrent write and fsync on a same file.

Following is the simplified call trace when tasks get hung.

fsync thread:
- f2fs_sync_file
...
- f2fs_write_data_pages
...
- update_extent_cache
...
- update_inode
- wait_on_page_writeback

bdi writeback thread
- __writeback_single_inode
- f2fs_write_data_pages
- mutex_lock(sbi->writepages)

The deadlock happens when the fsync thread waits on a inode page that has
been added to the f2fs' cached bio sbi->bio[NODE], and unfortunately,
no one else could be able to submit the cached bio to block layer for
writeback. This is because the fsync thread already hold a sbi->fs_lock and
the sbi->writepages lock, causing the bdi thread being blocked when attempt
to write data pages for the same inode. At the same time, f2fs_gc thread
does not notice the situation and could not help. Even the sync syscall
gets blocked.

To fix it, we could submit the cached bio first before waiting on a inode page
that is being written back.

Signed-off-by: Jin Xu <[email protected]>
---
fs/f2fs/f2fs.h | 2 ++
fs/f2fs/gc.c | 5 +----
fs/f2fs/inode.c | 3 ++-
fs/f2fs/segment.c | 9 +++++++++
4 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 467d42d..064d3f9 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1013,6 +1013,8 @@ void allocate_new_segments(struct f2fs_sb_info *);
struct page *get_sum_page(struct f2fs_sb_info *, unsigned int);
struct bio *f2fs_bio_alloc(struct block_device *, int);
void f2fs_submit_bio(struct f2fs_sb_info *, enum page_type, bool sync);
+void f2fs_wait_on_page_writeback(struct f2fs_sb_info *sbi,
+ struct page *page, enum page_type type, bool sync);
void write_meta_page(struct f2fs_sb_info *, struct page *);
void write_node_page(struct f2fs_sb_info *, struct page *, unsigned int,
block_t, block_t *);
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 35f9b1a..acfa411 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -508,10 +508,7 @@ static void move_data_page(struct inode *inode, struct page *page, int gc_type)
} else {
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);

- if (PageWriteback(page)) {
- f2fs_submit_bio(sbi, DATA, true);
- wait_on_page_writeback(page);
- }
+ f2fs_wait_on_page_writeback(sbi, page, DATA, true);

if (clear_page_dirty_for_io(page) &&
S_ISDIR(inode->i_mode)) {
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 2b2d45d1..d42b85b 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -148,10 +148,11 @@ bad_inode:

void update_inode(struct inode *inode, struct page *node_page)
{
+ struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
struct f2fs_node *rn;
struct f2fs_inode *ri;

- wait_on_page_writeback(node_page);
+ f2fs_wait_on_page_writeback(sbi, node_page, NODE, false);

rn = page_address(node_page);
ri = &(rn->i);
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index a86d125..7056cc0 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -702,6 +702,15 @@ alloc_new:
trace_f2fs_submit_write_page(page, blk_addr, type);
}

+void f2fs_wait_on_page_writeback(struct f2fs_sb_info *sbi,
+ struct page *page, enum page_type type, bool sync)
+{
+ if (PageWriteback(page)) {
+ f2fs_submit_bio(sbi, type, sync);
+ wait_on_page_writeback(page);
+ }
+}
+
static bool __has_curseg_space(struct f2fs_sb_info *sbi, int type)
{
struct curseg_info *curseg = CURSEG_I(sbi, type);
--
1.7.9.5


2013-08-06 12:46:50

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [PATCH] f2fs: fix a deadlock in fsync

Hi, Jin,

IMO, this patch tries to fix the deadlock condition on
f2fs_write_data_pages.

I think the errorneous scenario is something like this.
When there remains only one fs_lock during the checkpoint procedure,
f2fs_write_data_pages successfully gets the last one at the moment.
Then, other operations like sync and writeback thread are definitely
blocked too.

Meanwhile, in the flow of f2fs_write_data_pages, it is able to wait on
writebacked node page, which is what you decribed.

If you indicated this scenario correctly, as I examined the flow again,
I found one more case, __set_data_blkaddr, in addition to the
update_inode. And, I can clean up another minor flow too.

Please check the below patch.
Thanks,

---->

>From a9c62162ea89c9b6b52d39d6db3f8f27c4d2ce5c Mon Sep 17 00:00:00 2001
From: Jin Xu <[email protected]>
Date: Mon, 5 Aug 2013 20:02:04 +0800
Subject: [PATCH] f2fs: fix a deadlock in fsync
Cc: [email protected], [email protected],
[email protected]

This patch fixes a deadlock bug that occurs quite often when there are
concurrent write and fsync on a same file.

Following is the simplified call trace when tasks get hung.

fsync thread:
- f2fs_sync_file
...
- f2fs_write_data_pages
...
- update_extent_cache
...
- update_inode
- wait_on_page_writeback

bdi writeback thread
- __writeback_single_inode
- f2fs_write_data_pages
- mutex_lock(sbi->writepages)

The deadlock happens when the fsync thread waits on a inode page that
has
been added to the f2fs' cached bio sbi->bio[NODE], and unfortunately,
no one else could be able to submit the cached bio to block layer for
writeback. This is because the fsync thread already hold a sbi->fs_lock
and
the sbi->writepages lock, causing the bdi thread being blocked when
attempt
to write data pages for the same inode. At the same time, f2fs_gc thread
does not notice the situation and could not help. Even the sync syscall
gets blocked.

To fix it, we could submit the cached bio first before waiting on a
inode page
that is being written back.

Signed-off-by: Jin Xu <[email protected]>
[Jaegeuk Kim: add more cases to use f2fs_wait_on_page_writeback]
Signed-off-by: Jaegeuk Kim <[email protected]>
---
fs/f2fs/data.c | 2 +-
fs/f2fs/f2fs.h | 3 ++-
fs/f2fs/gc.c | 8 ++------
fs/f2fs/inode.c | 2 +-
fs/f2fs/segment.c | 10 ++++++++++
5 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index f458883..a7eb529 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -37,7 +37,7 @@ static void __set_data_blkaddr(struct dnode_of_data
*dn, block_t new_addr)
struct page *node_page = dn->node_page;
unsigned int ofs_in_node = dn->ofs_in_node;

- wait_on_page_writeback(node_page);
+ f2fs_wait_on_page_writeback(node_page, NODE, false);

rn = F2FS_NODE(node_page);

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 63813be..13db10b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1023,7 +1023,8 @@ int npages_for_summary_flush(struct f2fs_sb_info
*);
void allocate_new_segments(struct f2fs_sb_info *);
struct page *get_sum_page(struct f2fs_sb_info *, unsigned int);
struct bio *f2fs_bio_alloc(struct block_device *, int);
-void f2fs_submit_bio(struct f2fs_sb_info *, enum page_type, bool sync);
+void f2fs_submit_bio(struct f2fs_sb_info *, enum page_type, bool);
+void f2fs_wait_on_page_writeback(struct page *, enum page_type, bool);
void write_meta_page(struct f2fs_sb_info *, struct page *);
void write_node_page(struct f2fs_sb_info *, struct page *, unsigned
int,
block_t, block_t *);
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index d286d8b..e6b3ffd 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -422,8 +422,7 @@ next_step:

/* set page dirty and write it */
if (gc_type == FG_GC) {
- f2fs_submit_bio(sbi, NODE, true);
- wait_on_page_writeback(node_page);
+ f2fs_wait_on_page_writeback(node_page, NODE, true);
set_page_dirty(node_page);
} else {
if (!PageWriteback(node_page))
@@ -523,10 +522,7 @@ static void move_data_page(struct inode *inode,
struct page *page, int gc_type)
} else {
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);

- if (PageWriteback(page)) {
- f2fs_submit_bio(sbi, DATA, true);
- wait_on_page_writeback(page);
- }
+ f2fs_wait_on_page_writeback(page, DATA, true);

if (clear_page_dirty_for_io(page) &&
S_ISDIR(inode->i_mode)) {
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index debf743..9ab81e7 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -151,7 +151,7 @@ void update_inode(struct inode *inode, struct page
*node_page)
struct f2fs_node *rn;
struct f2fs_inode *ri;

- wait_on_page_writeback(node_page);
+ f2fs_wait_on_page_writeback(node_page, NODE, false);

rn = F2FS_NODE(node_page);
ri = &(rn->i);
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 9b74ae2..68e344f 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -705,6 +705,16 @@ retry:
trace_f2fs_submit_write_page(page, blk_addr, type);
}

+void f2fs_wait_on_page_writeback(struct page *page,
+ enum page_type type, bool sync)
+{
+ struct f2fs_sb_info *sbi = F2FS_SB(page->mapping->host->i_sb);
+ if (PageWriteback(page)) {
+ f2fs_submit_bio(sbi, type, sync);
+ wait_on_page_writeback(page);
+ }
+}
+
static bool __has_curseg_space(struct f2fs_sb_info *sbi, int type)
{
struct curseg_info *curseg = CURSEG_I(sbi, type);
--
1.8.3.1.437.g0dbd812



--
Jaegeuk Kim
Samsung

2013-08-07 04:23:53

by Jin Xu

[permalink] [raw]
Subject: Re: [PATCH] f2fs: fix a deadlock in fsync

Hi Kim,

The patch sounds good to me.

BTW, as in the deadlock situation I noticed, I did not find any thread was doing checkpoint at that moment. The bdi writeback thread and the fsync user thread was not. Neither was the background f2fs_gc thread because there had enough free segments. Meanwhile, another user thread was blocked in the f2fs_sync_file->sync_node_pages, which, I think, is not a factor contributing to the deadlock.

Thanks,
Jin

On 2013-8-6, at 20:46, Jaegeuk Kim <[email protected]> wrote:

> Hi, Jin,
>
> IMO, this patch tries to fix the deadlock condition on
> f2fs_write_data_pages.
>
> I think the errorneous scenario is something like this.
> When there remains only one fs_lock during the checkpoint procedure,
> f2fs_write_data_pages successfully gets the last one at the moment.
> Then, other operations like sync and writeback thread are definitely
> blocked too.
>
> Meanwhile, in the flow of f2fs_write_data_pages, it is able to wait on
> writebacked node page, which is what you decribed.
>
> If you indicated this scenario correctly, as I examined the flow again,
> I found one more case, __set_data_blkaddr, in addition to the
> update_inode. And, I can clean up another minor flow too.
>
> Please check the below patch.
> Thanks,
>
> ---->
>
> From a9c62162ea89c9b6b52d39d6db3f8f27c4d2ce5c Mon Sep 17 00:00:00 2001
> From: Jin Xu <[email protected]>
> Date: Mon, 5 Aug 2013 20:02:04 +0800
> Subject: [PATCH] f2fs: fix a deadlock in fsync
> Cc: [email protected], [email protected],
> [email protected]
>
> This patch fixes a deadlock bug that occurs quite often when there are
> concurrent write and fsync on a same file.
>
> Following is the simplified call trace when tasks get hung.
>
> fsync thread:
> - f2fs_sync_file
> ...
> - f2fs_write_data_pages
> ...
> - update_extent_cache
> ...
> - update_inode
> - wait_on_page_writeback
>
> bdi writeback thread
> - __writeback_single_inode
> - f2fs_write_data_pages
> - mutex_lock(sbi->writepages)
>
> The deadlock happens when the fsync thread waits on a inode page that
> has
> been added to the f2fs' cached bio sbi->bio[NODE], and unfortunately,
> no one else could be able to submit the cached bio to block layer for
> writeback. This is because the fsync thread already hold a sbi->fs_lock
> and
> the sbi->writepages lock, causing the bdi thread being blocked when
> attempt
> to write data pages for the same inode. At the same time, f2fs_gc thread
> does not notice the situation and could not help. Even the sync syscall
> gets blocked.
>
> To fix it, we could submit the cached bio first before waiting on a
> inode page
> that is being written back.
>
> Signed-off-by: Jin Xu <[email protected]>
> [Jaegeuk Kim: add more cases to use f2fs_wait_on_page_writeback]
> Signed-off-by: Jaegeuk Kim <[email protected]>
> ---
> fs/f2fs/data.c | 2 +-
> fs/f2fs/f2fs.h | 3 ++-
> fs/f2fs/gc.c | 8 ++------
> fs/f2fs/inode.c | 2 +-
> fs/f2fs/segment.c | 10 ++++++++++
> 5 files changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index f458883..a7eb529 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -37,7 +37,7 @@ static void __set_data_blkaddr(struct dnode_of_data
> *dn, block_t new_addr)
> struct page *node_page = dn->node_page;
> unsigned int ofs_in_node = dn->ofs_in_node;
>
> - wait_on_page_writeback(node_page);
> + f2fs_wait_on_page_writeback(node_page, NODE, false);
>
> rn = F2FS_NODE(node_page);
>
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 63813be..13db10b 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1023,7 +1023,8 @@ int npages_for_summary_flush(struct f2fs_sb_info
> *);
> void allocate_new_segments(struct f2fs_sb_info *);
> struct page *get_sum_page(struct f2fs_sb_info *, unsigned int);
> struct bio *f2fs_bio_alloc(struct block_device *, int);
> -void f2fs_submit_bio(struct f2fs_sb_info *, enum page_type, bool sync);
> +void f2fs_submit_bio(struct f2fs_sb_info *, enum page_type, bool);
> +void f2fs_wait_on_page_writeback(struct page *, enum page_type, bool);
> void write_meta_page(struct f2fs_sb_info *, struct page *);
> void write_node_page(struct f2fs_sb_info *, struct page *, unsigned
> int,
> block_t, block_t *);
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index d286d8b..e6b3ffd 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -422,8 +422,7 @@ next_step:
>
> /* set page dirty and write it */
> if (gc_type == FG_GC) {
> - f2fs_submit_bio(sbi, NODE, true);
> - wait_on_page_writeback(node_page);
> + f2fs_wait_on_page_writeback(node_page, NODE, true);
> set_page_dirty(node_page);
> } else {
> if (!PageWriteback(node_page))
> @@ -523,10 +522,7 @@ static void move_data_page(struct inode *inode,
> struct page *page, int gc_type)
> } else {
> struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
>
> - if (PageWriteback(page)) {
> - f2fs_submit_bio(sbi, DATA, true);
> - wait_on_page_writeback(page);
> - }
> + f2fs_wait_on_page_writeback(page, DATA, true);
>
> if (clear_page_dirty_for_io(page) &&
> S_ISDIR(inode->i_mode)) {
> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> index debf743..9ab81e7 100644
> --- a/fs/f2fs/inode.c
> +++ b/fs/f2fs/inode.c
> @@ -151,7 +151,7 @@ void update_inode(struct inode *inode, struct page
> *node_page)
> struct f2fs_node *rn;
> struct f2fs_inode *ri;
>
> - wait_on_page_writeback(node_page);
> + f2fs_wait_on_page_writeback(node_page, NODE, false);
>
> rn = F2FS_NODE(node_page);
> ri = &(rn->i);
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 9b74ae2..68e344f 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -705,6 +705,16 @@ retry:
> trace_f2fs_submit_write_page(page, blk_addr, type);
> }
>
> +void f2fs_wait_on_page_writeback(struct page *page,
> + enum page_type type, bool sync)
> +{
> + struct f2fs_sb_info *sbi = F2FS_SB(page->mapping->host->i_sb);
> + if (PageWriteback(page)) {
> + f2fs_submit_bio(sbi, type, sync);
> + wait_on_page_writeback(page);
> + }
> +}
> +
> static bool __has_curseg_space(struct f2fs_sb_info *sbi, int type)
> {
> struct curseg_info *curseg = CURSEG_I(sbi, type);
> --
> 1.8.3.1.437.g0dbd812
>
>
>
> --
> Jaegeuk Kim
> Samsung
>

2013-08-07 09:43:44

by Jaegeuk Kim

[permalink] [raw]
Subject: Re: [PATCH] f2fs: fix a deadlock in fsync

Hi Jin,

2013-08-07 (수), 12:23 +0800, Jim Xu:
> Hi Kim,
>
> The patch sounds good to me.
>
> BTW, as in the deadlock situation I noticed, I did not find any thread was doing checkpoint at that moment. The bdi writeback thread and the fsync user thread was not. Neither was the background f2fs_gc thread because there had enough free segments. Meanwhile, another user thread was blocked in the f2fs_sync_file->sync_node_pages, which, I think, is not a factor contributing to the deadlock.

I indicated one possible scenario.
As you described, many fsync calls can also consume all the fs_lock too.
Thanks,

--
Jaegeuk Kim
Samsung